Chapter 267: Accelerating Data Utilization Compliance (1/2)
Seeing how considerate Lin Hui was of his subordinates' feelings, Huang Jing grew more and more certain that following him had been the right choice.
Of course, Lin Hui knew nothing of Huang Jing's inner monologue.
After that, Lin Hui and Huang Jing dropped the subject of working online.
Instead, they chatted about news from the American tech giants.
Most of it was idle gossip, but not all of it.
At least Lin Hui did not come away empty-handed.
In the course of the conversation, Lin Hui picked up a very important piece of information from Huang Jing.
That is, Apple seems to be committed to pursuing a large data transaction totaling approximately US$200 million to US$300 million.
Huang Jing was rather vague in describing this news.
It was as if he were afraid of accidentally misleading Lin Hui.
In the past, the information Huang Jing passed along had usually been definitive.
It was rare for him to sound so unsure.
About this deal, Huang Jing first said it was a data transaction, then said it was not.
This left Lin Hui a little confused.
Still, Lin Hui took even gossip seriously; after all, there is rarely smoke without fire.
To pin down what Huang Jing's news actually meant, Lin Hui made further inquiries and cross-checked it with multiple sources.
After some further deliberation, Lin Hui finally figured it out.
The so-called data transaction of US$200 million to US$300 million did concern data, but it was not an ordinary data purchase.
What Apple was pursuing this time was a rather special kind of data acquisition.
Piecing together information from various channels, Lin Hui concluded that Apple's real target was:
“Dark data.”
Seen this way, Apple was pulling a classic feint: openly repairing the plank road while secretly marching through Chencang.
Dark data is sometimes also called "dust data": redundant, often forgotten data.
This data is collected by companies and organizations in the course of their activities but then not used.
Dark data is often unstructured, unlabeled, and unanalysed information.
Compared with the annotated data Lin Hui had focused on before, dark data has almost no presence.
It is nearly always overlooked.
After all, such data just sits on networks and servers, taking up valuable storage space.
Generally speaking, there are three main types of dark data:
The first is traditional text-based data. This may include emails, logs, and documents.
The second type is non-traditional data.
This includes untagged audio files, video files, and still images.
The third type is deep data.
This includes information in the deep web that search engines cannot reach.
Most of this deep data is private and controlled by governments or private institutions.
It includes data such as medical records, legal records, financial information, and organization-specific databases curated by academics, government agencies, and local communities.
All the above data can be called dark data.
…
Dark data is more obscure than data in the traditional sense.
Unlabeled data like this cannot be used directly, but its potential cannot be denied.
In any case, no one can say this information is unimportant.
Why would Apple be interested in this kind of thing?
Because gathering this type of data has never really been regarded as collecting data.
Yet with in-depth mining, it can deliver results similar to those of traditional data.
Moreover, by relying on this kind of data and doing some careful messaging, a company can even leave consumers with the impression that it never touches ordinary user data.
Wouldn't that be very useful for building a corporate image?
In short, for companies that want to have it both ways, this is hard to resist.
In any case, Lin Hui felt that starting with dark data fit the way many tech giants behave.
Compare this with Lin Hui's earlier price estimates.
By those estimates, tens of millions of dollars could buy tens of millions of pieces of bilingual annotated data.
So the dark data Apple was seeking, worth US$200 million to US$300 million, must be an enormous volume.
A major difference between annotated data and dark data is that annotated data is structured data that has already undergone some processing.
Dark data, by contrast, is largely unstructured and can even be very "messy".
Structured data is generally data that has a fixed format and limited length.
For example, a filled-in form is structured data.
A record such as "Nationality: Chinese, Ethnicity: Han, Gender: Male, Name: Zhang San, Age: ..." is a typical case.
This kind of data is called structured data.
This type of data is easily stored in a database in a fixed format.
Semi-structured data refers to data in formats such as XML or HTML.
This type of data can be processed as structured data as needed, or its plain text can be extracted and processed as unstructured data.
Unstructured data is data with variable length and no fixed format.
Web pages and emails, for instance, are sometimes very long and sometimes just a few sentences; this is typical unstructured data.
Word documents, audio recordings, videos, and images are likewise unstructured data.
Semi-structured and unstructured data are generally lumped together and collectively referred to as "dark data".
This is not a term Lin Hui coined.
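To make the three categories concrete, here is a minimal Python sketch. It assumes a hypothetical fixed form schema modeled on the Zhang San example and treats only well-formed XML markup as semi-structured; `classify` and `extract_text` are illustrative names, not any library's API. Real-world HTML is often not well-formed XML and would need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed schema for a "filled-in form", modeled on the Zhang San example.
FORM_FIELDS = ("nationality", "ethnicity", "gender", "name", "age")


def classify(record):
    """Rough heuristic: label a record as structured, semi-structured, or unstructured."""
    # Structured: a mapping whose keys exactly match the fixed schema.
    if isinstance(record, dict) and set(record) == set(FORM_FIELDS):
        return "structured"
    if isinstance(record, str):
        text = record.strip()
        # Semi-structured: markup that parses as a well-formed XML tree.
        if text.startswith("<"):
            try:
                ET.fromstring(text)
                return "semi-structured"
            except ET.ParseError:
                pass
    # Everything else (free text, raw bytes from audio/video/images) is unstructured.
    return "unstructured"


def extract_text(markup):
    """Strip the tags from well-formed markup, leaving plain text to handle as unstructured data."""
    parts = (t.strip() for t in ET.fromstring(markup).itertext())
    return " ".join(p for p in parts if p)
```

For example, `classify("<p>Hello <b>world</b></p>")` reports semi-structured data, while `extract_text` on the same string yields the plain text `Hello world`, mirroring the two ways of handling semi-structured data described above.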
Dark data and structured data such as annotated data differ greatly in value.
A unit of labeled data is often worth dozens or even hundreds of times as much as a unit of dark data.
Even if US$200 million to US$300 million were spent on the more expensive cross-lingual annotated data, it would buy hundreds of millions of pieces.
Spend hundreds of millions of dollars on dark data instead, and the volume would be far larger.
The dark data involved in a deal of US$200 million to US$300 million must therefore be an enormous amount.
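The comparison above is back-of-envelope arithmetic. The sketch below assumes a flat price of about US$1 per annotated piece (inferred from the earlier estimate that tens of millions of dollars buys tens of millions of pieces) and an illustrative value ratio between labeled and dark data; both are assumptions, not figures from the story. Integer dollars are used to avoid floating-point rounding.

```python
# Assumption: about US$1 per annotated piece, inferred from "tens of millions of
# dollars buys tens of millions of pieces". Not a figure stated in the story.
ANNOTATED_PRICE_USD = 1


def annotated_units(budget_usd, price_usd=ANNOTATED_PRICE_USD):
    """Pieces of annotated data a budget buys at a flat per-piece price (integer dollars)."""
    return budget_usd // price_usd


def dark_units(budget_usd, value_ratio, price_usd=ANNOTATED_PRICE_USD):
    """Units of dark data the same budget buys if one annotated piece is worth
    `value_ratio` units of dark data (the text says dozens to hundreds)."""
    return budget_usd * value_ratio // price_usd


for budget in (200_000_000, 300_000_000):
    for ratio in (30, 100):  # illustrative ratios only
        print(f"${budget:,}: {annotated_units(budget):,} annotated pieces "
              f"or {dark_units(budget, ratio):,} dark-data units at {ratio}x")
```

Under these assumptions, even the low end of the range corresponds to billions of dark-data units, which is the scale gap Lin Hui is weighing.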
Lin Hui did carry plenty of information over from his past life.
But none of it amounted to dark data on a scale that could satisfy Apple's appetite.
In fact, forget the little Lin Hui had brought from his previous life:
even the dark data held by some of the most powerful domestic Internet giants might not be enough to satisfy Apple's appetite.
So if Lin Hui wanted a share of Apple's huge acquisition, it seemed his only option was to collect dark data himself.
So how would he collect it?
Dark data can be gathered in a variety of ways.
After all, it includes user activity logs, customer chat and email records, server monitoring logs, video files, and machine and sensor data generated by the Internet of Things.
It may also include data that is no longer accessible because it sits on outdated devices.
As a result, some dark data can often be obtained simply by cleaning up activity logs or recovering fragments from old storage.
There are many other ways to collect dark data as well.
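As a rough illustration of sweeping forgotten storage, the sketch below walks a directory tree and flags files that have not been modified for a long time as dark-data candidates. The function name `find_stale_files` and the 365-day threshold are hypothetical choices, and "not modified recently" is only a proxy for "collected but never used". It relies on modification time rather than access time, because filesystems mounted with `noatime` or `relatime` do not keep access times reliably up to date.

```python
import time
from pathlib import Path


def find_stale_files(root, max_age_days=365, now=None):
    """Yield (path, size_in_bytes) for files under `root` not modified in `max_age_days` days.

    Hypothetical helper: staleness is only a proxy for "collected but never used".
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400  # 86,400 seconds per day
    for path in Path(root).rglob("*"):
        if path.is_file():
            info = path.stat()
            # Modification time is used instead of access time, which is often
            # not updated on filesystems mounted with noatime/relatime.
            if info.st_mtime < cutoff:
                yield path, info.st_size
```

A call such as `sorted(find_stale_files("/var/log", max_age_days=730))` would list old log files and their sizes; summing the sizes gives a first estimate of how much forgotten data a given store holds.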
That is easy enough to say.
But as the saying goes, talking about toxicity without regard to dose is meaningless.
By the same token, talking about data mining without regard to data volume is just bluffing.
To be continued...