Allegory of the ‘Synthetic Data Cave’
Can synthetic data solve the issue of fake news and discern truth from lies in the digital world?
What lessons can data engineers and consumers of information in today’s digital world take from Plato’s “Allegory of the Cave”? Quite a few, when it comes to synthetic data.
Facebook recently announced it will use synthetic data to not only identify fake news but also reduce online harassment and prevent users from falling victim to political propaganda, as was the case during the 2016 election.
For those unfamiliar with the allegory, it starts with Plato’s two characters, Socrates and Glaucon, exchanging ideas about a hypothetical scene in which men are held captive and chained to the wall of a cave.
Because they are shackled, they sit staring straight ahead, unable to turn their heads; thus, all they see is whatever images are cast upon the wall in front of their faces. Shadows of things are projected before them. As a result, the only reality the prisoners know is the mere forms of objects, not the objects themselves.
The age-old allegory is not unlike the world in which we live now, with people sitting before a screen unable to distinguish truth from alternative facts and outright lies. At one point in his exchange with Glaucon, Socrates wonders what would happen if the prisoners were set free.
Will They See the Truth?
Synthetic data, or computer-generated data that mimics real data, has the potential to be the “light” that shines when a prisoner escapes the cave and steps into the daylight.
At a time when fake news spreads faster than truth, Facebook’s announcement that it will use synthetic data to combat fake news provides some hope that consumers of digital information will be freed from their shackles, exit the cave and see things for what they really are.
According to Sergey Nikolenko, Neuromation’s chief research officer, Facebook is already using synthetic data to recognize online insults. “If Facebook succeeds in creating fake texts it will be a huge accomplishment. They will need to use fake, fake news to train the models, and this will be a huge breakthrough,” he said.
How Can Fake Data Reveal the Truth?
Creating different sets of models is one application of synthetic data, but fake data also can be used for a watermarking effect, said Derek Abdine, head of lab, Rapid7. “By taking the fake stories and broadcasting them to a wider group, you can look at the chain that broadcasts those stories and understand the types of actors, who they are and what their message is.”
By way of example, Abdine compared the process to a CT scan, in which a patient is injected with a chemical that acts as a tracer. The liquid makes the inside of the body more visible on the scan. A seeded fake story carries a similar tracing mechanism as it disperses across the public internet.
“The data engineers release information, it’s picked up by someone who thought it beneficial to their point of view. They can then more actively correlate the traffic seeing from those entities and then remove the stories from the feed,” Abdine said.
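The tracing idea Abdine describes can be pictured as following a seeded story through its chain of reshares. The sketch below is only an illustration under stated assumptions, not Facebook's or Rapid7's method: the event format (pairs of source account and resharing account) and the function names `trace_reshares` and `top_amplifiers` are hypothetical.

```python
from collections import defaultdict, deque


def trace_reshares(share_events, seed_account):
    """Breadth-first walk of the reshare chain, starting at the seeding account.

    share_events: iterable of (source_account, resharing_account) pairs.
    Returns a dict mapping each reached account to its hop distance from the seed.
    """
    reshared_by = defaultdict(list)
    for source, resharer in share_events:
        reshared_by[source].append(resharer)

    hops = {seed_account: 0}
    queue = deque([seed_account])
    while queue:
        current = queue.popleft()
        for account in reshared_by.get(current, []):
            if account not in hops:  # keep the shortest path to each account
                hops[account] = hops[current] + 1
                queue.append(account)
    return hops


def top_amplifiers(share_events, limit=3):
    """Rank accounts by how many direct reshares their copy of the story attracted."""
    counts = defaultdict(int)
    for source, _resharer in share_events:
        counts[source] += 1
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    # Hypothetical events: "seed" is the account that released the tracer story.
    events = [("seed", "a"), ("seed", "b"), ("a", "c"), ("c", "d"), ("a", "e")]
    print(trace_reshares(events, "seed"))
    print(top_amplifiers(events, limit=2))
```

In this toy version, the hop distances show how far the tracer traveled, and the amplifier ranking points to the accounts worth a closer look. A real system would also need timestamps, content matching and far more data.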
Despite being computer-generated, synthetic data is a viable solution to the fake news problem: It allows engineers to attribute stories to their sources and establish whether trustworthy sources exist.
Combining the tools of artificial intelligence with data science and deep learning mechanisms helps to identify patterns that give a better sense of the trustworthiness of sources. When deep learning is applied, “it helps to understand and correlate the original story and how it is being used to push an agenda,” said Abdine.
For any model they build, data engineers first have to establish a baseline by determining the patterns of truth, and then identify the outliers to that baseline. Effectively combating the problem of fake news demands monitoring so engineers can gain an understanding of the methods and the tools that adversaries are using.
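The baseline-then-outliers step can be sketched in a few lines. This is a minimal illustration that assumes a single numeric signal per story (for instance, reshares in the first hour) and uses a simple z-score rule; the signal, the threshold of 3.0 and the names `fit_baseline` and `find_outliers` are assumptions made for the example, not a technique the sources describe.

```python
from statistics import mean, stdev


def fit_baseline(trusted_values):
    """Summarize trusted observations as (center, spread)."""
    if len(trusted_values) < 2:
        raise ValueError("need at least two trusted observations")
    return mean(trusted_values), stdev(trusted_values)


def find_outliers(baseline, observations, threshold=3.0):
    """Return the names of observations that sit far from the baseline.

    observations: dict mapping a story name to its numeric signal.
    A story is flagged when its z-score exceeds the threshold.
    """
    center, spread = baseline
    if spread == 0:
        # With no variation in the baseline, any deviation counts as an outlier.
        return sorted(name for name, value in observations.items() if value != center)
    return sorted(
        name
        for name, value in observations.items()
        if abs(value - center) / spread > threshold
    )


if __name__ == "__main__":
    # Hypothetical first-hour reshare counts for stories from trusted sources.
    baseline = fit_baseline([10, 12, 11, 9, 13, 10, 12, 11])
    print(find_outliers(baseline, {"story_a": 12, "story_b": 40}))
```

A flagged story is not necessarily fake; it is simply unusual relative to the baseline and a candidate for the kind of monitoring described above.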
“It’s a lot of analyzing interactions with fake information to understand toolsets to better discover how they are pulling off spreading the fake information,” said Abdine. “There are tools that may be able to broadcast to multiple sources and have unique fingerprints.”
Will the Truth Matter?
At the end of the allegory, Socrates poses a question that transcends time and generation: What would the liberated prisoner now prefer? Certain that most people would choose to remain shackled, Plato might ask whether the minds of Facebook users actually want to be freed.
Should the answer to that question be “No,” synthetic data still has many other applications across sectors, from health care to politics and cybersecurity. According to MIT News: “Artificial data is also a valuable tool for educating students—although real data is often too sensitive for them to work with, synthetic data can be effectively used in its place. This innovation can allow the next generation of data scientists to enjoy all the benefits of big data, without any of the liabilities.”