Why Big Data Missed Early Warning Signs of COVID-19

Back in September 2014 there was an excellent article on FP called “Why Big Data Missed Early Warning Signs of Ebola”:

It’s an inspirational story that is a common refrain in the big data world — sophisticated computer algorithms sift through millions of data points and divine hidden patterns indicating a previously unrecognized outbreak that was then used to alert unsuspecting health authorities and government officials. The problem is that this story isn’t quite true: By the time HealthMap monitored its very first report, the Guinean government had actually already announced the outbreak and notified the WHO.

They go on to say the problem was not any lack of social commentary, which legitimately came early and wasn’t noticed by big data systems. The problem was that official news channels were being downplayed so that “artificial intelligence” (AI) could take credit for just reading those same official channels.

Thus, contrary to the narrative that data mining led to an intelligence coup, HealthMap’s earliest signals on March 14 were actually simply detections of this official government announcement in French. Despite all of the attention and hype paid to social media as a sensor network over human society, mainstream media still plays a critical role as an information stream in many areas of the world. This is not to say that there were not far earlier signals manifested in the myriad social conversations among medical workers and citizens in the region, only that it was not these indicators that HealthMap — or anyone else — detected.

My presentations in 2014 and after would often cite this example as a failure of big data. More recently, at the 2019 RSA Conference for example, I presented Ebola warnings as one of the top ten security disasters of ML.

I also used to give examples where insurance companies ran big data systems in the cloud for modeling pandemics as well as chemical weapons spreading in the US. Perhaps I will dig up some of those old 2014 slides and post here again.

One true “in the trenches of big data technology” experience I sometimes like to relate to people was how a very large insurance company got a call from Amazon demanding some kind of formal advance notice before cloud services were lit up for pandemic simulations. There was a time when the whole of Amazon’s cloud simply couldn’t handle the loads of powerful and real pandemic prediction algorithms (run in complete secrecy).

A lot has changed since then, although some things have not.

On the plus side, a pandemic-prediction technology company founded during the Ebola crisis has this time around claimed success in the early warning race:

…December 30, 2019, BlueDot, a Toronto-based startup that uses a platform built around artificial intelligence, machine learning and big data to track and predict the outbreak and spread of infectious diseases, alerted its private sector and government clients about a cluster of “unusual pneumonia” cases happening around a market in Wuhan, China. That was the first recognition of the novel coronavirus that has come to be known as COVID-19.

Before looking at this tall claim more carefully, note the list of “first places” in the same story:

In the case of COVID-19, the system flagged articles in Chinese that reported 27 pneumonia cases associated with a market that had seafood and live animals in Wuhan. In addition to the alert, BlueDot correctly identified the cities that were highly connected to Wuhan using things like global airline ticketing data to help anticipate where the infected might be traveling. The international destinations that BlueDot anticipated would have the highest volume of travelers from Wuhan were: Bangkok, Hong Kong, Tokyo, Taipei, Phuket, Seoul, and Singapore. In the end, 11 of the cities at the top of their list were the first places to see COVID-19 cases.

Here they are again: Bangkok, Hong Kong, Tokyo, Taipei, Phuket, Seoul, and Singapore.

Now look at the lines on this Tomas Pueyo graph of infection rates from his post called “Act Today or People Will Die”.

Source: Tomas Pueyo

If you squint you may be able to see that the cities listed by BlueDot are nearly flat along the bottom, unless they’re not on the chart at all because too few cases exist. Countries like South Korea, the US and France are rocketing upwards. As the author explains without mincing words, there’s an obvious cause for the difference in rates:

South Korea cases have exploded, but have you wondered why Japan, Taiwan, Singapore, Thailand or Hong Kong haven’t? All of them were hit by SARS in 2003, and all of them learned from it.

SARS had a huge impact, and the countries that set up national pandemic command centers to prepare for the next time are showing the benefits. Their use of big data has been to enhance preparedness by enabling testing and containment routines, best exemplified by the Singapore public dashboard.

Meanwhile in America, the lessons from the spread of a deadly virus seem to have been mostly ignored or reversed by the current administration, leading the country towards a repeat of tragic history.

For five long years under Ronald Reagan’s Presidency there had been no statement, no policy and no response. From 1982 to 1987 the idea that a deadly virus was even a concern was openly ignored and dismissed. Reagan literally laughed in press conferences when asked about citizens dying from a virus as fatalities climbed dramatically. The President refused to let the Centers for Disease Control (CDC) communicate or be transparent about how to stop the spread.

Similarly, in the current anti-science White House, a CDC response center was closed and communication about viruses was shut down despite intelligence offices formally predicting a coming pandemic. In fact, the director of the offices warning that a virus would be a real national security concern was instead fired for fairly open political reasons.

The leadership “team have been dishonest about the coronavirus”, spreading lies and sowing confusion just as Reagan did in the 1980s with AIDS, on top of enabling healthcare market fraud that gives no coverage for scientific testing of coronavirus in America.

The lesson from the AIDS virus in America therefore does not seem to have been to fund a national command center for immediate testing and containment, leveraging the latest and greatest big data technology, but instead… that a President can tell lies, play golf and refuse to lift a finger as tens of thousands of Americans needlessly die on his watch.

BlueDot is notably Canadian.

So let’s go back to the details of that BlueDot announcement for a minute. FP complained in 2014 that AI really meant just reading regular news channels and trying to take credit for it as novel. The core of their message above is this:

…mainstream media still plays a critical role as an information stream in many areas of the world. This is not to say that there were not far earlier signals manifested in the myriad social conversations among medical workers and citizens in the region, only that it was not these indicators that HealthMap — or anyone else — detected…

That is quite literally what happened in China again this time. Another Tomas Pueyo graph lays it out by day to clearly show a timeline of social conversations and then news stories.

Source: Tomas Pueyo

The text boxes are basically this:

  • Dec 26: 4 unusual pneumonia cases noticed in HICWM Hospital by a Dr. Zhang
  • Dec 27: Dr. Zhang reports cases to government CDC
  • Dec 28-29: 3 more pneumonia cases in HICWM
  • Dec 30: //not on Pueyo timeline// Wuhan Central Hospital’s emergency department director (Ai Fen) uploads diagnostic record to WeChat
  • Dec 30: government begins formal investigation in Wuhan City
  • Dec 31: Wuhan health officials formally release news to China’s national health officials including their CDC and the global WHO

And on Dec 30th BlueDot claimed credit for being the first to notice. This is exactly what FP was complaining about in 2014: a machine reads the news and says it was early while being on the same timeline as an existing human global notification system.
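To make the timing argument concrete, here is a minimal sketch that measures the Dec 30 alert against the dated events in the timeline above. The dates come straight from that list (the Dec 28-29 range is left out because it has no single day); the names `TIMELINE`, `lead_days` and `lead_table` are hypothetical ones I made up for illustration, not anything the company or Pueyo publishes.

```python
from datetime import date

# Dated events copied from the timeline above (all in 2019).
TIMELINE = [
    (date(2019, 12, 26), "Dr. Zhang notices 4 unusual pneumonia cases"),
    (date(2019, 12, 27), "Dr. Zhang reports cases to government CDC"),
    (date(2019, 12, 30), "Ai Fen uploads diagnostic record to WeChat"),
    (date(2019, 12, 30), "Government begins formal investigation in Wuhan"),
    (date(2019, 12, 31), "Wuhan officials notify national officials and WHO"),
]

# Date of the alert to clients, as stated in the quoted announcement.
ALERT = date(2019, 12, 30)


def lead_days(alert, event):
    """Days of warning: positive if the alert preceded the event, negative if it followed."""
    return (event - alert).days


def lead_table(alert, timeline):
    """Pair each event label with the alert's lead (or lag) in days."""
    return [(label, lead_days(alert, when)) for when, label in timeline]


if __name__ == "__main__":
    for label, days in lead_table(ALERT, TIMELINE):
        print(f"{days:+d} days  {label}")
```

Measured this way, the alert lands four days after the first clinical notice, on the same day as the formal investigation, and one day ahead of the notification to the WHO. Which of those you call "early" depends entirely on which event you pick as the finish line.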

To be fair, BlueDot was right there on the clock and claims neither that AI is a cure-all nor that they were doing something amazing beyond reading the news others were publishing. As they put it in their PR, they “flagged articles in Chinese that reported 27 pneumonia cases associated with a market”.

While the system worked as designed, it still gets classified as a failure under the 2014 definition of high expectations for phrases like big data or AI. Local news and social channels reported the outbreak of pneumonia with SARS-like potential. Then people and machines both read that and flagged it as an early warning sign of another SARS-like incident.

Reading newspapers around the world and reporting on them the same day was the hot new technology of 1920. It is hard to call this newsworthy in 2020. As I said before, a lot has changed, while some things have not. I wish BlueDot didn’t call their warnings early, and instead called them inexpensive or less complicated.

Nonetheless, if we lower the bar so that heavily funded startups can be measured against easier finish lines, BlueDot did indeed do what they advertised: they read news about SARS-like pneumonia as it was published and then repeated it for others to read.

I’m not pointing out that a lowered bar carries risk just because I want to be Captain Obvious telling you to be wary of PR from startups. I actually believe we should hold the bar higher for them. There are technical solutions that really could give early warning signs ahead of the local reporters themselves, perhaps even before social conversations reach the reporters.

That is both why I’ve been writing my new book and the focus of the software I’m working on now. We can do better with big data technology, and we will.

*** This is a Security Bloggers Network syndicated blog from flyingpenguin authored by Davi Ottenheimer. Read the original post at: