As cyberdefenders in a war where every battle can mean winner takes all, enterprise IT departments must level the AI playing field.
A cyberattack used to mean that somebody, somewhere, was breaking into your network. Emphasis on "somebody." There was always a "somebody." A hacker. A human.
Today, the majority of network traffic isn’t human; it’s machine. There is no longer, necessarily, a “somebody”—at least, not directly.
Our current threat landscape of advanced persistent threats (APTs) makes this even more troublesome. Hackers of old, human hackers, typically operated like burglars: get in, smash and grab, and get out before anyone knew they were there. Today's attackers are more sophisticated. They find a weakness in a network, get in and stay as long as they can, often for months or even years. Modern attackers have a number of tricks for prolonging the time before they are discovered, such as using deep learning to poison a security tool's training data, tricking its defensive machine-learning algorithms into learning an incorrect concept of normal behavior.
The good news is that cybersecurity professionals are now talking about—and using—AI technologies. As the bad guys have ramped up their capabilities for automation or machine learning, so, too, have the good guys. With modern machine learning for cybersecurity, we can not only detect anomalous bot activity but also determine what kind of bot it is and what type of function it fundamentally performs. From there, we can figure out what that bot is probably up to.
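To make this concrete, here is a minimal sketch of the kind of bot triage described above. The features and thresholds are hypothetical, invented for illustration; a production system would learn these from traffic data rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class RequestProfile:
    """Hypothetical per-client features aggregated from web traffic."""
    requests_per_minute: float
    has_browser_headers: bool   # full, consistent browser header set
    follows_robots_txt: bool    # fetched and honored robots.txt
    hits_hidden_links: bool     # followed honeypot links invisible to humans


def classify(profile: RequestProfile) -> str:
    """Rule-of-thumb triage: is this traffic human, a declared
    crawler, or an anomalous bot worth investigating?"""
    if (profile.requests_per_minute < 20
            and profile.has_browser_headers
            and not profile.hits_hidden_links):
        return "likely-human"
    if profile.follows_robots_txt and not profile.hits_hidden_links:
        return "declared-crawler"
    return "suspicious-bot"
```

In practice the "determine what kind of bot it is" step would be a trained classifier over many such features; the rules above just show the shape of the decision.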
The bad news is that all of this data analysis in and of itself creates more data, which overwhelms enterprise IT departments.
The 3D AI Arms Race
AI technologies such as automation and machine learning fundamentally rely on data. They are thereby impacted by the three dimensions of data:
- Volume (How much data is there?)
- Velocity (How quickly is the data created, collected and analyzed?)
- Variety (What type of data are we dealing with? How similar—or dissimilar—are the different data points to each other?)
Collectively, these are known among data scientists as the three V’s—and each V is to blame for the problems of the AI vs. AI arms race in cybersecurity.
Consider the billions of website visits the internet sees each month. Human IT workers can manually investigate only a handful of the thousands upon thousands of alerts that may come across their screens each day, each alert triggered by either a bot or a human with its own motives. In other words, the volume, velocity and variety of data with which IT departments contend are all unfathomably high. Knowing this, it's little surprise that one recent study found that more than one-third of enterprise organizations outright ignore 50 percent or more of their security alerts.
Compare how the bad guys operate. For years, spammers and scammers have been using machine learning to evade automated filters—and have scaled up their efforts of online do-baddery through their own automation (sometimes carefully choreographed to add a touch of human realism). The right automated email or series of emails can result in a fraudulent financial transfer, a successful malware injection or the revelation of sensitive data. Meanwhile, malicious network attackers have their own automated bots doing the heavy lifting for them—bots that are sometimes powered by machine learning themselves, pitting AI against AI on the network.
In this, the bad guys have an innate advantage; they need only one successful attack to “win.” Conversely, for an IT team, every battle must be won. This makes scalability even more important to cybersecurity defenders than it is to attackers. And, until real AI comes to town, fighting automation with automation just doesn’t scale for IT. It is an unwinnable arms race.
A Modest AI Proposal
So ban the bots.
All of them—short of whitelisting a few explicitly known and trusted APIs.
After all, as cybersecurity specialists, what is our real purpose? We fortify. We defend. We protect. And, if something goes wrong, we mitigate. That’s it. Identifying different types of automated network traffic is helpful to those goals, but merely incidental.
In our current threat landscape, the first and foremost question for analyzing web application network traffic should be: Is it human?
If it’s not human, and it’s not a bot I’ve already expressly whitelisted as being known and trusted, then it doesn’t have any business being on my network. And it’s gone.
I don't care if it's trying to buy concert tickets, scrape metadata or inject a script. I don't need to wait and see what the bot is going to try to do. I just need to identify it as an untrusted bot and prevent its entry.
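The deny-by-default policy above is simple enough to sketch in a few lines. The bot identifiers and the upstream human-detection signal are assumptions; the point is only that the admission decision never depends on what the bot intends to do.

```python
# Hypothetical allowlist of explicitly known and trusted automated clients.
TRUSTED_BOTS = {"monitoring-agent/1.0", "partner-api-client/2.3"}


def admit(client_id: str, is_human: bool) -> bool:
    """Deny-by-default admission: humans get in, allowlisted bots get in,
    everything else is dropped without further analysis.

    `is_human` is assumed to come from an upstream human-vs-bot
    detector; `client_id` is however the site identifies automated
    clients (e.g., an authenticated API key, not a spoofable header).
    """
    if is_human:
        return True
    return client_id in TRUSTED_BOTS
```

Note that no branch inspects the request's purpose (ticket-buying, scraping, script injection); intent analysis is simply out of scope once untrusted automation is denied at the door.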
Instantly, this eliminates more than a third of all web application traffic to my network.
With only humans (and a handful of whitelisted bots) acting on my network, the defensive automation and machine learning measures I have in place will be that much more effective. Security alerts and other result sets will come in at both a volume and a velocity that a human can effectively manage. Without any non-whitelisted bot activity on my network, these datasets will inherently contain more context—thereby minimizing the variety element.
The upshot: My IT department is going to be a lot less overwhelmed by security alerts, and may actually start paying attention to more of them. As an added bonus, this reduced dimensionality yields cleaner, simpler datasets, allowing my network's machine-learning capabilities to improve.
From there, someday, we can be ready for AI versus AI. We’re just not there yet. We need to shore up our defenses first.