How to Use Your Baseline for Network Security

by Ana Mezic on December 3, 2019

This is the final article in a three-part series on Network Baselining. Read the first two articles in the series here:
1) An Introduction to Baselining Technology
2) How to Create a Baseline for Your Network

Once the Baseline Is Recorded, How Does the AI Detect an Anomaly?

What the AI technically does is measure the distance between what should have happened and what is actually happening on a network in order to determine if there are anomalies and prioritize them.

Let’s say I want to understand whether I am in danger on the street. To judge that, I need to know my distance from a car and the velocity of the car, or my own velocity and the velocity of the car. The idea of a distance to something is critical to how we do our analysis of anomalies.

What our algorithm really does is compute what the distribution of various events should be, like how many emails should have been sent and how many files should have been transferred in the last five minutes, and compare that with what ended up happening instead.

We look at how many were predicted versus how many were really transferred and subtract those two numbers, and we do this for many variables. If you add up the absolute values of those differences, that tells you the overall distance between the prediction and reality. If that distance is very big, there’s a problem happening.
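
As a rough illustration of that subtract-and-add-up idea, here is a minimal Python sketch. The variable names, the counts, and the threshold are all made up for the example; this is not MixMode’s actual implementation, and a real system would likely also scale each variable so that one large count does not dominate.

```python
def anomaly_score(predicted, observed):
    """Sum of absolute differences between predicted and observed counts."""
    return sum(abs(observed[name] - predicted[name]) for name in predicted)


# Hypothetical event counts for a single five-minute window.
predicted = {"emails": 120, "files_transferred": 40, "dns_queries": 900}
observed = {"emails": 125, "files_transferred": 310, "dns_queries": 880}

# Assumed cutoff for illustration; a real system would tune or learn this.
THRESHOLD = 100

score = anomaly_score(predicted, observed)  # 5 + 270 + 20 = 295
is_anomalous = score > THRESHOLD
print(score, is_anomalous)
```

Here the file-transfer count is far from its prediction, so the total distance ends up well above the assumed threshold and the window would be flagged.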

That might sound a bit like statistics, and it is, just not quite as simple.

The difference between your normal statistics and what we’re doing is that we started with a complete mess of data. We could have in principle just done statistics on it, and there are companies out there that do. But that method would totally miss all the deterministic patterns, like people starting work at 9 a.m. and leaving at 6 p.m. With plain statistics we could only find out how many files are transferred on average in 5 minutes at any time, and not specifically how many files should be transferred at 3 p.m. on a Tuesday.
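
To make that contrast concrete, here is a small Python sketch that compares a single global average with an expectation kept per day of week and hour. Bucketing by (weekday, hour) is a simplification chosen for this example, not a description of how the MixMode algorithm learns its baseline, and the sample counts are invented.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def global_average(samples):
    """One number for all times: the plain-statistics view."""
    return mean(count for _, count in samples)


def time_of_week_baseline(samples):
    """Average count per (weekday, hour) slot; Monday is weekday 0."""
    buckets = defaultdict(list)
    for timestamp, count in samples:
        buckets[(timestamp.weekday(), timestamp.hour)].append(count)
    return {slot: mean(counts) for slot, counts in buckets.items()}


# Hypothetical file-transfer counts per five-minute window.
samples = [
    (datetime(2019, 11, 19, 15, 0), 50),  # Tuesday, 3 p.m.
    (datetime(2019, 11, 26, 15, 0), 54),  # Tuesday, 3 p.m.
    (datetime(2019, 11, 19, 3, 0), 2),    # Tuesday, 3 a.m.
    (datetime(2019, 11, 26, 3, 0), 2),    # Tuesday, 3 a.m.
]

baseline = time_of_week_baseline(samples)
observed_at_3am = 30  # hypothetical new observation on a Tuesday at 3 a.m.

gap_vs_global = abs(observed_at_3am - global_average(samples))  # 3
gap_vs_slot = abs(observed_at_3am - baseline[(1, 3)])           # 28
print(gap_vs_global, gap_vs_slot)
```

Against the global average, 30 transfers at 3 a.m. looks ordinary. Against the expectation for that specific time slot, it stands out.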

The AI will “flag” an anomaly. What happens after that?

At this point we immediately have the option to look at the IPs that were involved in whichever layer of the network got flagged. That might be a large number of IPs, so we go after the ones that had the most communication or sent the most files. The implicated IPs are ranked and prioritized on that basis.

At the end of the day the customer will be able to get to the root cause, the individual IP that caused the anomaly.
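
The ranking step can be pictured with a short Python sketch. The record layout (IP, connection count, files sent) and the choice to sort by connections first and files second are assumptions made for this example; the article does not specify the actual ranking criteria beyond “most communication” and “many files sent.”

```python
def rank_ips(flows, top_n=3):
    """Rank IPs by total connections, then by total files sent, descending."""
    totals = {}
    for ip, connections, files_sent in flows:
        prev_connections, prev_files = totals.get(ip, (0, 0))
        totals[ip] = (prev_connections + connections, prev_files + files_sent)
    # Tuples compare element by element, so connections take priority.
    return sorted(totals, key=lambda ip: totals[ip], reverse=True)[:top_n]


# Hypothetical activity records from the flagged layer of the network.
flows = [
    ("10.0.0.5", 12, 1),
    ("10.0.0.9", 340, 75),
    ("10.0.0.7", 45, 3),
    ("10.0.0.9", 60, 20),
    ("10.0.0.5", 8, 0),
]

ranked = rank_ips(flows)
print(ranked)  # ['10.0.0.9', '10.0.0.7', '10.0.0.5']
```

An analyst would then start the investigation from the top of the list and work down.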

The future of baselines? Where do we see this technology going?

The basic algorithm is going to stay pretty much locked, because that’s the algorithm that enables fast computation of the things we want. What we are interested in is moving to the level of sub-networks. So rather than just inbound, outbound and local, we could further specialize into departments like Sales or HR.

In 2020 MixMode will shift towards giving the user a lot more information about their data, effectively going from anomaly detection to anomaly prevention.

What do other companies offer in terms of baseline creation and monitoring and why is MixMode’s platform better?

There are companies out there that claim aspects of baselining, but baselining in its nature must involve Unsupervised Learning. There are no two ways about it, and there are very few companies out there that are able to say they do Unsupervised Learning and Baseline your network.

There are none that do a comprehensive collection and analysis of data from all the IPs on your network, and this has been confirmed by Ritu Jyoti, an independent analyst at IDC.

Someone asked her very specifically whether any company out there does the same thing, and she said no.

We’re now taking multiple streams of data. How does that fit into the equation?

The more feeds of data we have, the more context we have. The AI operates on its own; it doesn’t take any feeds in except the metadata, which operates off of time stamps and rules.

The page that we are creating right now will correlate AI warnings with the various rules and feeds. This is helpful because if an intel feed pops up saying something about a large outbound file transfer, and at the same time the AI is saying that this is abnormal, then you have correlated information that something fishy is going on. The transfer might have looked completely normal on the intel feed, but the AI said, completely independently, that it is abnormal, so you should look.
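
One simple way to picture that correlation is to pair up AI warnings and intel feed alerts whose time stamps fall close together. The Python sketch below does this with an assumed five-minute window and made-up events; the function name and record format are hypothetical and are not taken from the MixMode platform.

```python
from datetime import datetime, timedelta


def correlate(ai_warnings, intel_alerts, window=timedelta(minutes=5)):
    """Pair each AI warning with every intel alert within `window` of it."""
    pairs = []
    for warning_time, warning_text in ai_warnings:
        for alert_time, alert_text in intel_alerts:
            if abs(warning_time - alert_time) <= window:
                pairs.append((warning_text, alert_text))
    return pairs


# Hypothetical events from two independent sources.
ai_warnings = [
    (datetime(2019, 12, 3, 15, 2), "outbound volume abnormal"),
    (datetime(2019, 12, 3, 9, 0), "local traffic abnormal"),
]
intel_alerts = [
    (datetime(2019, 12, 3, 15, 0), "large outbound file transfer"),
]

matches = correlate(ai_warnings, intel_alerts)
print(matches)
```

Only the 3 p.m. warning lines up with the intel alert, so that pair is the one worth a closer look.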

This series was written in partnership with MixMode CTO & Chief Scientist, Dr. Igor Mezic. He has spent his career developing highly complex algorithms and artificial intelligence for data analytics. He graduated with a doctorate from Caltech, holds 5 patents, and is a professor of mechanical engineering at the University of California, Santa Barbara. The MixMode AI, which has been used in projects at DARPA and the DoD, is the first commercial use of true third-wave AI.