Why Training Matters – And How Adversarial AI Takes Advantage of It

by Christian Wiens on June 18, 2020

The following is an excerpt from our recently published whitepaper, “Self-Supervised Learning – AI for Complex Network Security.” The author, Dr. Peter Stephenson, is a cybersecurity and digital forensics expert who has practiced in the security, forensics and digital investigation fields for over 55 years.

Section 4 – Why Training Matters – And How The Adversary Takes Advantage of It

There are over 1,200 peer-reviewed papers on the subject of adversarial AI going back to 2014. That means there are many attacks, usually tested both mathematically and in the lab, that can potentially succeed against an AI system. In this section we’ll describe a few and show how self-supervised training is less susceptible to them.

An attack that can succeed against ML systems trained with supervised learning, without the adversary knowing anything about the ML system, is called a black box attack. A black box attack probes the AI system to determine how it will respond to various types of attacks. Because the adversary treats the target as an “oracle” that simply answers queries, this type of attack is also called an oracle attack.

The black box adversary collects data points by querying the “oracle” and builds a duplicate model from the responses, which reflect the training set that is in actual use on the target. The purpose of the legitimate training set is to produce a model that classifies each data point it observes. In a security system the classifications may be simple: benign or malicious, with the malicious classification presumably representing potential threats.
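As a rough illustration of the querying step, the Python sketch below shows an adversary recovering a hidden decision boundary from nothing but the oracle's answers. Everything in it is an assumption made for illustration: the single numeric feature, the fixed threshold of 0.62, and the names make_oracle and fit_surrogate do not come from the whitepaper, and a real target would be far more complex.

```python
def make_oracle(hidden_threshold):
    """Hypothetical target: a static classifier the adversary cannot inspect."""
    def oracle(x):
        return "malicious" if x >= hidden_threshold else "benign"
    return oracle


def fit_surrogate(oracle, low, high, queries):
    """Estimate the oracle's decision boundary using only its answers.

    Assumes oracle(low) is "benign" and oracle(high) is "malicious".
    Each query halves the interval that must contain the boundary.
    """
    for _ in range(queries):
        mid = (low + high) / 2.0
        if oracle(mid) == "malicious":
            high = mid   # boundary is at or below mid
        else:
            low = mid    # boundary is above mid
    return (low + high) / 2.0


# The adversary never sees 0.62; it only sees benign/malicious answers.
oracle = make_oracle(hidden_threshold=0.62)
estimated_threshold = fit_surrogate(oracle, 0.0, 1.0, queries=20)
print(round(estimated_threshold, 4))
```

The search converges only because the hidden boundary stays put between queries, which is the property of a static training set that the attack depends on.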

The adversary then duplicates the legitimate training set and slightly alters one or more of the data points so that an event that appears to be benign, and is classified as such, actually is not. These slight perturbations hark back to our examples of early AV tools: the technique is similar to slightly altering the signature of a virus without materially altering the virus itself. The AV misses it but the host is infected anyway. Knowing how the ML will respond, the adversary formulates her attacks accordingly.
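The effect of a slight perturbation can be sketched in a few lines of Python. This is a toy under stated assumptions, not the whitepaper's method: the two-feature linear model, its weights and bias, the sample point, and the function names are all hypothetical, and the sketch assumes the adversary already holds a faithful duplicate of the static model.

```python
import math

# Hypothetical duplicated model: a linear score over two features.
# A score of zero or more is classified as malicious.
WEIGHTS = (0.8, 0.6)
BIAS = -1.0


def score(x):
    return WEIGHTS[0] * x[0] + WEIGHTS[1] * x[1] + BIAS


def classify(x):
    return "malicious" if score(x) >= 0 else "benign"


def perturb_until_benign(x, step=0.05, max_steps=100):
    """Nudge x against the weight direction until it is classified benign.

    Returns the last point tried if max_steps is exhausted first.
    """
    norm = math.hypot(*WEIGHTS)
    direction = (WEIGHTS[0] / norm, WEIGHTS[1] / norm)
    current = x
    for _ in range(max_steps):
        if classify(current) == "benign":
            return current
        current = (current[0] - step * direction[0],
                   current[1] - step * direction[1])
    return current


original = (0.9, 0.7)                      # classified malicious
evasive = perturb_until_benign(original)   # small shift, now classified benign
distance = math.dist(original, evasive)    # how far the sample had to move
print(classify(original), classify(evasive), round(distance, 2))
```

The sample moves only a short distance in feature space, yet its label flips, which mirrors the altered-signature case: the change is small relative to the behavior but large enough to cross a fixed boundary.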

In a supervised learning model this type of attack is feasible because the training set is static. However, that is not the case with a self-supervised training model. In a self-supervised training model there is no static training set. The ML system learns from its environment. Therefore a black box query of the oracle will be unsuccessful, because the system is constantly learning and offers no fixed model for the adversary to duplicate and steer into a misclassification.

Rules for the ML system to take its initial steps come in part from the algorithms that comprise its programming. We might think of these rules as, loosely, policies. Their purpose is to start the ML system on its learning path.

However, most of the Third Wave system’s initial training actually comes from its observation of its environment. In simple terms, the ML system learns from observation how the host is supposed to behave and sets that as a baseline for its training.
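One simple way to picture a baseline learned from observation is a running statistical profile of a single host metric. The Python sketch below is only an illustration under loud assumptions: the metric (connections per minute), the sample values, the three-standard-deviation limit, and the RunningBaseline class are hypothetical, and the whitepaper does not say that the Third Wave system works this way.

```python
import math


class RunningBaseline:
    """Online mean and variance of one metric (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the baseline."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        """Sample standard deviation, or 0.0 with fewer than two points."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomalous(self, x, z_limit=3.0):
        """Flag x if it lies more than z_limit deviations from the mean."""
        s = self.std()
        if self.n < 2 or s == 0.0:
            return False   # not enough history to judge
        return abs(x - self.mean) / s > z_limit


# Hypothetical observations of normal host behavior (connections per minute).
baseline = RunningBaseline()
for value in [10, 12, 11, 9, 10, 11, 12, 10, 9, 11]:
    baseline.update(value)

print(baseline.is_anomalous(40), baseline.is_anomalous(11))
```

Because update can be called on every new observation, the profile keeps shifting with the host's behavior, so there is no frozen training set for an adversary to copy.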