Challenges faced while training an AI to combat abuse

This post looks at the four main challenges that arise when training a classifier to combat fraud and abuse.

This is the second post in a series of four dedicated to providing a concise overview of how to harness AI to build robust anti-abuse protections. The first post explains why AI is key to building robust anti-abuse defenses that keep up with user expectations and increasingly sophisticated attackers. Following the natural progression of building and launching an AI-based defense system, the third post will examine classification issues, and the last post will look at how attackers go about attacking AI-based defenses.

This series of posts is modeled after the talk I gave at RSA 2018. Here is a re-recording of this talk:

You can also get the slides here.

Disclaimer: This series is meant to provide an overview for everyone interested in the subject of harnessing AI for anti-abuse defense, and it is a potential blueprint for those who are making the jump. Accordingly, this series focuses on providing a clear high-level summary, purposely avoiding delving into technical details. That being said, if you are an expert, I am sure you will find ideas and techniques that you haven’t heard about before, and hopefully you will be inspired to explore them further.

At a high level, what makes training a classifier to detect fraud and abuse unique is that it deals with adversarial data. AFAIK, no other AI field has to deal with adversaries that are actively trying to undermine your training.

The first two challenges look at the main consequences of having an adversary: 1) abuse-fighting is a non-stationary problem, and 2) it is hard to collect accurate training data. The third challenge relates to dealing with data and taxonomy ambiguity. Last but not least, the fourth challenge is how to apply AI to products that are not AI friendly because they don’t have rich content and features but still need to be protected.

Let’s get started!

Non-stationary problem

Traditionally, when applying AI to a given problem, you are able to reuse the same data over and over again because the problem definition is stable. This is not the case when combating abuse because attacks never stop evolving. As a result, to ensure that anti-abuse classifiers remain accurate, their training data needs to be constantly refreshed to incorporate the latest types of attacks.

Let me give you a concrete example to make clear the difference between a stable problem and an unstable (non-stationary) one.

cat example

Let’s say you would like to create a classifier that recognizes cats and other animals. This is considered to be a stable problem because animals are expected to look roughly the same for the next few hundred years (barring a nuclear war). Accordingly, to train this type of classifier, you only need to collect and annotate animal images once at the beginning of the project.

Phishing through ages

On the other hand, if you would like to train a classifier that recognizes phishing pages, this “collect once” approach doesn’t work because phishing pages keep evolving and look drastically different over time, as visible in the screenshot above.

More generally, while training classifiers to combat abuse, the first key challenge is that:

Past training examples become obsolete as attacks evolve

While there is no silver bullet to deal with this obsolescence, here are three complementary strategies that help cope with ever-changing data:

  1. Automate model retraining: You need to automate model retraining on fresh data so your model keeps up with the evolution of attacks. When you automate model retraining, it is good practice to have a validation set that ensures the new model performs correctly and doesn’t introduce regressions. It is also useful to add hyperparameter optimization to your retraining process to maximize your model accuracy.
  2. Build highly generalizable models: Your models have to be designed in a way that ensures they can generalize enough to detect new attacks. While ensuring that a model generalizes well is complex, making sure your model has enough (but not too much) capacity (i.e., enough neurons) and quite a lot of training data is a good starting point.

    Impact of data augmentation

    If you don’t have enough real attack examples, you can supplement your training data with data augmentation techniques that increase the size of your corpus by generating slight variations of your attack examples. As visible in the table above, taken from this paper, data augmentation makes models more robust and increases accuracy significantly.

    Learning rates

    Finally, you should consider other finer, well-documented technical aspects, such as tuning the learning rate and using dropout.

  3. Set up monitoring and in-depth defense: Finally, you have to assume your model will be bypassed at some point, so you need to build defense in depth to mitigate this issue. You also need to set up monitoring that will alert you when this occurs. Monitoring for a drop in the number of detected attacks or a spike in user reports is a good starting point.
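To make the first strategy more concrete, here is a minimal sketch of a validation gate for automated retraining: the freshly retrained model only replaces the one in production if it does not regress on a held-out validation set. The function names, the keyword-based toy models, and the tiny validation set are all hypothetical and only stand in for real trained classifiers and real data.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, int]      # (content, label), where 1 = abusive and 0 = benign
Model = Callable[[str], int]   # anything that maps content to a predicted label


def accuracy(model: Model, validation_set: Sequence[Example]) -> float:
    """Fraction of validation examples the model labels correctly."""
    if not validation_set:
        raise ValueError("validation set must not be empty")
    correct = sum(1 for content, label in validation_set if model(content) == label)
    return correct / len(validation_set)


def promote_if_no_regression(current: Model, candidate: Model,
                             validation_set: Sequence[Example],
                             tolerance: float = 0.0) -> Model:
    """Return the candidate only if it is at least as accurate as the current model."""
    current_acc = accuracy(current, validation_set)
    candidate_acc = accuracy(candidate, validation_set)
    if candidate_acc + tolerance >= current_acc:
        return candidate
    return current


# Toy stand-ins for trained models (hypothetical keyword rules, not real classifiers).
def old_model(content: str) -> int:
    return int("free money" in content)


def retrained_model(content: str) -> int:
    return int("free money" in content or "verify your account" in content)


validation = [
    ("claim your free money now", 1),
    ("please verify your account here", 1),
    ("lunch at noon?", 0),
    ("meeting notes attached", 0),
]

# The retrained model catches the newer attack, so it is promoted.
serving_model = promote_if_no_regression(old_model, retrained_model, validation)
```

In practice the validation set itself has to be refreshed regularly, otherwise the gate only checks performance on attacks that are already obsolete.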

Gmail malicious attacks

Quite often, I get asked how quickly attacks are evolving in practice. While I don’t have a general answer, here is a key statistic that I hope will convince you that attackers indeed mutate their attacks incredibly quickly: 97 percent of Gmail malicious attachments blocked today are different from the ones blocked yesterday.

Fortunately, those new malicious attachments are variations of recent attacks and therefore can be blocked by systems that generalize well and are trained regularly.
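If you want to track a similar day-over-day statistic for your own system, one simple way is to compare the sets of blocked items between two days. The sketch below is only an illustration under my own assumptions: it treats each blocked item as an exact fingerprint (for example, a file hash), and it is not a description of how the Gmail number above was computed.

```python
from typing import Iterable


def fraction_new(today: Iterable[str], yesterday: Iterable[str]) -> float:
    """Share of today's distinct blocked items that were not seen yesterday."""
    today_set = set(today)
    if not today_set:
        return 0.0
    return len(today_set - set(yesterday)) / len(today_set)


# Hypothetical fingerprints of blocked attachments on two consecutive days.
blocked_yesterday = ["a1", "b2", "c3", "d4"]
blocked_today = ["d4", "e5", "f6", "g7"]

churn = fraction_new(blocked_today, blocked_yesterday)  # 3 of the 4 items are new
```

Keep in mind that exact fingerprints count every small variation as a brand-new item, which is precisely why a classifier that generalizes can still block most of them.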

Lack of ground truth data

Dog and Cat

For most classification tasks, collecting training data is fairly easy because you can leverage human expertise. For example, if you want to build an animal classifier, you could ask people to take a picture of animals and tell you which animals are in it.

Play Store reviews

On the other hand, collecting ground truth (training data) for anti-abuse purposes is not that easy because bad actors try very hard to impersonate real users. As a result, it is very hard even for humans to tease apart what is real and what is fake. For instance, the screenshot above showcases two Play Store reviews. Would you be able to tell me which one is real and which one is fake?

Obviously telling them apart is impossible because they are both well written and over the top. This struggle to collect abusive content accurately exists all across the board whether it is for reviews, comments, fake accounts or network attacks. By the way, both reviews are real in case you were wondering.☺️

Accordingly, the second challenge on the quest to train a successful classifier is that:

Abusers try to hide their activities, which makes it hard to collect ground truth data

While no definitive answers exist on how to overcome this challenge, here are three techniques to collect ground truth data that can help alleviate the issue:

  1. Applying clustering methods: First, you can leverage clustering methods to expand upon known abusive content and find more of it. It is often hard to find the right balance while doing so: if you cluster too much, you end up flagging good content as bad, and if you don’t cluster enough, you won’t collect enough data.

  2. Collecting ground truth with honeypots: Because honeypots run in controlled settings, you can be confident that they only collect attacks. The main difficulty with honeypots is making sure that the collected data is representative of the attacks experienced by production systems. Overall, honeypots are very valuable, but it takes a significant investment to get them to collect meaningful attacks.

    Examples of Data augmentation using GAN

  3. Leveraging generative adversarial networks: A new and promising direction is to leverage recent advances in machine learning and use a Generative Adversarial Network (main paper), better known as a GAN, to reliably increase your training dataset. The screenshot above, taken from this paper, shows an example of face generation using a GAN: only the top left image is real. While this approach is still very experimental (here is one of the latest papers on the topic), it is exciting because it paves the way to generating meaningful attack variations at scale.
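The clustering trade-off from the first technique is easy to see on a toy example. The sketch below expands a set of known abusive messages by pulling in unlabeled messages that are similar enough to them. It uses Jaccard similarity over word shingles as a stand-in for a real clustering method, and the messages and thresholds are made up for illustration.

```python
from typing import List, Set


def shingles(text: str, size: int = 3) -> Set[str]:
    """Set of overlapping word n-grams used as a cheap text fingerprint."""
    words = text.lower().split()
    if len(words) < size:
        return {" ".join(words)}
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_from_seeds(seeds: List[str], unlabeled: List[str],
                      threshold: float) -> List[str]:
    """Return unlabeled items whose similarity to any known-bad seed meets the threshold."""
    seed_sets = [shingles(seed) for seed in seeds]
    return [item for item in unlabeled
            if any(jaccard(shingles(item), s) >= threshold for s in seed_sets)]


seeds = ["win a free phone today click this link now"]
unlabeled = [
    "win a free phone today click this link please",   # near duplicate of the seed
    "click this link now to read the meeting notes",   # benign, shares one phrase
    "see you at the gym tomorrow morning",              # unrelated
]

balanced = expand_from_seeds(seeds, unlabeled, threshold=0.5)    # near duplicate only
too_loose = expand_from_seeds(seeds, unlabeled, threshold=0.1)   # also flags the benign message
too_strict = expand_from_seeds(seeds, unlabeled, threshold=0.9)  # collects nothing
```

A loose threshold flags good content as bad, and a strict one collects no new data, which is the balance described above.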

Ambiguous data & taxonomy

The third challenge that arises when building a classifier is that what we consider bad is often ill defined, and there are a lot of borderline cases where even humans struggle to make a decision.

Context matters

For example, the sentence “I am going to kill you” can either be viewed as the sign of a healthy competition if you are playing a video game with your buddies or it can be a threat if it is used in a serious argument. More generally, it is important to realize that:

Unwanted content is inherently context, culture and settings dependent

Accordingly, it is impossible, except for very specific use cases such as profanity or gibberish detection, to build universal classifiers that will work across all products and for all users.

Spam foldering

When you think about it, even the well-established concept of spam is ill defined and means different things to different people. For example, countless Gmail users decide that the emails coming from a mailing list they willingly subscribed to a long time ago are now spam because they lost interest in the topic.

Here are three ways to help your classifier deal with ambiguity:

  1. Model context, culture and settings: Easier said than done! Add features that represent the context in which the classification is performed. This will ensure that the classifier is able to reach a different decision when the same data is used in different settings.

  2. Use personalized models: Your models need to be architected in a way that takes into account user interests and levels of tolerance. This can be done by adding some features (pioneer paper) that model user behavior.

  3. Offer users additional meaningful choices: You can reduce ambiguity by providing users with alternative choices that are more meaningful than a generic reporting mechanism. Those more precise choices reduce ambiguity by reducing the number of use cases that are lumped under a single ill-defined concept, such as spam.
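To illustrate the first point, here is a small sketch of what adding context features can look like. The same sentence is scored in two different settings, and the context features are what allow the score to differ. The feature names and the hand-set weights are hypothetical: a real classifier would learn its weights from data and would also need to model interactions between content and context.

```python
from typing import Dict


def build_features(text: str, context: Dict[str, object]) -> Dict[str, float]:
    """Combine content signals with signals describing where the content appears."""
    lowered = text.lower()
    return {
        "has_violent_phrase": float("kill you" in lowered),
        "in_game_chat": float(context.get("surface") == "game_chat"),
        "sender_is_friend": float(bool(context.get("sender_is_friend", False))),
    }


# Hypothetical hand-set weights standing in for what a trained model would learn.
WEIGHTS = {"has_violent_phrase": 2.0, "in_game_chat": -1.5, "sender_is_friend": -1.0}


def threat_score(features: Dict[str, float]) -> float:
    """Weighted sum of features: higher means more likely to be a real threat."""
    return sum(WEIGHTS[name] * value for name, value in features.items())


message = "I am going to kill you"
game_score = threat_score(build_features(
    message, {"surface": "game_chat", "sender_is_friend": True}))
argument_score = threat_score(build_features(
    message, {"surface": "direct_message", "sender_is_friend": False}))
```

With content features alone, both calls would produce the same score, and the classifier would have no way to reach a different decision in the two settings.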

Gmail blocking option

Here is a concrete example of how the addition of meaningful choices reduces ambiguity. Back in 2015, Gmail started offering its users the ability to easily unsubscribe from mailing lists and block senders, giving them more control over their inboxes. Under the hood, these new options help the classifiers because they reduce the ambiguity of what is marked as spam.

Lack of obvious features

Our fourth and last training challenge is that some products lack obvious features. Until now, we have focused on classifying rich content such as text, binaries and images, but not every product has such rich content.

Youtube views

For example, YouTube has to be defended against fake views, and there are not a lot of obvious features that can be leveraged to do so. Looking at the view count timeline for the famous Gangnam Style video, you will notice two anomalous peaks. These might come from spammers, or the video might simply have had huge spikes due to virality. It is impossible to tell by just looking at how the view count grew over time.

In general, AI thrives on feature-rich problems such as text or image classification; however, abuse fighters have to make AI work across the board to protect all users and products. This need to cover the entire attack surface led us to use AI to tackle use cases that are less than ideal, and sooner or later we have to face a hard truth:

Some products in need of protection don’t have the rich features AI thrives on

Fortunately, you can (partially) work around the lack of rich features. In a nutshell, the way to build an accurate classifier when you don’t have enough content features is to leverage auxiliary data as much as possible. Here are three key sources of auxiliary data you can potentially use:

  1. Context: Everything related to the client software or network can be used, including the user agent, the client IP address and the screen resolution.

  2. Temporal behavior: Instead of looking at an event in isolation, you can model the sequence of actions that is generated by each user. You can also look at the sequence of actions that target a specific artifact, such as a given video. Those temporal sequences provide a rich set of statistical features.

  3. Anomaly detection: It is impossible for an attacker to fully behave like a normal user, so anomaly features can almost always be used to boost detection accuracy.
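As an illustration of the temporal behavior point, the sketch below turns a sequence of event timestamps (for one user, or for one video) into a few statistical features. The feature set and the example timestamps are my own assumptions, chosen only to show the idea.

```python
from statistics import mean, pstdev
from typing import Dict, List


def temporal_features(timestamps: List[float]) -> Dict[str, float]:
    """Summarize event times (in seconds) as features describing their rhythm."""
    ordered = sorted(timestamps)
    # Time elapsed between each pair of consecutive events.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return {"event_count": float(len(ordered)),
                "mean_gap": 0.0, "gap_stdev": 0.0, "min_gap": 0.0}
    return {
        "event_count": float(len(ordered)),
        "mean_gap": mean(gaps),
        "gap_stdev": pstdev(gaps),
        "min_gap": min(gaps),
    }


# Hypothetical view timestamps: a scripted client firing every 2 seconds
# versus a person watching at irregular intervals.
scripted = temporal_features([0, 2, 4, 6, 8])
human = temporal_features([0, 40, 45, 300, 900])
```

The scripted sequence has a gap standard deviation of zero, a regularity that a single event looked at in isolation would never reveal.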

The last point is not as obvious as it seems, so let’s dive deeper into it.


At its core, what separates rudimentary attackers from advanced ones is their ability to accurately impersonate legitimate user behavior. However, because attackers aim at gaming the system, there always will be some behaviors that they can’t spoof.

It is those unspoofable behaviors that we aim to detect using one-class classification. Introduced circa 1996, the idea behind one-class classification is to use AI to find all the entities belonging to a single class (the normal behavior in our case) out of all the entities that exist in a dataset. Every entity that is not a member of that class is then considered an outlier.

One class classifier

For anti-abuse purposes, one-class classification allows you to detect anomalies and potential attacks even when you have no attack examples. For example, the figure above shows in red a set of malicious IPs attacking Google products that were detected using this type of approach.

Overall, one-class classification is a great complement to more traditional AI systems as its requirements are fundamentally different. As mentioned earlier, you can even take this one step further and feed the result of your one-class classifier to a standard one (binary class) to boost its accuracy.
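To give a feel for how this works, here is a deliberately simplified sketch of a one-class model. It is fitted on normal behavior only and flags anything that falls too far outside what it has seen. This is a toy per-feature z-score envelope that I use as a stand-in, not a production technique such as a one-class SVM, and the class name, features and threshold are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


class GaussianEnvelope:
    """Toy one-class model: learns the per-feature spread of normal behavior only."""

    def fit(self, normal_rows: Sequence[Sequence[float]]) -> "GaussianEnvelope":
        columns = list(zip(*normal_rows))
        self.means = [mean(col) for col in columns]
        # Guard against zero spread so the score below never divides by zero.
        self.stdevs = [pstdev(col) or 1.0 for col in columns]
        return self

    def score(self, row: Sequence[float]) -> float:
        """Largest per-feature z-score: how far the row sits from normal behavior."""
        return max(abs(x - m) / s
                   for x, m, s in zip(row, self.means, self.stdevs))

    def is_outlier(self, row: Sequence[float], threshold: float = 3.0) -> bool:
        return self.score(row) > threshold


# Hypothetical features per client IP: (requests per minute, distinct accounts touched).
# Note that the training data contains no attack examples at all.
normal_traffic = [(10, 1), (12, 1), (9, 2), (11, 1), (13, 2), (10, 1)]
model = GaussianEnvelope().fit(normal_traffic)

typical = model.is_outlier((11, 2))        # close to what normal clients do
suspicious = model.is_outlier((400, 60))   # far outside the learned envelope
```

The value returned by `score` is also the kind of anomaly feature you can feed into a standard binary classifier as an extra input.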

This wraps up our deep dive into the challenges faced while training an anti-abuse classifier. The next post covers the challenges that arise when you start running your classifier in production.

Thank you for reading this post till the end! If you enjoyed it, don’t forget to share it on your favorite social network so that your friends and colleagues can enjoy it too and learn about AI and anti-abuse.


A bientôt!

*** This is a Security Bloggers Network syndicated blog from Elie on Internet Security and Performance authored by Elie Bursztein. Read the original post at: