
Stopping Bots With Sliceline Machine Learning

This is the latest article in our series dedicated to sliceline. We previously explored what sliceline is and how to apply it to toy (fictional) datasets. In this article, we cover:

  • Sliceline application in a security context.
  • How to configure sliceline to identify relevant slices.

Sliceline Applied to Bot Detection

Sliceline is a machine learning (ML) model debugger. The ML model we are going to debug is a bot detection model, designed as a simple supervised learning model that leverages a few signals to determine whether a request comes from a human or a bot. 

The bot detection model’s prediction errors will be analyzed using sliceline. The ML model was trained on one of our customers’ real traffic data. Let’s see what the dataset contains…

Training Data

Our training dataset is composed of 20,000 HTTP requests, with half considered by our detection system to be bot requests and the other half considered human requests. In addition to the label, we have 6 columns (aka “features” in ML) for each request:

  • Country (country)
  • Cookie Length (cookie_length)
  • Headers (headers)
  • Host (host)
  • Method (method)
  • User Agent (user_agent)

You can explore our blog if you want to learn more about the different fields/features. 

Note that our model treats cookie_length as a numerical column. Also, to avoid leaking information about our customer, the host and user_agent values have been encrypted.
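We won't detail the exact anonymization scheme here, but one common way to produce such opaque, fixed-length identifiers is a keyed one-way hash. A minimal illustrative sketch using Python's standard hmac module (the secret key and input value below are hypothetical):

    import hashlib
    import hmac

    SECRET_KEY = b"customer-specific-secret"  # hypothetical key, never stored with the data

    def anonymize(value: str) -> str:
        """One-way hash a sensitive field so raw values never leave the customer."""
        return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

    print(anonymize("example.com"))  # 64-character hex digest, like the host above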

Data for a single hit considered human:

  • country: FR
  • cookie_length: 249
  • headers: 1277637147
  • host: f5740297184fb9d7c952c2c0172a8c8a71fdd1e0fffcab2b2c7a918c06a457be
  • method: GET
  • user_agent: db0ec61fd196ee3b8fc09a5459e6e7a3f7e6733139740504ce032ecf37595d1a

Modeling

To predict if a hit is coming from a bot or a human, we first designed a processing pipeline using scikit-learn:

Our processing pipeline, built around a scikit-learn ColumnTransformer, prepares the training features to be ingested by our model. Empty values are imputed using:

  • A KNNImputer for numeric features (with 5 neighbors).
  • A SimpleImputer for categorical features (with the "constant" strategy).

To complete our pipeline, we trained a CatBoostClassifier with standard parameters on the processed data.
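The full pipeline code is not reproduced here; below is a minimal sketch of such a pipeline, assuming the six feature columns listed earlier and that headers is treated as categorical (the column lists, fill value, and cat_features indices are our assumptions):

    from catboost import CatBoostClassifier
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import KNNImputer, SimpleImputer
    from sklearn.pipeline import Pipeline

    numeric_features = ["cookie_length"]
    categorical_features = ["country", "headers", "host", "method", "user_agent"]

    # Numeric columns come first in the ColumnTransformer output, so the
    # categorical columns land at indices 1..5 for CatBoost.
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", KNNImputer(n_neighbors=5), numeric_features),
            ("cat", SimpleImputer(strategy="constant", fill_value="missing"), categorical_features),
        ]
    )

    pipeline = Pipeline(
        steps=[
            ("preprocessing", preprocessor),
            ("classifier", CatBoostClassifier(cat_features=[1, 2, 3, 4, 5], verbose=0)),
        ]
    )
    # pipeline.fit(X_train, y_train)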

Our pipeline estimates, for each request, the probability that it is a bot request, allowing us to compute the element-wise log loss as our model training error. The lower the log loss, the better the model.
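A minimal sketch of that element-wise log loss, computed from the pipeline's predicted bot probabilities (the clipping epsilon is a standard numerical safeguard, not something from the post):

    import numpy as np

    def elementwise_log_loss(y_true, p_bot, eps=1e-15):
        """Per-request log loss: low values mean the model agrees with the label."""
        p = np.clip(p_bot, eps, 1 - eps)  # avoid log(0)
        return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    # p_bot = pipeline.predict_proba(X_train)[:, 1]  # probability of the "bot" class
    # errors = elementwise_log_loss(y_train, p_bot)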

Handling Numeric Columns

Per our previous article:

By feeding sliceline with the training dataset and the element-wise training errors, we are able to identify where the model is significantly underperforming.

Sliceline will identify subpopulations where the model is struggling. Subpopulations, also called “slices”, are defined by a filter of the form:


field_1=value_1 & field_2=value_2 & field_3=value_3 … & field_N=value_N (1)

Where N > 0.
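In pandas terms, a slice is simply a conjunction of equality filters. A tiny, hypothetical example:

    import pandas as pd

    df = pd.DataFrame({
        "country": ["FR", "US", "FR"],
        "method": ["GET", "POST", "GET"],
    })

    # The slice country=FR & method=GET as a boolean mask (here N = 2)
    slice_mask = (df["country"] == "FR") & (df["method"] == "GET")
    print(df[slice_mask])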

Numeric features are practical for the ML model. But for sliceline, they are problematic. Consider the field cookie_length, which represents the length of the cookie string. In our dataset, this field can take values from 0 to 7,532. So, the following filter:

cookie_length=42

only targets requests with a cookie length of exactly 42, while the next filter:

cookie_length=43

targets a completely different population, even though the two values are nearly identical. With thousands of distinct values, exact-match slices on raw numbers are too fine-grained to be meaningful.

To make these values manageable for sliceline, we need a special transformation: binning numerical features into buckets. We used the ContinuousOptimalBinning class from optbinning. It is not the only possible binner; we could have used scikit-learn's KBinsDiscretizer or even built our own custom binner. After transformation, the possible cookie_length values become:

[0, 9.50[
[9.50, 1926.5[
[1926.5, +∞[
This is much more practical and relevant for our slice definitions. Now that all the columns are categorical, we can apply sliceline.
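As an illustration, here is a minimal sketch of such a binning step with ContinuousOptimalBinning, assuming we bin cookie_length against the element-wise training errors (the post does not state which target was used) and that transform(..., metric="bins") returns the bucket labels:

    from optbinning import ContinuousOptimalBinning

    # df and errors come from the earlier steps.
    x = df["cookie_length"].values
    binner = ContinuousOptimalBinning(name="cookie_length", dtype="numerical")
    binner.fit(x, errors)

    # Replace the numeric column with categorical bucket labels such as "[0, 9.50["
    df["cookie_length"] = binner.transform(x, metric="bins")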

Error Analysis

In detection, we can focus on different types of error:

  • All Errors (All predictions of our model that are wrong.)
  • False Positives (Humans detected as bots.)
  • False Negatives (Bots that we missed.)

Keep in mind that the label we are using is the decision of our current detection system. This label can be wrong without us knowing it.
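Concretely, each error type is just a subset of the requests. A sketch of the corresponding masks, assuming y_true holds the current system's labels (1 = bot) and y_pred the ML model's hard predictions:

    all_errors = y_true != y_pred
    false_positives = (y_true == 0) & (y_pred == 1)  # humans flagged as bots
    false_negatives = (y_true == 1) & (y_pred == 0)  # bots that slipped through

    # Sliceline can then be run separately on each subset's rows and errors.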


Results

For each kind of error, sliceline can identify the subpopulations where the model is performing significantly worse.

All Errors Mixed

Slice definition:

  • host: 43836151e20255b16f2368d290d305ff6c6d9436d9b352738c2ccf9f27494d43
    AND
  • user_agent: 42a98ff8e0c0edc2002cdc7d0ce11f8c7378713d9b25f22ce2d9461969f130db
    AND
  • country: FR

Model log loss on:

  • The full dataset (20,000 requests): 0.039
  • The selected slice (815 requests): 0.68

The slice mixes potential false positives and potential false negatives, grouping together 815 requests. This analysis is useful for improving the overall model performance.
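As a sanity check, the slice's average error can be recomputed directly from the slice mask and the element-wise losses. A minimal sketch, assuming errors is a pandas Series aligned with df:

    slice_mask = (
        (df["host"] == "43836151e20255b16f2368d290d305ff6c6d9436d9b352738c2ccf9f27494d43")
        & (df["user_agent"] == "42a98ff8e0c0edc2002cdc7d0ce11f8c7378713d9b25f22ce2d9461969f130db")
        & (df["country"] == "FR")
    )
    print(f"Full dataset log loss: {errors.mean():.3f}")      # 0.039
    print(f"Slice log loss: {errors[slice_mask].mean():.2f}")  # 0.68 over 815 requests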

False Positives

Slice definition:

  • country: FR
    AND
  • headers: 598941606

Model log loss on:

  • The full dataset (20,000 requests): 0.039
  • The selected slice (76 requests): 0.73

This slice identifies 76 potential false positive requests, i.e. humans that have been blocked.

False Negatives

Best slice definition:

  • country: FR
    AND
  • host: 43836151e20255b16f2368d290d305ff6c6d9436d9b352738c2ccf9f27494d43
    AND
  • headers: -707806984

Model log loss on:

  • The full dataset (20,000 requests): 0.039
  • The selected slice (826 requests): 0.68

This slice identifies 826 potential false negative requests, i.e. bots that have not been blocked.

Utilizing Results

Sliceline gives you filters identifying potential false positives and false negatives. You can leverage those results in different ways:

  • You can analyze the identified subpopulations to understand the type of error, and determine which is right: the ML model or the current detection system.
  • After analysis, you can also build new ML features based on the fields of sliceline's rules, in order to improve the model performance.
  • If you are using rules to stop bots or to allowlist humans, and if you are confident in the rule sliceline found, you can use it in your current detection system.

Sliceline Configuration

Sliceline is highly configurable. If you read the code of the previous examples we shared, you may have noticed these parameters:

  • alpha
  • k
  • max_l
  • min_sup
  • verbose

Each parameter is useful in its own way. Below are broader definitions, complementary to the documentation.

alpha
If you are more interested in smaller subpopulations with a higher average error, tune the alpha parameter: the closer alpha is to 1, the smaller the subpopulations can be, and thus the higher the average error on those slices. Reduce alpha's value to get larger slices.
Note: There is a tradeoff between slice size and slice average error, and alpha controls this tradeoff. Its value depends on what you are looking for.

k
The k parameter is pretty simple: how many slices do you want sliceline to output? Your answer to this question is k's value. It is possible that, during the computation, two or more slices get the same score. In that edge case, all slices with equal scores are output.

max_l
The max_l parameter specifies the maximum number of fields used to define your slice. So, using the notation of (1), we have:

0 < N ≤ max_l

min_sup
The min_sup parameter stands for minimum support threshold. Inspired by frequent itemset mining, it ensures statistical significance by ignoring feature modalities that are under-represented in your dataset.

verbose
This boolean parameter lets you choose whether sliceline logs its computations.

In our example, we used the following configuration:

  • alpha = 0.95: We are willing to accept quite a small slice.
  • k = 1: We are only interested in the single top slice, i.e. the slice on which our model is performing the worst.
  • max_l = 5: Our slices will be defined by a maximum of 5 fields.
  • min_sup = 10: Feature modalities with fewer than 10 records will be ignored.
  • verbose = True: We want to see the logs of the computation.
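Putting it together, a minimal sketch of this configuration with sliceline's Slicefinder, assuming the binned feature table and the element-wise errors from earlier:

    from sliceline.slicefinder import Slicefinder

    slice_finder = Slicefinder(
        alpha=0.95,    # accept quite small slices with high average error
        k=1,           # only the single worst slice
        max_l=5,       # slice definitions use at most 5 fields
        min_sup=10,    # ignore modalities with fewer than 10 records
        verbose=True,  # log the computation
    )

    # X_binned: training features with cookie_length binned (all categorical)
    slice_finder.fit(X_binned, errors)
    print(slice_finder.top_slices_)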

Conclusion

We are coming to the end of this sliceline blog post series. After presenting sliceline and applying it to toy datasets, we applied it to real traffic data in a security context: bot detection.

We built a machine learning model on data labeled by an already existing detection system. By analyzing both types of errors, sliceline showcased how powerful and configurable it is. Playing with its parameters will help you discover even more slices of interest for your problem, in order to reduce either false positives or false negatives.

To explore the potential of our new ML model debugger, give it a try and give us your feedback:

This is a Security Bloggers Network syndicated blog from Blog – DataDome authored by Antoine de Daran, Cybersecurity Data Scientist. Read the original post at: https://datadome.co/threat-research/stopping-bots-with-sliceline-machine-learning/