How to Detect Attacks Using Coarse-Grained Features

At DataDome, we explore traffic data from different perspectives to gain a better understanding of how bad bots are hiding in plain sight. Coarse-grained features—that is, features that are broader in scope than usual—help us capture every customer’s context and detect distributed attacks that would go unnoticed if we only analyzed fine-grained features, like session or IP traffic. The attacks detected by coarse-grained features can be used by downstream systems and analysts to dig into the attack traffic and block it. Attack remediation techniques will be detailed in subsequent posts.

Traditional Detection Techniques

Bot detection relies heavily on the analysis of enriched HTTP requests, although other signals such as browser fingerprints and user behavior can be used to detect bots. You can look at enriched requests one by one, and also analyze the behavior of attackers by looking at wider temporal ranges. We usually analyze the behavior at the IP or session level and compute aggregations over time windows, such as the numbers of:

  • Requests.
  • Distinct user agents.
  • Distinct accept-language headers.
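
As a rough illustration of these per-IP aggregations, here is a minimal Python sketch. The request fields (`ts`, `ip`, `user_agent`, `accept_language`), the 60-second window, and the function name are assumptions made for the example, not a description of DataDome's actual pipeline.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical window size, chosen only for illustration


def aggregate_by_ip(requests, window_seconds=WINDOW_SECONDS):
    """Compute simple aggregations per (window start, IP).

    Each request is a dict with the assumed keys
    'ts' (epoch seconds), 'ip', 'user_agent' and 'accept_language'.
    """
    buckets = defaultdict(
        lambda: {"requests": 0, "user_agents": set(), "accept_languages": set()}
    )
    for req in requests:
        # Align the timestamp to the start of its fixed-size window.
        window_start = int(req["ts"] // window_seconds) * window_seconds
        bucket = buckets[(window_start, req["ip"])]
        bucket["requests"] += 1
        bucket["user_agents"].add(req["user_agent"])
        bucket["accept_languages"].add(req["accept_language"])

    # Replace the sets by their sizes to get the distinct counts.
    return {
        key: {
            "requests": b["requests"],
            "distinct_user_agents": len(b["user_agents"]),
            "distinct_accept_languages": len(b["accept_languages"]),
        }
        for key, b in buckets.items()
    }


# Made-up sample traffic using documentation-reserved IP addresses.
sample_requests = [
    {"ts": 0, "ip": "203.0.113.7", "user_agent": "UA-1", "accept_language": "en-US"},
    {"ts": 10, "ip": "203.0.113.7", "user_agent": "UA-2", "accept_language": "en-US"},
    {"ts": 20, "ip": "198.51.100.4", "user_agent": "UA-1", "accept_language": "fr-FR"},
    {"ts": 70, "ip": "203.0.113.7", "user_agent": "UA-1", "accept_language": "en-US"},
]
ip_features = aggregate_by_ip(sample_requests)
```

The same grouping works at the session level by swapping the `ip` key for a session identifier.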

When running analysis at the single-request level, we look for inconsistencies between the different attributes. This can detect simple bots, but more sophisticated bots can forge their HTTP headers perfectly by using databases of consistent request attributes. To detect those bots, we can resort to behavioral detection at the IP or session level. This tells us whether an individual IP or session is suspicious, but it cannot uncover bots that spread their traffic across multiple sessions and/or IPs, each of which appears to be a user making only a few requests.

Zooming Out

In order to get the global picture, we must zoom out—both in terms of time and the traffic features we analyze. Because every customer’s traffic is unique, we analyze the behavior at the customer level. This allows us to view coarse-grained features that go way beyond a particular IP or session and capture more context.

We compute several aggregations over small successive windows. The sequence of those aggregations forms a time-series, where every point is the result of the aggregation over a particular window.

Then, we employ time-series analysis to detect attacks and extract the characteristics that distinguish the attackers.
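
The post does not say which time-series technique is used. As one possible sketch, the Python snippet below flags points that rise far above the median of the preceding windows, scaled by the median absolute deviation (MAD). The function name, the history length, the threshold, and the sample values are all assumptions chosen for illustration.

```python
from statistics import median


def flag_anomalies(series, history=10, threshold=5.0):
    """Flag points far above the median of the preceding `history` points.

    This is a simple robust detector (median + MAD), used here only as a
    stand-in for whatever time-series analysis is run in production.
    """
    flags = []
    for i, value in enumerate(series):
        past = series[max(0, i - history):i]
        if len(past) < 3:
            # Not enough history to estimate a baseline.
            flags.append(False)
            continue
        center = median(past)
        mad = median(abs(x - center) for x in past)
        scale = max(mad, 1.0)  # floor avoids dividing by ~0 on flat series
        flags.append((value - center) / scale > threshold)
    return flags


# Made-up series: distinct countries per window, with a two-window spike.
distinct_countries = [4, 5, 4, 6, 5, 4, 5, 31, 29, 5, 4]
flags = flag_anomalies(distinct_countries)
anomalous_windows = [i for i, flagged in enumerate(flags) if flagged]
```

Using the median rather than the mean keeps the baseline from being dragged upward by the attack itself, so the windows right after the spike are not flagged.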

Time Series Analysis

The most common aggregation we can compute is the number of requests. When computed per session or per IP, it can be used for rate-limiting-based detection.

However, behavior is not only characterized by the number of requests—so we compute customer-level aggregations, such as the numbers of:

  • Distinct countries.
  • Distinct sessions.
  • Distinct autonomous systems.
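
A minimal Python sketch of these customer-level aggregations follows. It builds one time series per customer, with one point per window. The field names (`customer`, `session`, `country`, `asn`), the window size, and the sample data are hypothetical.

```python
from collections import defaultdict


def customer_time_series(requests, window_seconds=60):
    """Return, per customer, a time-ordered list of coarse-grained features.

    Each request is a dict with the assumed keys
    'ts' (epoch seconds), 'customer', 'session', 'country' and 'asn'.
    """
    windows = defaultdict(
        lambda: {"requests": 0, "sessions": set(), "countries": set(), "asns": set()}
    )
    for req in requests:
        start = int(req["ts"] // window_seconds) * window_seconds
        w = windows[(req["customer"], start)]
        w["requests"] += 1
        w["sessions"].add(req["session"])
        w["countries"].add(req["country"])
        w["asns"].add(req["asn"])

    series = defaultdict(list)
    # Sorting the (customer, window start) keys yields time-ordered points.
    for customer, start in sorted(windows):
        w = windows[(customer, start)]
        series[customer].append({
            "window_start": start,
            "requests": w["requests"],
            "distinct_sessions": len(w["sessions"]),
            "distinct_countries": len(w["countries"]),
            "distinct_asns": len(w["asns"]),
        })
    return dict(series)


# Made-up traffic for a single hypothetical customer.
sample = [
    {"ts": 5, "customer": "shop-a", "session": "s1", "country": "FR", "asn": 64500},
    {"ts": 12, "customer": "shop-a", "session": "s1", "country": "FR", "asn": 64500},
    {"ts": 30, "customer": "shop-a", "session": "s2", "country": "DE", "asn": 64501},
    {"ts": 65, "customer": "shop-a", "session": "s3", "country": "FR", "asn": 64500},
]
series = customer_time_series(sample)
```

Note that windows with no traffic produce no point in this sketch. A real pipeline would likely fill them with zeros so that the series stays evenly spaced.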

Examples

Let’s have a look at some examples that demonstrate the effectiveness of coarse-grained features.

Number of requests in a given time window:

The first example is a highly distributed attack. We can see a peak in the traffic, but no significant change in the number of requests per IP. In such cases, rate limiting is not very useful, as the behavior of each IP seen in isolation is normal. In the second graph, we can see only two IPs present before the attack, and then several more IPs making a very small number of requests each when the attack starts.
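
To make the point about rate limiting concrete, here is a small Python sketch with made-up numbers: two regular IPs before the attack, then 200 extra IPs sending three requests each. The per-IP threshold and all counts are assumptions for illustration only.

```python
from collections import Counter

RATE_LIMIT_PER_IP = 100  # hypothetical per-window, per-IP threshold


def summarize_window(ips):
    """Summarize one window, given one IP string per request seen in it."""
    per_ip = Counter(ips)
    return {
        "total_requests": len(ips),
        "distinct_ips": len(per_ip),
        "max_per_ip": max(per_ip.values(), default=0),
        # IPs that a simple per-IP rate limit would catch.
        "rate_limited_ips": sorted(
            ip for ip, n in per_ip.items() if n > RATE_LIMIT_PER_IP
        ),
    }


# Window before the attack: two IPs with moderate traffic.
before = ["203.0.113.1"] * 40 + ["203.0.113.2"] * 35
# Window during the attack: the same two IPs plus 200 IPs with 3 requests each.
during = before + [f"198.51.100.{i}" for i in range(1, 201) for _ in range(3)]

summary_before = summarize_window(before)
summary_during = summarize_window(during)
```

In this toy data the total request count jumps from 75 to 675 while the busiest IP stays at 40 requests, so the per-IP rate limit never triggers. The customer-level totals are what reveal the attack.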

Number of requests in a given time window

Distinct user agents:

Another example is one where there is no clear peak in the traffic, but the number of distinct user agents allows us to clearly identify the attack. As you can see below, there is a variation in request count that could be attributed to normal fluctuations. But looking at the number of distinct user agents (in red below) during the same time frame makes it clear that we have an attack.

Distinct user agents

Distinct countries:

Finally, we have an example where the crucial feature is the number of unique countries. We can see a small peak in the request count—but since the traffic is fluctuating, this is not always sufficient to identify the attack with confidence. We can see, however, that the number of unique countries has a clear peak that allows us to identify the attack and delimit the start and end time.
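
Delimiting the start and end of an attack from such a peak can be sketched as follows in Python. The fixed threshold, the window size, and the sample values are assumptions. In practice the threshold would more likely come from a baseline learned on each customer's traffic.

```python
def delimit_attacks(window_starts, values, threshold, window_seconds=60):
    """Return (start, end) pairs covering runs of windows above `threshold`.

    `window_starts` holds the start time of each window (epoch seconds) and
    `values` the feature value for that window, for example the number of
    distinct countries. `end` is the start of the first window back at or
    below the threshold.
    """
    intervals = []
    current_start = None
    for start, value in zip(window_starts, values):
        if value > threshold and current_start is None:
            current_start = start  # the run begins here
        elif value <= threshold and current_start is not None:
            intervals.append((current_start, start))  # the run ended
            current_start = None
    if current_start is not None:
        # The series ends while still above the threshold.
        intervals.append((current_start, window_starts[-1] + window_seconds))
    return intervals


# Made-up series of distinct countries per 60-second window.
window_starts = [0, 60, 120, 180, 240, 300, 360, 420, 480]
distinct_countries = [5, 4, 6, 30, 32, 28, 5, 4, 5]
attack_intervals = delimit_attacks(window_starts, distinct_countries, threshold=15)
```

With this sample data the single detected interval runs from second 180 to second 360, which matches the three windows where the country count is elevated.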

Distinct countries

Conclusion

While a large part of bot detection involves looking at fine-grained features, such as the attributes of each request or the behavior of each IP address or session, we can detect more sophisticated attacks using coarse-grained features, such as customer-level counts of requests, distinct user agents, or distinct countries over time.

The first step in stopping bad bots is to find them, even if they’re hiding in “normal” traffic—and coarse-grained features help. DataDome’s powerful bot detection engine is constantly being improved as we analyze new data from incoming attacks on our customers’ websites.

If you are responsible for your organization’s online security or fraud prevention, you should see if you can spot any bots in your traffic patterns. To get a high-level view of your traffic and benchmark what is (and is not) “normal” user activity on your platform, check out DataDome’s bot and online fraud detection dashboard free for 30 days.

*** This is a Security Bloggers Network syndicated blog from DataDome authored by Konstantina Kontoudi, PhD, Lead Data Scientist. Read the original post at: https://datadome.co/threat-research/how-to-detect-attacks-using-coarse-grained-features/