
How TLS Fingerprinting Reinforces DataDome’s Protection

by Omar Tafsi, R&D Engineer on October 4, 2022

The post How TLS Fingerprinting Reinforces DataDome’s Protection appeared first on Blog – Datadome.

In the world of bot detection, we have to stay one step ahead of attackers.
We are constantly on the lookout for new fingerprinting methods to detect bots more accurately and effectively.

When trying to establish an HTTPS connection with a server, mobile apps and browsers have to share certain metadata about what they can and can’t support in terms of security algorithms in what is known as the “TLS handshake.” The goal of the TLS handshake is to eventually agree on specific algorithms both parties support, which depend on a variety of characteristics, such as the operating system (OS), browser, etc.

Lee Brotherston’s 2015 research showed that the TLS fingerprint is quite stable and unique for a given device type, browser, and OS, and that it can help detect both:

Users that lie about their identity and give themselves away through inconsistencies (often bots).
Specific known bots that have unique TLS metadata.

Let’s explore how DataDome makes use of TLS fingerprinting to detect the next major attack on our customers’ platforms.

What is TLS?

Transport Layer Security (TLS), the successor of the now-deprecated Secure Sockets Layer (SSL), is the cryptographic protocol designed to provide communication security over a computer network. Its best-known use is securing HTTPS connections at the application layer (layer 7).

To ensure a secure connection, TLS uses an arsenal of cryptographic algorithms that fall into the following categories:

Key Exchange Algorithms: e.g. ECDHE
Authentication/Digital Signature Algorithms: e.g. ECDSA
Bulk Encryption Algorithms: e.g. AES
Message Authentication Code (MAC) and Hash Algorithms: e.g. SHA-256 (older cipher suites used MD5)
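
To make these four categories concrete, here is a minimal Python sketch that splits a TLS 1.2 cipher suite name into its parts. The helper name `split_cipher_suite` is hypothetical, and the parsing assumes the IANA naming convention `TLS_<key exchange>_<authentication>_WITH_<cipher>_<hash>`. TLS 1.3 suite names no longer carry the key exchange and authentication parts, so they are rejected rather than guessed at.

```python
from typing import NamedTuple


class CipherSuiteParts(NamedTuple):
    key_exchange: str
    authentication: str
    bulk_encryption: str
    mac_or_hash: str


def split_cipher_suite(name: str) -> CipherSuiteParts:
    """Split a TLS 1.2 IANA-style cipher suite name into its four roles."""
    if not name.startswith("TLS_") or "_WITH_" not in name:
        raise ValueError(f"not a TLS 1.2 style suite name: {name}")
    handshake, record = name[len("TLS_"):].split("_WITH_", 1)
    # "ECDHE_ECDSA" -> key exchange + authentication.
    # A lone "RSA" plays both roles.
    kx, _, auth = handshake.partition("_")
    auth = auth or kx
    # The last token is the hash; everything before it names the bulk cipher.
    bulk, _, digest = record.rpartition("_")
    return CipherSuiteParts(kx, auth, bulk, digest)


print(split_cipher_suite("TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256"))
```

For the suite above, the four fields map directly onto the categories listed: ECDHE for key exchange, ECDSA for authentication, AES_128_GCM for bulk encryption, and SHA256 for the hash.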

What is a TLS fingerprint?

In the field of security, a fingerprint is a unique set (per user, per device, or per device class) of attributes that must be stable enough for re-identification purposes.

The algorithms used by TLS have to be negotiated as part of the TLS handshake before communication. The client opens the handshake with a Client Hello message listing its supported cipher suites (each a combination of a key exchange algorithm, an authentication/digital signature algorithm, a bulk encryption algorithm, and a message authentication code algorithm) and extensions (additional parameters and capabilities, such as supported elliptic curves). The server then picks a cipher suite both parties support, and it is used to secure the connection.

The values of the supported cipher suites, extensions, and other fields in the handshake are received by the server and then hashed, separately or in concatenated form, to provide an easy-to-use hash value. DataDome’s detection models can use this hash value, which we call a TLS fingerprint, to block attacks. The best-known TLS fingerprint format is JA3 (see example below from Salesforce engineering).

Figure: JA3 TLS fingerprinting (example from Salesforce engineering)
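
As an illustration of the hashing step, the following Python sketch builds a JA3-style fingerprint. It follows the publicly documented JA3 layout (TLS version, cipher suites, extensions, elliptic curves, and point formats, written as decimal values and hashed with MD5, with GREASE placeholder values skipped). The function names and the sample Client Hello values are made up for the example, and this is not a description of DataDome’s internal implementation.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """GREASE placeholders (RFC 8701) look like 0x0A0A, 0x1A1A, ..., 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def join_ids(values: Iterable[int]) -> str:
    # JA3 writes each list as decimal values joined by "-", skipping GREASE.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Concatenate the five Client Hello fields in JA3 order."""
    fields = [str(version), join_ids(ciphers), join_ids(extensions),
              join_ids(curves), join_ids(point_formats)]
    return ",".join(fields)


def ja3_hash(*client_hello_fields) -> str:
    """MD5 hex digest of the JA3 string (32 hex characters)."""
    text = ja3_string(*client_hello_fields)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Made-up Client Hello values, for illustration only.
hello = (771, [0x1A1A, 4865, 4866, 49195], [0, 23, 65281, 10, 11],
         [29, 23, 24], [0])
print(ja3_string(*hello))  # 771,4865-4866-49195,0-23-65281-10-11,29-23-24,0
print(ja3_hash(*hello))
```

Note that the order of the values is preserved, so two clients offering the same cipher suites in a different order produce different fingerprints.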

How does TLS fingerprinting work?

TLS fingerprinting relies on the following assumption: Different HTTPS clients (browsers, computer programs, etc.) support different algorithms. This means that, from the list of supported ciphers and extensions that a client shares with the server during the TLS handshake, we can identify which device class the client belongs to. In other words:

TLS Client Hello (Chrome on iOS) ≠ TLS Client Hello (Chrome on Android)
TLS Client Hello (Chrome on iOS) ≠ TLS Client Hello (Firefox on iOS)
TLS Client Hello (Googlebot) ≠ TLS Client Hello (Bingbot)

In addition, the TLS Client Hello of a given client (e.g. regular Chrome 94) stays stable as long as the user has not changed anything in their browser or OS.

How We Leverage Fingerprinting: TLS Fingerprinting Models

Our machine learning (ML) models use features extracted from the TLS fingerprint signals we collect to detect malicious traffic associated with them and generate blocking patterns.

For each of the analyzed TLS fingerprint signals, we collect the following features:

Percentage of bots associated with it.
Quality of IPs that sent it.
Operating system (OS) associated with it.
Name of the browser and its version in traffic matching it.
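
The features listed above can be pictured as a per-fingerprint aggregation over request records. The sketch below is a simplified assumption of what such an aggregation might look like, not DataDome’s actual pipeline: the `Request` fields, the `is_bot` label, and the 0-to-1 `ip_reputation` score are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    tls_fingerprint: str
    is_bot: bool          # hypothetical label coming from other signals
    ip_reputation: float  # hypothetical score: 0.0 (bad) to 1.0 (good)
    os: str
    browser: str          # name and version, e.g. "Chrome 102"


def fingerprint_features(requests):
    """Aggregate per-fingerprint features from labeled request records."""
    groups = defaultdict(list)
    for r in requests:
        groups[r.tls_fingerprint].append(r)
    features = {}
    for fp, rs in groups.items():
        features[fp] = {
            "bot_pct": 100.0 * sum(r.is_bot for r in rs) / len(rs),
            "mean_ip_reputation": mean(r.ip_reputation for r in rs),
            "top_os": Counter(r.os for r in rs).most_common(1)[0][0],
            "top_browser": Counter(r.browser for r in rs).most_common(1)[0][0],
        }
    return features


# Made-up sample traffic, for illustration only.
sample = [
    Request("fp_a", False, 0.9, "Android", "Chrome 102"),
    Request("fp_a", False, 0.8, "Android", "Chrome 102"),
    Request("fp_b", True, 0.2, "Windows", "Chrome 102"),
    Request("fp_b", True, 0.1, "Linux", "Chrome 102"),
    Request("fp_b", False, 0.3, "Windows", "Chrome 102"),
]
for fp, feats in fingerprint_features(sample).items():
    print(fp, feats)
```

A feature vector like this, computed for each fingerprint, is the kind of input a model could use to separate fingerprints dominated by bot traffic from those seen mostly on legitimate clients.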

When evaluating the performance of our models, we have noticed they identify two different patterns of malicious traffic using TLS fingerprints:

Malicious traffic that can be directly identified by its unique TLS fingerprint. Some bots are programmed in a way that causes them to have a TLS fingerprint that differs from any other fingerprint out there, making them easily identifiable.
Malicious traffic that has an inconsistent combination of TLS fingerprint and device class (OS/browser name/browser version). For example, an attacker may claim to be on Chrome 102 but lack the TLS fingerprint this browser typically has.
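
The two patterns above can be sketched as a simple rule-based check. This is only an illustration: the models described in this post are learned rather than hand-written, and the fingerprint names, lookup tables, and the `classify` function below are hypothetical.

```python
# Hypothetical reference data: fingerprint -> device classes known to produce it.
KNOWN_CLIENTS = {
    "fp_chrome102_windows": {("Windows", "Chrome", 102)},
    "fp_safari15_ios": {("iOS", "Safari", 15)},
}
# Hypothetical fingerprints observed only in confirmed bot traffic.
KNOWN_BOT_FINGERPRINTS = {"fp_custom_http_lib"}


def classify(fingerprint, claimed_device):
    """Return a coarse verdict for one request.

    claimed_device is the (os, browser, major_version) the client announces,
    for example through its User-Agent header.
    """
    if fingerprint in KNOWN_BOT_FINGERPRINTS:
        return "known_bot_fingerprint"  # pattern 1: the fingerprint is the signal
    expected = KNOWN_CLIENTS.get(fingerprint)
    if expected is None:
        return "unknown_fingerprint"    # not enough data to decide here
    if claimed_device not in expected:
        return "inconsistent"           # pattern 2: claim and fingerprint disagree
    return "consistent"


print(classify("fp_custom_http_lib", ("Windows", "Chrome", 102)))
print(classify("fp_safari15_ios", ("Windows", "Chrome", 102)))
print(classify("fp_chrome102_windows", ("Windows", "Chrome", 102)))
```

The second call mirrors the Chrome 102 example: the client claims one device class while presenting a fingerprint that belongs to another.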

Example of an Attack Blocked by TLS Fingerprinting

This graph depicts an attack that occurred on September 2, 2022.

Figure: Line graph of requests per 15 minutes

The attackers started with a relatively small number of requests and became more aggressive the following day after being blocked. In total, 2.1M requests were sent, peaking at over 160K requests per hour. The attack targeted a specific path on one of our e-commerce customers’ sites.

This attack was well distributed. In total, more than 100K distinct IP addresses from more than 80 countries were used:

Figure: Map of the countries the attacker IP addresses belonged to

Below is a pie chart of the type of autonomous system (AS) the attacker IPs belonged to. More than 85% of the IPs used by attackers belonged to residential ASes, followed by mixed ASes, and then data center ASes that represent only a small fraction:

Figure: Pie chart of attacker IPs by autonomous system type

To get a clearer idea of how distributed the attack was, we plotted the number of unique IP addresses used per 15-minute window. Attackers spread their traffic across many IPs to bypass classic bot detection techniques such as geoblocking and IP block-listing:

Figure: Number of unique IP addresses per 15 minutes
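
For readers who want to reproduce this kind of view on their own logs, here is a minimal Python sketch that counts unique IP addresses per 15-minute window. The function names and the `(timestamp, ip)` event format are assumptions, and the sample events are made up (they use documentation-only IP ranges), not data from the attack.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def window_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 15-minute window."""
    return ts - timedelta(minutes=ts.minute % 15, seconds=ts.second,
                          microseconds=ts.microsecond)


def unique_ips_per_window(events):
    """events: iterable of (timestamp, ip) -> {window start: unique IP count}."""
    ips = defaultdict(set)
    for ts, ip in events:
        ips[window_start(ts)].add(ip)
    return {start: len(seen) for start, seen in sorted(ips.items())}


# Made-up sample events, for illustration only.
events = [
    (datetime(2000, 1, 1, 10, 3), "192.0.2.1"),
    (datetime(2000, 1, 1, 10, 7, 30), "192.0.2.2"),
    (datetime(2000, 1, 1, 10, 14, 59), "192.0.2.1"),
    (datetime(2000, 1, 1, 10, 15), "198.51.100.7"),
    (datetime(2000, 1, 1, 10, 29), "198.51.100.7"),
]
for start, count in unique_ips_per_window(events).items():
    print(start.isoformat(), count)
```

A high unique-IP count relative to the request count in each window is what makes per-IP rate limiting and block-listing ineffective against this kind of attack.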

Our TLS-based models managed to block this attack by noticing that the attackers had a TLS cipher suite hash corresponding to HTTPS clients running on iOS, which did not match the OS the attackers were actually using. As a result, the requests were flagged and our engine blocked the attack.

Diversifying your fingerprinting methods with certain client-side data (like TLS fingerprints) can help detect attacks that may not have any server-side inconsistencies, which wouldn’t be possible with classic server-side fingerprinting methods.

Conclusion

The TLS fingerprint is a signal extracted from data shared during the initial handshake between an HTTPS client and a server, and usually contains device information that is obscure or hard to forge. TLS fingerprinting is a solid method in the arsenal of bot detection tools we use daily to block attackers from breaching our customers’ data and exploiting it for malicious purposes.

Using our ML models, we detect inconsistencies on never-before-seen TLS fingerprints by leveraging signals extracted from other fingerprinting methods, enabling us to block large-scale malicious requests on all our customers’ platforms in real time. TLS fingerprinting is one reliable signal among the three trillion signals per day our detection engine uses to identify fraudulent web traffic.

Using fingerprints is just one of the ways DataDome protects websites, mobile apps, and APIs without affecting the end user, who never even knows we’re there.

*** This is a Security Bloggers Network syndicated blog from DataDome authored by Omar Tafsi, R&D Engineer. Read the original post at: https://datadome.co/engineering/how-tls-fingerprinting-reinforces-datadomes-protection/