How to CAPTCHA Confidence: DataDome’s CAPTCHA Results Prove Promising

by Antoine Vastel, PhD, Head of Research on March 7, 2023

In 2022, DataDome announced the availability of our own secure, privacy-compliant, and user-friendly CAPTCHA, integrated with our complete bot and online fraud protection. Now, we are excited to share data that reveals how our CAPTCHA has performed thus far…

DataDome’s CAPTCHA was designed to overcome the limitations of traditional, siloed CAPTCHAs. Legacy CAPTCHAs that rely solely on the complexity of the challenge to secure online businesses against bots have several drawbacks that expose organizations and customers to risk:

CAPTCHAs become less accessible for legitimate human users as the challenges become more complex.
Challenges severely degrade the user experience (UX) when shown to real users.
Research published at USENIX Security 2021 shows that even very complex 3D CAPTCHAs can be solved by bots that use advanced neural network architectures.
Not all people think the same. Our global customer base helped us understand the need for solutions that are language independent and easy to understand, no matter your age or culture.

That’s why we decided to adopt a new paradigm when designing our CAPTCHA. Instead of relying solely on the difficulty of a CAPTCHA challenge (which would negatively affect UX), we added an extra layer of security using invisible signals.

DataDome leverages our bot detection expertise to collect browser fingerprints, behavioral signals, and reputational signals. Each signal—along with its processing—is completely invisible to human users, allowing us to guarantee our CAPTCHA supports a smooth and straightforward UX for genuine users, while fraudsters face their worst nightmare.

In fact, DataDome customers targeted by CAPTCHA bots (bots created to forge or solve CAPTCHAs) saw significant detection improvements after activating the DataDome CAPTCHA, with the number of CAPTCHAs passed decreasing by up to 80%.

DataDome CAPTCHA Around the World

As of today, the DataDome CAPTCHA is helping to protect hundreds of websites and applications across various industries: from e-commerce and transportation to gambling and classifieds.

Contrary to traditional CAPTCHAs that challenge 100% of users and typically require their valuable time, the DataDome CAPTCHA can be solved in under two seconds on the extremely rare occasion a human is challenged (1 in 10,000). DataDome monitors all requests in the background, and the CAPTCHA is only shown if the detection engine suspects the request is coming from a bot.

In other words, DataDome’s false positive rate (percentage of the time a CAPTCHA is shown to a human) is less than 0.01%. And thanks to our 25 points of presence (PoPs) and zero added latency, the DataDome CAPTCHA loads in less than one second on average.

What results did we CAPTCHA?

When we designed our own CAPTCHA, we wanted to:

Improve privacy for human users by only collecting and using signals for security purposes.
Provide a better UX to human users in the rare case they get shown a CAPTCHA.
Address the security limitations of traditional CAPTCHAs (including CAPTCHA farms and image/audio recognition AI techniques used by bots).

We knew attackers would start to adapt once our CAPTCHA was used on popular websites and mobile applications, particularly the platforms of leading enterprises frequently targeted by all kinds of attackers. That’s why we integrated ML models that leverage different types of signals to reinforce our security without impacting user experience.

Some of the primary signals we use are:

Browser Fingerprints: Collected by the DataDome CAPTCHA using JavaScript, browser fingerprinting enables DataDome to detect the main automation frameworks and headless browsers used by bot developers.
Behavioral Signals: Signals such as mouse movements and keystrokes (anonymized) tell our solution if users are interacting abnormally with the DataDome CAPTCHA.
Reputational Signals: Signals such as IP and session reputation, as well as residential proxy detection, are used to flag suspicious activity, as residential and ISP proxies are commonly used by fraudsters to scale their attacks.

DataDome CAPTCHA also includes natively integrated features to combat CAPTCHA farms (the details of which will remain a mystery for security reasons).

The graph below represents the number of CAPTCHA views (green), all passing attempts (purple), successful passing attempts (light blue), and malicious passing attempts (orange) per 3 hours on websites and mobile applications protected by DataDome.

Graph: CAPTCHA Passing Attempts, including malicious

The number of CAPTCHA views corresponds to the number of users (human and bot) that have seen a DataDome CAPTCHA. Note that not all bots render the CAPTCHA when they receive one, as doing so requires JavaScript execution. Moreover, some bots simply stop when they get a 403 response, only to retry their attack from another IP address or with a different fingerprint.

We observe a constant stream of malicious bots attempting to forge CAPTCHAs: ~85K malicious CAPTCHA passing attempts every three hours.
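To put that rate in perspective, here is a minimal sketch that scales the ~85K figure to other time spans. It assumes the rate stays constant, which the article does not claim; the function name is made up for this example.

```python
# Figure quoted above: ~85K malicious passing attempts per 3-hour window.
ATTEMPTS_PER_WINDOW = 85_000
WINDOW_HOURS = 3


def attempts_over(hours: float) -> float:
    """Scale the 3-hour figure to `hours`, assuming a constant rate."""
    return ATTEMPTS_PER_WINDOW * hours / WINDOW_HOURS


per_day = attempts_over(24)
per_30_days = attempts_over(24 * 30)
per_second = ATTEMPTS_PER_WINDOW / (WINDOW_HOURS * 3600)

print(f"per day:     {per_day:,.0f}")      # 680,000
print(f"per 30 days: {per_30_days:,.0f}")  # 20,400,000
print(f"per second:  {per_second:.1f}")    # 7.9
```

Under that assumption, the stream adds up to roughly 680K malicious attempts per day, or about 20 million per month.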

If we look at the customers most impacted by CAPTCHA bots, we see that all industries and verticals are targeted. From e-commerce and classifieds to retail and transportation, bots are actively trying to bypass CAPTCHAs.

Graph: CAPTCHA by industry

When we analyzed the main signals that helped us catch bots trying to forge the DataDome CAPTCHA, we were able to distinguish three categories:

Behavioral Detection (blue line): Client-side behavioral events such as mouse movements or anonymized keystrokes.
Browser Fingerprints (red line): Signals collected in JavaScript in the CAPTCHA background.
Forged CAPTCHA Payloads (green line): When attackers tried to reverse-engineer the CAPTCHA payload and send CAPTCHA responses without executing it as intended.

Graph: How we caught forged CAPTCHAs

Note that some ML detection models and rules leverage different signals that can belong to multiple categories (fingerprinting, behavior, reputation, forgery detection, etc.). Thus, some CAPTCHA passing attempts may be counted in multiple categories.
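The double counting described above can be illustrated with a small sketch. The records and category names below are hypothetical and exist only to show why per-category totals can exceed the number of distinct blocked attempts.

```python
from collections import Counter

# Hypothetical blocked attempts; each lists every category whose signals fired.
blocked_attempts = [
    {"id": 1, "categories": {"behavioral"}},
    {"id": 2, "categories": {"fingerprint", "forged_payload"}},
    {"id": 3, "categories": {"behavioral", "fingerprint"}},
    {"id": 4, "categories": {"forged_payload"}},
]


def count_by_category(attempts):
    """Count each attempt once per category it belongs to."""
    counts = Counter()
    for attempt in attempts:
        counts.update(attempt["categories"])
    return counts


counts = count_by_category(blocked_attempts)
total_unique = len(blocked_attempts)       # 4 distinct attempts
total_by_category = sum(counts.values())   # 6, because two attempts count twice

print(dict(counts), total_unique, total_by_category)
```

For this reason, the three lines in the graph should not be summed to obtain the total number of blocked attempts.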

The fact that the number of forged payloads (green line) decreased doesn’t mean attackers stopped forging payloads—but simply that they changed the way they forged payloads. These changes resulted in inconsistent client-side behavioral events (mouse movements, in this case). Thus, after the beginning of February 2023, the attackers were mostly caught using behavioral detection signals.

Moreover, we have observed that when new detection models are deployed on the DataDome CAPTCHA, some of the most sophisticated attackers (e.g. bots-as-a-service providers or large-scale scrapers targeting several customers) quickly try to adapt to increase their chances of passing the CAPTCHA, for example by generating more realistic mouse movements.

Is accessibility used as a weakness by attackers?

Attackers often target audio CAPTCHAs with AI-based audio recognition techniques, expecting them to be easier to forge than image-based CAPTCHAs.

While audio CAPTCHAs represent ~2.5% of total DataDome CAPTCHAs passed by legitimate humans, they represent ~20.5% of all malicious CAPTCHA passing attempts. This shows that attackers are, as suspected, trying to exploit accessibility features to bypass detection.
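As a rough comparison of the two shares quoted above, the sketch below computes how over-represented audio is among malicious attempts. The two percentages have different bases (CAPTCHAs passed by humans versus malicious passing attempts), so the ratio is only indicative.

```python
# Shares quoted above, in percent.
AUDIO_SHARE_HUMAN_PASSES = 2.5
AUDIO_SHARE_MALICIOUS_ATTEMPTS = 20.5


def over_representation(malicious_share: float, human_share: float) -> float:
    """Ratio of the audio share in malicious traffic to the share in human traffic."""
    return malicious_share / human_share


ratio = over_representation(AUDIO_SHARE_MALICIOUS_ATTEMPTS, AUDIO_SHARE_HUMAN_PASSES)
print(f"audio is over-represented by a factor of {ratio:.1f}")  # 8.2
```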

Graph: Malicious Audio and Visual CAPTCHA Attempts

How are bad bots attacking?

Using JavaScript signals collected by the DataDome CAPTCHA, we can usually infer the technologies and techniques used by bad bots. (Note that it is not always possible to identify the underlying instrumentation frameworks used by bots, in particular when bot developers apply custom fingerprinting changes.)

Vanilla Puppeteer

We saw bots using vanilla Puppeteer with few fingerprint changes. Most of them don’t expose navigator.webdriver = true, most likely because they launch the browser with the --disable-blink-features=AutomationControlled flag, but they can still be reliably identified as Puppeteer using other fingerprinting techniques, for example by detecting leaks in JavaScript stack traces such as: at pptr://__puppeteer_evaluation_script__:2:24.
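The stack trace check can be sketched as a simple pattern match. This assumes the stack trace text has already been collected client-side and sent for analysis; the marker format is taken from the example above, and other Puppeteer versions may emit a different one. It is an illustration of the idea, not DataDome’s actual detection logic.

```python
import re

# Matches frames such as "at pptr://__puppeteer_evaluation_script__:2:24".
PUPPETEER_FRAME = re.compile(r"pptr://__puppeteer_evaluation_script__:\d+:\d+")


def leaks_puppeteer(stack_trace: str) -> bool:
    """Return True if any frame references Puppeteer's evaluation script."""
    return PUPPETEER_FRAME.search(stack_trace) is not None


# Hypothetical stack traces, for illustration only.
automated = (
    "Error\n"
    "    at pptr://__puppeteer_evaluation_script__:2:24\n"
)
organic = (
    "Error\n"
    "    at HTMLButtonElement.onClick (https://example.com/app.js:10:5)\n"
)

print(leaks_puppeteer(automated))  # True
print(leaks_puppeteer(organic))    # False
```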

Puppeteer Extra Stealth

Puppeteer Extra Stealth is also heavily used by attackers. It can be identified through browser fingerprint inconsistencies introduced by the framework. A fraction of the attackers using this framework forge more realistic mouse movements with Puppeteer’s page.mouse.move API (which accepts extra parameters, such as the number of steps in a move, to slow the movement down and appear more human) or with more specialized packages such as ghost-cursor.
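To see why stepped moves alone may not look human, note that Puppeteer’s steps option places intermediate points evenly along a straight line. The sketch below reproduces that interpolation in Python and applies a toy check for perfectly uniform steps. The function names and the sample "human-like" path are hypothetical, and the heuristic is far simpler than any real behavioral model.

```python
def interpolate_move(start, end, steps):
    """Return `steps` points evenly spaced on the segment from start to end."""
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps + 1)
    ]


def looks_linear(points, tolerance=1e-6):
    """Toy heuristic: True if every consecutive displacement is identical."""
    deltas = [
        (bx - ax, by - ay)
        for (ax, ay), (bx, by) in zip(points, points[1:])
    ]
    if len(deltas) < 2:
        return False  # too few points to judge
    first_dx, first_dy = deltas[0]
    return all(
        abs(dx - first_dx) <= tolerance and abs(dy - first_dy) <= tolerance
        for dx, dy in deltas
    )


scripted = [(0, 0)] + interpolate_move((0, 0), (300, 120), steps=10)
human_like = [(0, 0), (18, 3), (55, 19), (120, 61), (210, 95), (300, 120)]

print(looks_linear(scripted))    # True
print(looks_linear(human_like))  # False
```

Packages such as ghost-cursor exist precisely to avoid this kind of regularity, which is why behavioral detection has to look at more than straightness.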

Selenium

Selenium browsers are still used with notable regularity by bad bots, despite Puppeteer and its related frameworks being far more popular with sophisticated bots. We also see modified Selenium browsers that try to hide their presence by removing common fingerprinting attributes.

Browser Automation Studio

We observe significant use of the Browser Automation Studio tool, which enables bot developers to create advanced bots without needing advanced programming skills.

The tool also natively integrates with CAPTCHA farm providers to solve legacy/traditional CAPTCHAs, and provides modules to generate human-like mouse movements and browser fingerprints.

Image: Browser Automation Studio add-ons

When it comes to CAPTCHAs, we observe a lot of bots (mostly scrapers) using browser extensions like 2captcha to attempt to solve CAPTCHAs. Even though 2captcha doesn’t work on the DataDome CAPTCHA, a lot of bots still integrate it into their stack in case they encounter a traditional CAPTCHA.

Lastly, several small-scale custom scrapers are developed using extensions like Tampermonkey, which make it easy to inject and execute custom JavaScript code on pages. Thus, fraudsters can turn their own browser into a low-scale scraper running on their own machine.

Conclusion

Bots are continuously trying to forge CAPTCHAs, using different technologies and techniques. Traditional CAPTCHAs are not well-suited against sophisticated bots. Several bot frameworks and automation tools natively ship with CAPTCHA farm integration and AI image/audio recognition techniques to forge CAPTCHAs.

Moreover, if you only use a traditional CAPTCHA to protect critical parts of your website and/or mobile app (such as the payment endpoint), you introduce a single point of failure that can be bypassed by advanced bots.

That’s why DataDome changed the paradigm when we designed our CAPTCHA. Instead of relying solely on the difficulty of a CAPTCHA challenge, we added extra layers of security using invisible signals. Our approach enables us to successfully stop millions of malicious CAPTCHA passing attempts, all while supporting a smooth user experience—even in the 0.01% chance a human sees the CAPTCHA.

The more detection signals DataDome collects, the less we want to keep the data to ourselves. Therefore, to show you the value of DataDome CAPTCHA in real time, we will be releasing a funnel view and deep-dive view in the DataDome dashboard! Stay tuned for more news about the new features, coming soon.

*** This is a Security Bloggers Network syndicated blog from DataDome authored by Antoine Vastel, PhD, Head of Research. Read the original post at: https://datadome.co/learning-center/datadome-captcha-results/