
Is PhantomJS dead? Detecting PhantomJS headless browsers.

First developed by Ariya Hidayat in 2011, PhantomJS was one of the first popular headless browsers. It provides a convenient JavaScript (JS) API for interacting with websites (clicking buttons, filling out forms) without spawning a real browser, and it uses QtWebKit as its backend. While PhantomJS has been widely used for testing, it has also been used to build malicious bots.

PhantomJS stopped being maintained in 2018, shortly after Google released Headless Chrome. At launch, Headless Chrome was nearly on par with regular (headful) Chrome and supported almost all of the same advanced features.

Because Headless Chrome was so capable, most bot developers migrated to it, especially once Google also released Puppeteer, a high-level framework for automating (Headless) Chrome.

Is PhantomJS still popular?

Despite the popularity of Puppeteer and Headless Chrome, we wondered whether bot developers were still using PhantomJS almost five years after it stopped being maintained. To find out, we used several signatures to identify PhantomJS traffic, combining server-side and client-side signals.

On the server side, we look for user agents containing the PhantomJS substring, such as Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1.
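A minimal TypeScript sketch of this server-side check follows. The function name isPhantomUserAgent is hypothetical, and how the User-Agent header reaches it depends on your server framework. It only catches bots that keep the default PhantomJS user agent.

```typescript
// Hypothetical helper: flags a request as PhantomJS when its User-Agent
// header contains the "PhantomJS" token. The match is case-sensitive, as in
// the default PhantomJS UA; a bot that rewrites its UA will not be caught.
function isPhantomUserAgent(userAgent: string | undefined): boolean {
  return userAgent !== undefined && userAgent.includes("PhantomJS");
}

// The default PhantomJS 2.1.1 user agent quoted above.
const phantomUa =
  "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 " +
  "(KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1";

console.log(isPhantomUserAgent(phantomUa)); // true
console.log(isPhantomUserAgent(undefined)); // false
```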

On the client side, we check for the presence of the PhantomJS-specific properties window.callPhantom and window._phantom.
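The client-side check can be sketched in TypeScript as follows. WindowLike and hasPhantomGlobals are hypothetical names introduced for illustration; the minimal type lets the sketch compile and run outside a browser, and in a real page you would pass window (or globalThis).

```typescript
// Hypothetical minimal type for the window properties we inspect, so the
// sketch compiles without the DOM lib.
interface WindowLike {
  callPhantom?: unknown;
  _phantom?: unknown;
}

// True if either PhantomJS-specific global is present. The "in" operator
// checks for the property itself, so it also fires if the value is falsy.
function hasPhantomGlobals(win: WindowLike): boolean {
  return "callPhantom" in win || "_phantom" in win;
}

// In a page script, globalThis is the window object.
const flagged = hasPhantomGlobals(globalThis as unknown as WindowLike);
console.log(flagged ? "PhantomJS globals found" : "no PhantomJS globals");
```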

To detect PhantomJS instances that may have been modified (while remaining reasonably sure we are seeing PhantomJS and not another bot framework such as Selenium), we also check:

  • Whether navigator.plugins is empty, since PhantomJS exposes no plugins.
  • Whether navigator.language is “C” (as in the programming language).
  • Whether the browser vendor (navigator.vendor) is “Apple Computer, Inc.”
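A rough TypeScript sketch of these fingerprint checks is shown below. NavigatorLike and matchesPhantomFingerprint are hypothetical, and requiring all three signals at once is an assumption made for the sketch rather than a documented rule; it trades some recall for fewer false positives.

```typescript
// Hypothetical minimal type covering the three navigator properties we read.
interface NavigatorLike {
  plugins: { length: number };
  language: string;
  vendor: string;
}

// Flags a fingerprint matching the usual PhantomJS profile even when the UA
// and the callPhantom/_phantom globals have been stripped.
function matchesPhantomFingerprint(nav: NavigatorLike): boolean {
  const noPlugins = nav.plugins.length === 0; // PhantomJS exposes no plugins
  const cLanguage = nav.language === "C";
  const appleVendor = nav.vendor === "Apple Computer, Inc.";
  // Assumption: require all three signals together. Each one alone is shared
  // by other browsers (Safari, for instance, reports this vendor string).
  return noPlugins && cLanguage && appleVendor;
}

// Illustrative values only.
const phantomLike = { plugins: { length: 0 }, language: "C", vendor: "Apple Computer, Inc." };
const regularBrowser = { plugins: { length: 5 }, language: "en-US", vendor: "Apple Computer, Inc." };
console.log(matchesPhantomFingerprint(phantomLike)); // true
console.log(matchesPhantomFingerprint(regularBrowser)); // false
```

In a page, the DOM Navigator type has all three properties, so matchesPhantomFingerprint(navigator) can be called directly.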

Thus, even if a PhantomJS bot changes its user agent and removes window.callPhantom and window._phantom, we can still be relatively confident that a fingerprint matching these checks belongs to a modified PhantomJS instance.

Of course, bots can also lie about their attributes. They can use JavaScript to remove key attributes or to spoof the values that make up the usual PhantomJS fingerprint, such as navigator.language being “C” or navigator.plugins being empty. Here, however, the goal is only to get a rough idea of whether bot developers still actively use PhantomJS in the wild.

Note that our JS tests also catch other bot frameworks such as SlimerJS, which is similar to PhantomJS but based on Gecko, the browser engine Mozilla develops for Firefox.

The graph below shows the number of PhantomJS requests per hour over a one-week period. On average, we observe ~15K requests per hour from PhantomJS-based bots.

PhantomJS requests

Most of the PhantomJS bots we observe target websites where our detection is enabled but our protection is not (typically sites running a DataDome free trial).

In addition to free trials, unprotected websites include those of customers who have not enabled protection on specific endpoints, or who prefer to handle bots themselves with their own logic, e.g. by showing fake data to scrapers.

In particular, the spike in PhantomJS traffic we observed on January 5–6 occurred on a website that uses DataDome in non-filtering mode to enrich its anti-fraud data.

What are PhantomJS bots doing?

Most PhantomJS bots are used to conduct scraping attacks. Of the rest, most are either internal customer traffic (used for testing) or verified/commercial bots that gather information and verify ads.

On average, once we exclude good bots and internal traffic, we observe ~15K malicious PhantomJS bot requests per hour. While this may be surprising, popular proxy providers such as Bright Data still publish blog posts about PhantomJS.

How popular is PhantomJS vs. Puppeteer Extra Stealth?

When we studied Puppeteer Extra Stealth, a plugin for puppeteer-extra that alters the fingerprint of Puppeteer-driven (Headless) Chrome to evade traditional bot detection techniques, we detected on average ~650K requests every three hours, or ~215K requests per hour.

Thus, when it comes to bad bots, we observe ~15x more Puppeteer Extra Stealth bot requests than PhantomJS bot requests. 

Puppeteer-extra-plugin-stealth request count vs. timestamp graph.

Conclusion

PhantomJS was one of the first mainstream headless browsers, but it stopped being maintained in 2018, following the release of Headless Chrome and Puppeteer. Even so, we still see PhantomJS in the wild. It’s generally used internally by companies for testing, and by a few commercial bots to gather information.

Among bad bots, PhantomJS is mostly used for web scraping. However, the volume of requests is quite low: ~15K requests per hour, roughly 15x less than the volume of bot requests linked to Puppeteer Extra Stealth.

*** This is a Security Bloggers Network syndicated blog from Blog – DataDome authored by Antoine Vastel, PhD, Head of Research. Read the original post at: https://datadome.co/threat-research/is-phantomjs-dead/
