When Surging AI Referral Traffic is Actually Bad Bots
The post When Surging AI Referral Traffic is Actually Bad Bots appeared first on Blog – Datadome.
Remaining competitive as an online business now means becoming a primary source for generative AI. It is the new SEO holy grail. If ChatGPT is citing your content, it implies you are an authority, promising a wave of high-intent visitors looking to verify sources or buy products. So, when a web analytics dashboard shows a sudden vertical spike in traffic attributed to chatgpt.com, it is usually cause for celebration.
But that moment of triumph can quickly sour.
What looks like a flood of eager ChatGPT users validating your authority is sometimes nothing more than bots in disguise. Data visualization is only as reliable as the source feeding it, and threat actors know exactly how to manipulate referral headers to masquerade as legitimate AI traffic.
Case in point: DataDome’s Galileo Threat Research team recently observed a surge in referral traffic that appeared to come from ChatGPT users but did not. What looked like hundreds of thousands of human visits was, in reality, bots built to bypass security filters. In this article, we explain what referral traffic is and take a deep dive into the attacks we saw impersonating ChatGPT.
Referral traffic
In the world of web analytics, referral traffic is the segment of visitors that arrives at your site through direct links on other domains, rather than from a search engine or paid ad. For years, this was dominated by blogs, news outlets, and social media platforms. But recently, a new major player has entered the arena: generative AI.
When a user asks ChatGPT a question about a product or service, the AI often includes citations or direct links to external websites. When a user clicks one of these links to verify a source or make a purchase, it is logged as referral traffic.
To help webmasters track this valuable audience, these visits are typically identified in two ways:
- The “referer” header: A standard HTTP header that tells your server the visitor came from https://chatgpt.com/.
- UTM parameters: Links generated by the AI often include tracking tags, such as &utm_source=chatgpt.com, allowing analytics platforms like Google Analytics (GA4) to categorize the visit as “AI Referral” traffic automatically.
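As a rough illustration of how an analytics pipeline might apply these two signals, the sketch below labels a visit as a ChatGPT referral when either the Referer host or the utm_source value is chatgpt.com. The function name is_chatgpt_referral and its inputs are hypothetical; this is not GA4's actual classification logic. Note that both values are supplied by the client.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

AI_SOURCE = "chatgpt.com"

def is_chatgpt_referral(request_url: str, referer: Optional[str]) -> bool:
    """Return True if either client-supplied signal points at chatgpt.com."""
    # Signal 1: the host named in the Referer header, if one was sent.
    referer_host = urlparse(referer).hostname if referer else None
    # Signal 2: the utm_source query parameter on the landing URL.
    utm_sources = parse_qs(urlparse(request_url).query).get("utm_source", [])
    return referer_host == AI_SOURCE or AI_SOURCE in utm_sources
```

A check like this only reports what the request claims about itself, which is why it cannot distinguish a real click from a scripted one.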

It is important to clarify that legitimate referral traffic represents human users clicking links within the ChatGPT interface, not the AI agent itself crawling your pages.
AI crawlers typically identify themselves transparently. ChatGPT’s automated requests, for example, can be validated against publicly documented IP ranges and Web Bot Auth (cryptographically signed headers), in addition to their specific User-Agent strings.
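The IP-range part of that verification can be sketched with Python's standard ipaddress module. This applies to declared crawlers, not to human referral clicks. The CIDR blocks below are reserved documentation ranges used purely as placeholders, not any operator's real ranges; a real check would load the currently published list.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT real crawler ranges.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def ip_in_published_ranges(client_ip: str) -> bool:
    """Return True if the client IP falls inside any published range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in PUBLISHED_RANGES)
```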
Sticking out like a sore thumb
Consider the traffic pattern in the chart below related to ChatGPT referral visits across DataDome customers. We notice a significant spike around January 14th, followed by another, much smaller spike on the 20th.

Around January 14th, activity spiked dramatically, peaking at nearly 600,000 requests. The primary source of this traffic was https://chatgpt.com/. On the surface, this implies that users were interacting with ChatGPT and clicking on links within the AI assistant’s interface.
While referral traffic from ChatGPT has increased over time, a sudden spike of this magnitude warrants skepticism. Bot traffic analysis showed that the “users” behind this spike were not actually human but bots. The image below isolates the malicious referral traffic requests from the overall trend we showed earlier.

The most damning evidence comes from a single session in which we observed 109 individual product page requests within just 5 seconds. That is roughly 22 requests per second, a velocity no human visitor could achieve, indicating a highly aggressive scraper rather than an interested shopper.
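The arithmetic is 109 / 5 = 21.8 requests per second. A minimal sketch of how such a burst could be measured from request timestamps follows; the function name and the evenly spaced synthetic timestamps are assumptions for illustration, chosen only to match the count and span reported above.

```python
from collections import deque

def peak_requests_in_window(timestamps, window_seconds=5.0):
    """Largest number of requests falling inside any window of the given length."""
    window = deque()
    peak = 0
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop requests that are a full window older than the newest one.
        while ts - window[0] >= window_seconds:
            window.popleft()
        peak = max(peak, len(window))
    return peak

# Synthetic session: 109 requests spread evenly over just under 5 seconds.
session = [i * (5.0 / 109) for i in range(109)]
peak = peak_requests_in_window(session)
rate = peak / 5.0  # requests per second across the 5-second window
```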

The traffic avoided easy-to-block data centers. Instead, it routed through residential internet service providers (ISPs) like Comcast Cable (27%), Verizon Fios (17%), and AT&T (14%). This suggests the use of residential proxies to appear as legitimate home users.
Despite using residential connections, 50% of the traffic was identified as coming from GNU/Linux systems. While Linux is common for servers and bot scripts, it is extremely rare among average consumer shopping traffic on ISPs like Comcast or Verizon.
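One way to look for this kind of mismatch in your own logs is to compute the operating system share among requests from residential ISPs. The sketch below assumes each log record is a dict with hypothetical isp and os fields; the sample records are made up for illustration and are not the observed data.

```python
from collections import Counter

def os_share(records, isps):
    """Fraction of requests per OS among records whose ISP is in `isps`."""
    counts = Counter(r["os"] for r in records if r["isp"] in isps)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {os_name: n / total for os_name, n in counts.items()}

# Made-up sample records, for illustration only.
records = [
    {"isp": "Comcast Cable", "os": "GNU/Linux"},
    {"isp": "Comcast Cable", "os": "Windows"},
    {"isp": "Verizon Fios", "os": "GNU/Linux"},
    {"isp": "AT&T", "os": "macOS"},
    {"isp": "ExampleCloud", "os": "GNU/Linux"},  # not residential, ignored
]
shares = os_share(records, {"Comcast Cable", "Verizon Fios", "AT&T"})
```

An unusually high Linux share on consumer ISPs is not proof of automation on its own, but it is a cheap signal to compare against your normal baseline.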

The “Referer” header
The HTTP Referer header (a historical misspelling of “referrer” that was preserved in the HTTP specification) is a simple text string sent by the client (the browser or script) to the server. Crucially, the client has absolute control over this string.
Bot developers know that many web application firewalls (WAFs) and anti-scraping tools treat traffic from reputable domains like Google, Facebook, or ChatGPT with more leniency. Therefore, spoofing the source is a common tactic to bypass security filters.
To a scraper, coming from ChatGPT is as simple as adding a single line of code. In Python, it looks like this:
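import requests

# The client chooses these values; nothing verifies that the visit came from ChatGPT.
headers = {
    'Referer': 'https://chatgpt.com/',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...'
}
requests.get('https://target-website.com/product-page', headers=headers)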
By manually setting the header, a malicious bot transforms itself from a suspicious scraper into high-value AI traffic in the eyes of most analytics platforms.
The SEO blind spot: How junk traffic distorts reality
This kind of spoofing distorts your website analytics. It also exposes a fundamental flaw in how standard analytics tools process data: platforms that ingest the “Referer” header without question will display skewed metrics.
This is where DataDome distinguishes itself from standard analytics or basic WAFs. While bots know that firewalls often treat traffic from reputable domains like ChatGPT with tolerance, DataDome does not rely on the “Referer” header as proof of legitimacy.
Instead, DataDome analyzes the behavior and intent behind the request. By detecting technical inconsistencies, such as a Linux device on a Comcast residential IP executing 100+ requests in seconds, we filter out the noise before it distorts your data and leads to incorrect conclusions about where your traffic is coming from.
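To make the idea of cross-checking signals concrete, here is a toy sketch that flags the two inconsistencies named above. It is not DataDome's detection logic, which this article does not detail; the SessionSummary fields, the flag names, and the threshold of 20 requests per 5 seconds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SessionSummary:
    referer_host: str       # client-supplied, deliberately not used as evidence
    os_family: str
    residential_ip: bool
    requests_in_5s: int

def inconsistency_flags(s: SessionSummary, max_requests_in_5s: int = 20) -> List[str]:
    """Return the names of the heuristic checks this session trips."""
    flags = []
    # Linux is rare among consumer shoppers on residential connections.
    if s.residential_ip and s.os_family == "GNU/Linux":
        flags.append("linux_on_residential_ip")
    # Sustained bursts above the (assumed) threshold are not human browsing.
    if s.requests_in_5s > max_requests_in_5s:
        flags.append("inhuman_request_rate")
    return flags
```

For example, a session claiming a chatgpt.com referrer, running GNU/Linux on a residential IP, and sending 109 requests in 5 seconds would trip both checks regardless of what its Referer header says.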
The AI era calls for better detection
In the age of automated agents, the “Referer” header is often just a mask. Security and analytics teams can no longer afford to take these signals at face value. When a sudden, unexplained spike appears from a major platform, manual verification is often too slow and complex to be effective.
This is why DataDome’s approach is critical. By looking behind the mask to see who is actually knocking at the door, DataDome validates the legitimacy of the traffic in real time using multi-layered, intent-based detection, so you can protect both your infrastructure and the integrity of your business intelligence.
DataDome is built for businesses that want to stay ahead in the agentic AI era, seizing new opportunities while minimizing fraud risks. As AI agents become more sophisticated and widespread, protecting your digital infrastructure has never been more critical.
Want to see if a spoofed AI agent can access your website right now? Use DataDome’s Vulnerability Scan to find out.
*** This is a Security Bloggers Network syndicated blog from Blog – DataDome authored by Jérôme Segura. Read the original post at: https://datadome.co/threat-research/when-surging-ai-referral-traffic-is-actually-bad-bots/

