When Surging AI Referral Traffic is Actually Bad Bots

by Jérôme Segura on February 5, 2026

Remaining competitive as an online business now means becoming a primary source for generative AI. It is the new SEO holy grail. If ChatGPT is citing your content, it implies you are an authority, promising a wave of high-intent visitors looking to verify sources or buy products. So, when a web analytics dashboard shows a sudden vertical spike in traffic attributed to chatgpt.com, it is usually cause for celebration.

But that moment of triumph can quickly sour.

What looks like a flood of eager ChatGPT users validating your authority is sometimes nothing more than bots in disguise. Data visualization is only as reliable as the source feeding it, and threat actors know exactly how to manipulate referral headers to masquerade as legitimate AI traffic.

Case in point: DataDome’s Galileo Threat Research team recently observed a surge in referral traffic attributed to ChatGPT that turned out to be anything but genuine. What looked like hundreds of thousands of human visits was, in reality, bots built to bypass security filters. In this article, we explain what referral traffic is and take a deep dive into the attacks we saw impersonating ChatGPT.

Referral traffic

In the world of web analytics, referral traffic is the segment of visitors that arrives at your site through direct links on other domains, rather than from a search engine or paid ad. For years, this was dominated by blogs, news outlets, and social media platforms. But recently, a new major player has entered the arena: generative AI.

When a user asks ChatGPT a question about a product or service, the AI often includes citations or direct links to external websites. When a user clicks one of these links to verify a source or make a purchase, it is logged as referral traffic.

To help webmasters track this valuable audience, these visits are typically identified in two ways:

The “Referer” header: A standard HTTP header that tells your server the visitor came from https://chatgpt.com/.
UTM parameters: Links generated by the AI often include tracking tags, such as &utm_source=chatgpt.com, allowing analytics platforms like Google Analytics (GA4) to categorize the visit as “AI Referral” traffic automatically.
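
As a rough illustration of how these two signals might be read on the server side, here is a minimal sketch. The function name and the example shop URLs are hypothetical, and the check only tells you what a request claims about its origin, since both values are supplied by the client.

```python
from typing import Optional
from urllib.parse import urlsplit, parse_qs

AI_REFERRER_HOST = "chatgpt.com"  # assumption: the only AI source tracked in this sketch


def claims_chatgpt_referral(url: str, referer: Optional[str]) -> bool:
    """Return True if the request *claims* to come from ChatGPT.

    Both signals are client-supplied, so this is a label, not proof.
    """
    # Signal 1: the Referer header points at chatgpt.com (or a subdomain of it)
    if referer:
        host = urlsplit(referer).hostname or ""
        if host == AI_REFERRER_HOST or host.endswith("." + AI_REFERRER_HOST):
            return True
    # Signal 2: the landing URL carries utm_source=chatgpt.com
    query = parse_qs(urlsplit(url).query)
    return AI_REFERRER_HOST in query.get("utm_source", [])


# Hypothetical landing URLs on an example shop
print(claims_chatgpt_referral("https://shop.example/p/1", "https://chatgpt.com/"))
print(claims_chatgpt_referral("https://shop.example/p/1?utm_source=chatgpt.com", None))
```

Matching on the parsed hostname rather than on a substring avoids counting look-alike domains that merely contain the string "chatgpt.com".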

[Figure: ChatGPT search for shoe purchase]

It is important to clarify that legitimate referral traffic represents human users clicking links within the ChatGPT interface, not the AI agent itself crawling your pages.

AI crawlers typically identify themselves transparently. ChatGPT, for example, validates its identity using publicly documented IP ranges and Web Bot Auth (cryptographically signed headers), in addition to its specific User-Agent strings.
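
One of those checks, matching the client IP against published ranges, can be sketched as follows. The ranges below are placeholders taken from the reserved documentation blocks, not real crawler ranges, and the function name is hypothetical. A real deployment would load the ranges the crawler operator actually publishes, and could additionally verify the signed headers, which this sketch does not attempt.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks (RFC 5737).
# Assumption: a real deployment replaces these with the operator's published list.
PUBLISHED_CRAWLER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def ip_in_published_ranges(client_ip: str) -> bool:
    """Return True if client_ip falls inside any of the published ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in PUBLISHED_CRAWLER_RANGES)


print(ip_in_published_ranges("192.0.2.17"))   # inside the first placeholder range
print(ip_in_published_ranges("203.0.113.5"))  # outside both placeholder ranges
```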

Sticking out like a sore thumb

Consider the traffic pattern in the chart below related to ChatGPT referral visits across DataDome customers. We notice a significant spike around January 14th, followed by another, much smaller spike on the 20th.

[Figure: Chart of referral traffic for ChatGPT]

Around January 14th, activity spiked dramatically, peaking at nearly 600,000 requests. The primary source of this traffic was https://chatgpt.com/. On the surface, this implies that users were interacting with ChatGPT and clicking on links within the AI assistant’s interface.

While referral traffic from ChatGPT has increased over time, a sudden spike of this magnitude warrants skepticism. Bot traffic analysis showed that the “users” behind this spike were not actually human but bots. The image below isolates the malicious referral traffic requests from the overall trend we showed earlier.

[Figure: Malicious referral traffic requests]

The most damning evidence comes from a single session in which we observed 109 individual product page requests in just 5 seconds. That is roughly 22 requests per second, a pace no human visitor could sustain, indicating a highly aggressive scraper rather than an interested shopper.

[Figure: Individual requests from the same session for product pages in a 5-second span]
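
The velocity check described above can be sketched with a sliding window over a session's request timestamps. The timestamps here are synthetic, generated only to reproduce the 109-requests-in-5-seconds figure, and the human-rate threshold is an illustrative assumption rather than a value from the research.

```python
from collections import deque


def max_requests_in_window(timestamps, window_seconds=5.0):
    """Largest number of requests seen in any window of the given length."""
    window = deque()
    peak = 0
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop requests that fall outside the window ending at ts
        while ts - window[0] >= window_seconds:
            window.popleft()
        peak = max(peak, len(window))
    return peak


# Synthetic session: 109 requests spread evenly over just under 5 seconds
session = [i * (4.99 / 108) for i in range(109)]

peak = max_requests_in_window(session)
rate = peak / 5.0

# Assumption: 2 page requests per second is already generous for a human shopper
MAX_HUMAN_RATE = 2.0
print(peak, round(rate, 1), rate > MAX_HUMAN_RATE)
```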

The traffic avoided easy-to-block data centers. Instead, it routed through residential internet service providers (ISPs) like Comcast Cable (27%), Verizon Fios (17%), and AT&T (14%). This suggests the use of residential proxies to appear as legitimate home users.

Despite using residential connections, 50% of the traffic was identified as coming from GNU/Linux devices. While Linux is common for servers and bot scripts, it is extremely rare in consumer shopping traffic on ISPs like Comcast or Verizon.

[Figure: Traffic breakdown]
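
A breakdown like this can be approximated from request logs. The sketch below assumes the operating system is guessed from the User-Agent string, which is itself client-controlled and is not necessarily how the figures above were produced. The helper names and sample strings are hypothetical.

```python
from collections import Counter


def os_family(user_agent: str) -> str:
    """Very rough OS guess from a User-Agent string (illustrative only)."""
    ua = user_agent.lower()
    if "android" in ua:
        return "Android"  # checked first: Android User-Agents also contain "Linux"
    if "iphone" in ua or "ipad" in ua:
        return "iOS"      # checked before macOS: these contain "like Mac OS X"
    if "windows" in ua:
        return "Windows"
    if "mac os x" in ua:
        return "macOS"
    if "linux" in ua:
        return "Linux"
    return "Other"


def os_shares(user_agents):
    """Fraction of requests per OS family; empty input gives an empty dict."""
    counts = Counter(os_family(ua) for ua in user_agents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Synthetic sample, not data from the observed attack
sample = [
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Linux; Android 14; Pixel 8)",
]
print(os_shares(sample))
```

A desktop Linux share far above what a consumer audience normally produces is not proof of automation on its own, but it is a useful prompt to look closer.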

The “Referer” header

The HTTP Referer header (historically misspelled in the official RFC standard) is a simple text string sent by the client (the browser or script) to the server. Crucially, the client has absolute control over this string.

Bot developers know that many web application firewalls (WAFs) and anti-scraping tools treat traffic from reputable domains like Google, Facebook, or ChatGPT with more leniency. Therefore, spoofing the source is a common tactic to bypass security filters.

For a scraper, appearing to come from ChatGPT is as simple as adding a single line of code. In Python, it looks like this:

import requests

# Both headers are set by the client; the server cannot verify either value.
headers = {
    'Referer': 'https://chatgpt.com/',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...'
}

requests.get('https://target-website.com/product-page', headers=headers)

By manually setting the header, a malicious bot transforms itself from a suspicious scraper into high-value AI traffic in the eyes of most analytics platforms.

The SEO blind spot: How junk traffic distorts reality

This kind of spoofing distorts your website analytics. It also exposes a fundamental flaw in how standard analytics tools process data. Platforms that typically ingest the “Referer” header without question will display skewed metrics.

This is where DataDome distinguishes itself from standard analytics or basic WAFs. While bots know that firewalls often treat traffic from reputable domains like ChatGPT with tolerance, DataDome does not rely on the “Referer” header as proof of legitimacy.

Instead, DataDome analyzes the behavior and intent behind the request. By detecting technical inconsistencies, such as a Linux device on a Comcast residential IP executing 100+ requests in seconds, we filter out the noise before it distorts your data and leads to incorrect conclusions about where your traffic is coming from.
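
To make the idea of cross-checking signals concrete, here is a deliberately simplified sketch. It is not DataDome's detection logic. The `SessionSummary` fields, the network-type labels, and the threshold are all hypothetical, chosen only to mirror the inconsistencies described in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SessionSummary:
    referer_host: str      # host taken from the Referer header, e.g. "chatgpt.com"
    os_family: str         # e.g. "Linux", "Windows", "macOS"
    network_type: str      # hypothetical labels: "residential" or "datacenter"
    peak_requests_5s: int  # most requests seen in any 5-second window


def inconsistency_reasons(s: SessionSummary, max_human_requests_5s: int = 10) -> List[str]:
    """List the reasons a session does not look like a human clicking a link."""
    reasons = []
    if s.peak_requests_5s > max_human_requests_5s:
        reasons.append("request velocity exceeds human browsing pace")
    if s.os_family == "Linux" and s.network_type == "residential":
        reasons.append("desktop Linux on a residential ISP is unusual for shoppers")
    return reasons


def count_as_ai_referral(s: SessionSummary) -> bool:
    """Trust the Referer claim only when nothing else contradicts it."""
    return s.referer_host == "chatgpt.com" and not inconsistency_reasons(s)


suspicious = SessionSummary("chatgpt.com", "Linux", "residential", 109)
ordinary = SessionSummary("chatgpt.com", "macOS", "residential", 3)
print(count_as_ai_referral(suspicious), inconsistency_reasons(suspicious))
print(count_as_ai_referral(ordinary))
```

The point of the design is that the Referer value never raises trust by itself. It only decides which analytics bucket a session lands in once the other signals agree.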

The AI era calls for better detection

In the age of automated agents, the “Referer” header is often just a mask. Security and analytics teams can no longer afford to take these signals at face value. When a sudden, unexplained spike appears from a major platform, manual verification is often too slow and complex to be effective.

This is why DataDome’s approach is critical. By looking behind the mask to see who is actually knocking at the door, DataDome validates the legitimacy of traffic in real time using multi-layered, intent-based detection, so you can protect both your infrastructure and the integrity of your business intelligence.

DataDome is built for businesses that want to stay ahead in the agentic AI era, seizing new opportunities while minimizing fraud risks. As AI agents become more sophisticated and widespread, protecting your digital infrastructure has never been more critical.

Want to see if a spoofed AI agent can access your website right now? Use DataDome’s Vulnerability Scan to find out.

*** This is a Security Bloggers Network syndicated blog from Blog – DataDome authored by Jérôme Segura. Read the original post at: https://datadome.co/threat-research/when-surging-ai-referral-traffic-is-actually-bad-bots/

February 5, 2026 | April 14, 2026 | Jérôme Segura | AI, bot management, Retail & e-commerce, Threat Research
