
When the Token Theft Hides in Plain Sight: Why Agent Containment Stops the Claude Code MCP Attack

Researchers at Mitiga Labs recently demonstrated a five-step attack that quietly hijacks Claude Code’s Model Context Protocol (MCP) traffic and steals the OAuth bearer tokens that grant access to platforms like Jira, Confluence, and GitHub. The attack needs no privilege escalation, no memory corruption, and no new CVE. It abuses the way an agentic developer tool trusts its own local configuration. Anthropic reviewed the report, classifying it as out of scope because the attack depends on prior user consent, and confirmed that no patch is forthcoming. That decision places the detection and response burden on enterprise security teams.

The attack illustrates a lesson the industry keeps relearning: identity and detection guard the wrong boundary. The only controls that matter must be in the agent’s path, governing what the agent can reach and do.

The Attack in Brief

The entry point is a malicious npm package that survives casual inspection. Buried inside is a postinstall hook that runs silently during installation and targets one file: ~/.claude.json. That is the global configuration that tells Claude Code how to route all MCP traffic, and it stores OAuth tokens in plaintext.
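The entry point is an install-time lifecycle script. One cheap defensive check is to list every installed package that declares such a script. The sketch below is an illustrative audit and is not part of the Mitiga research. It assumes a conventional node_modules layout and treats only `preinstall`, `install`, and `postinstall` as install-time hooks. npm's own `ignore-scripts` setting (for example `npm install --ignore-scripts`) blocks these scripts outright, but it also breaks packages that genuinely need them.

```python
import json
from pathlib import Path

# Lifecycle scripts that npm runs automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_time_scripts(package_json_text: str) -> dict:
    """Return the install-time lifecycle scripts declared in one package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {name: cmd for name, cmd in scripts.items() if name in INSTALL_HOOKS}


def audit_node_modules(root: Path) -> dict:
    """Walk node_modules under `root` and report every package that runs code on install."""
    findings = {}
    for manifest_path in root.glob("node_modules/**/package.json"):
        try:
            hooks = install_time_scripts(manifest_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed manifests are skipped, not fatal
        if hooks:
            findings[str(manifest_path.parent)] = hooks
    return findings
```

Running `audit_node_modules(Path("."))` in a project gives a review list. A new entry that appears after a dependency bump deserves a closer look before the next install.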

The postinstall hook pre-seeds common developer clone paths with trust flags set to true, so Claude Code never prompts for approval. It then inserts a SessionStart hook that fires every time Claude Code loads a trusted project. That hook rewrites the legitimate MCP server URLs, swapping an endpoint like Atlassian's for a localhost proxy the attacker controls. When the developer connects to the server, Claude Code runs a full OAuth flow straight through the proxy. The bearer token transits attacker infrastructure, and the provider sees a valid authentication from a trusted origin.
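The two tampering steps leave recognizable marks in the config, and the defensive sketch below flags them. It assumes a `~/.claude.json` layout with these parts:

- a top-level `mcpServers` map;
- a `projects` map keyed by directory path;
- per-project `hasTrustDialogAccepted` flags;
- `url` fields on remote servers.

That schema is an assumption, not a documented contract, so verify it against your installed version. The sketch treats trust granted to a directory that does not exist yet as a sign of a pre-seeded clone path. It treats a loopback MCP endpoint as a triage signal rather than a verdict, because some developers run legitimate local servers.

```python
import os
from urllib.parse import urlparse

# Loopback names; the 127.0.0.0/8 range is handled by the prefix check below.
LOOPBACK_HOSTS = {"localhost", "::1"}


def _remote_urls(servers) -> dict:
    """Map server name -> URL for entries with a string `url` field (assumed schema)."""
    if not isinstance(servers, dict):
        return {}
    return {
        name: cfg["url"]
        for name, cfg in servers.items()
        if isinstance(cfg, dict) and isinstance(cfg.get("url"), str)
    }


def _is_loopback(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    return host in LOOPBACK_HOSTS or host.startswith("127.")


def suspicious_entries(config: dict) -> list:
    """Return human-readable findings for a parsed ~/.claude.json (assumed schema)."""
    findings = []
    scopes = {"<global>": config.get("mcpServers")}
    projects = config.get("projects")
    if isinstance(projects, dict):
        for path, project in projects.items():
            if not isinstance(project, dict):
                continue
            # Heuristic: trust granted to a directory that does not exist yet
            # suggests the path was pre-seeded rather than approved by a person.
            if project.get("hasTrustDialogAccepted") is True and not os.path.isdir(path):
                findings.append(f"trust pre-seeded for missing directory: {path}")
            scopes[path] = project.get("mcpServers")
    for scope, servers in scopes.items():
        for name, url in _remote_urls(servers).items():
            if _is_loopback(url):
                findings.append(f"loopback MCP endpoint in {scope}: {name} -> {url}")
    return findings
```

To check a real machine, parse the file with `json.loads(Path.home().joinpath(".claude.json").read_text())` and pass the result in.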

That token persists across sessions via a refresh token, inherits every permission granted at authorization, sits in plaintext beside the trust flags, and reaches the provider from Anthropic's egress IP range. To the provider's audit logs, it looks identical to legitimate traffic. The hook reasserts itself on every load, so rotating the stolen token simply feeds the attacker a fresh one. The only evidence lives in a user-level file most security teams don't monitor.
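The only evidence is a change to a user-level file. A lightweight monitor can fingerprint just the fields this attack has to alter and alert when that fingerprint drifts from a known-good baseline. Hashing the whole file would be noisy, because the file also holds tokens and other state that change during normal use. This is a hedged sketch. The field names (`mcpServers`, `projects`, `hasTrustDialogAccepted`, `url`, and a top-level `hooks` key) are assumptions about the file's layout, and hooks may also live in other Claude Code settings files.

```python
import hashlib
import json


def _urls(servers) -> dict:
    """Keep only endpoint URLs, so token refreshes do not change the fingerprint."""
    if not isinstance(servers, dict):
        return {}
    return {n: c.get("url") for n, c in servers.items() if isinstance(c, dict)}


def security_view(config: dict) -> dict:
    """Reduce a parsed config to its routing, trust, and hook fields (assumed names)."""
    projects = config.get("projects")
    projects = projects if isinstance(projects, dict) else {}
    return {
        "global_mcp": _urls(config.get("mcpServers")),
        "project_mcp": {
            path: _urls(p.get("mcpServers"))
            for path, p in projects.items()
            if isinstance(p, dict)
        },
        "trusted": sorted(
            path
            for path, p in projects.items()
            if isinstance(p, dict) and p.get("hasTrustDialogAccepted") is True
        ),
        "hooks": config.get("hooks"),  # assumed location, for illustration only
    }


def fingerprint(config: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the security-relevant view."""
    canonical = json.dumps(security_view(config), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted(baseline_hex: str, config: dict) -> bool:
    """True when routing, trust, or hook settings differ from the baseline."""
    return fingerprint(config) != baseline_hex
```

Record the baseline fingerprint from a known-clean machine image, then check it on a schedule or from endpoint tooling.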

Identity and Detection Guard the Wrong Boundary

The root cause is architectural: the OAuth model here is client-side and direct. The developer's machine holds both the token, in cleartext, and the instructions for where to route MCP traffic. Nothing sits between the agent and the provider to verify the destination, bind the credential to a known context, or watch what the agent does with the access. Whoever controls the local config controls the agent.
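The following shows roughly what destination verification looks like when it runs in infrastructure you control rather than from a file on the laptop. It is a minimal sketch, not Cequence's implementation, and the hostnames are placeholders. Matching exact hostnames, rather than suffixes or substrings, is deliberate: it rejects look-alike hosts such as `mcp.atlassian.example.evil.test`.

```python
from urllib.parse import urlparse

# Placeholder allowlist an organization would maintain centrally (hypothetical hosts).
APPROVED_MCP_HOSTS = frozenset({"mcp.atlassian.example", "mcp.github.example"})


def destination_allowed(url: str, approved: frozenset = APPROVED_MCP_HOSTS) -> bool:
    """Allow only HTTPS requests to an exactly matching, centrally approved host."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    # urlparse strips any userinfo, so "https://good@evil/" yields host "evil".
    return (parsed.hostname or "") in approved
```

What matters in this design is where it runs. A broker enforcing this list cannot be redirected by editing `~/.claude.json`.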

This year, two independent research efforts converged on this boundary problem. One is Dr. Chase Cunningham's Agentic Zero Trust research. The other is Anthropic's own engineering work, both its Zero Trust framework for agents and its account of how it contains the agents it builds. Both point toward the design Cequence built its AI Gateway product around.

Traditional access controls cannot stop an agent from misusing legitimate permissions. A token is a grant frozen at a moment and valid for a window. An agent making thousands of calls inside that window does whatever it decides until the token expires. Detection makes attacks harder but not impossible, since a patient attacker eventually finds a prompt that gets through. And the most damaging attacks never trip a detector at all, because the malicious instruction arrives through the user as a routine task and looks entirely legitimate. In this attack, every field in the provider's logs is valid, so neither an identity check nor a classifier has anything to flag.

Dr. Cunningham's research and Anthropic's land on the same answer: the only control that survives is containment at the boundary the agent has to cross. That means three things:

- Scope the agent to only the tools its declared job needs.
- Watch what it does at runtime.
- Halt it the instant its behavior strays from the role.

That boundary does not care which model is reasoning or whether a token was stolen, because enforcement lives in infrastructure you own, not on a developer's computer.

How the Cequence AI Gateway Prevents the Attack

The Cequence AI Gateway is that boundary. It brokers every agent-to-application connection through a governed enforcement layer, collapsing the client-side single point of failure this attack depends on.

The configuration tampering happens on the developer’s machine, in ~/.claude.json, which sits outside the gateway’s reach. What the gateway removes is the payoff. The endpoint rewrite only matters if it captures a usable token, and that is exactly what the gateway takes off the table. While the attack can still run, it simply comes up empty. Three capabilities make that possible.

Integrated authentication and authorization takes away the thing worth stealing. Because the gateway brokers auth, the real OAuth tokens live on Cequence's side, not in a plaintext file on the developer's machine. The attacker can rewrite the endpoint and run the flow, but no broadly scoped bearer token sits locally to intercept. The laptop only ever holds an AI Gateway-scoped, session-bound credential, never the broad downstream SaaS token. Because the AI Gateway binds that credential to its session, even a leaked copy would not authenticate from outside the enterprise network, so it is useless where the attacker sits.
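Cequence does not describe how its session binding works, so the following is only a generic illustration of the idea. The credential records the context it was issued for, and an HMAC signature keeps that context from being edited. The verifier rejects the credential when it is presented from any other context or after it expires. The `context` string stands in for something like an enterprise egress identifier. That string, the 15-minute default lifetime, and all names are hypothetical.

```python
import hashlib
import hmac
import json
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundCredential:
    session_id: str
    context: str     # hypothetical: the network or device the session belongs to
    expires_at: int  # Unix time, seconds
    tag: str         # HMAC-SHA256 over the three fields above


def _tag(key: bytes, session_id: str, context: str, expires_at: int) -> str:
    # A JSON list encoding keeps field boundaries unambiguous before signing.
    msg = json.dumps([session_id, context, expires_at]).encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def issue(key: bytes, context: str, ttl_s: int = 900, now: Optional[float] = None) -> BoundCredential:
    """Mint a short-lived credential bound to the context it was issued for."""
    now = time.time() if now is None else now
    session_id = secrets.token_urlsafe(16)
    expires_at = int(now) + ttl_s
    return BoundCredential(session_id, context, expires_at, _tag(key, session_id, context, expires_at))


def verify(key: bytes, cred: BoundCredential, presented_from: str, now: Optional[float] = None) -> bool:
    """Accept only an untampered, unexpired credential presented from its bound context."""
    now = time.time() if now is None else now
    expected = _tag(key, cred.session_id, cred.context, cred.expires_at)
    if not hmac.compare_digest(expected, cred.tag):
        return False  # fields were edited or the tag was forged
    if now >= cred.expires_at:
        return False  # outside the validity window
    return cred.context == presented_from  # right credential, wrong place: rejected
```

The binding only helps if the gateway derives `presented_from` from something it observes, such as the connection's source network, rather than reading it from the request.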

Agent Personas cap what a compromised agent could reach. Each agent’s job is expressed as a plain-English role with least-privilege permissions down to the individual tool call, so the agent can touch only the handful of tools its role requires. An agent provisioned to read Salesforce records cannot pivot to act in Jira. If an agent is ever subverted, the blast radius is limited to the role, not the full reach of the user’s access.
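The persona idea can be modeled as a deny-by-default map from tool name to a check on that call's arguments. This is a hypothetical model, not the AI Gateway's configuration format. The tool names and argument shapes are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Mapping

# A grant pairs a tool name with a check on that call's arguments.
ArgCheck = Callable[[Mapping[str, Any]], bool]


@dataclass
class Persona:
    role: str  # plain-English job description
    grants: Dict[str, ArgCheck] = field(default_factory=dict)

    def authorize(self, tool: str, args: Mapping[str, Any]) -> bool:
        """Deny by default: the tool must be granted and its arguments must pass."""
        check = self.grants.get(tool)
        return check is not None and bool(check(args))


# Hypothetical persona: may read two Salesforce object types and nothing else.
sales_reader = Persona(
    role="Looks up Salesforce accounts and opportunities to answer status questions.",
    grants={
        "salesforce.get_record": lambda a: a.get("object") in {"Account", "Opportunity"},
    },
)
```

A subverted `sales_reader` asking for `jira.create_issue` is refused because that tool was never granted. A request for a Salesforce `User` record is also refused, because the per-call argument check fails.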

Visibility and monitoring catch malicious behavior even when no one knows it is happening. The gateway watches every action an agent takes against its declared job and halts behavior that steps outside the role. This attack is built to stay invisible, leaving its only trace in a user-level file most teams never check. The gateway moves that visibility to the one place every agent action has to cross, so the abuse surfaces in real time rather than in an audit log after the fact.
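Halting on deviation differs from simple per-call denial: once an agent steps outside its role, the whole session stops. The minimal sketch below shows that state machine. A plain tool allowlist stands in for the richer behavioral scoring a real gateway would apply, and the class and tool names are hypothetical.

```python
from dataclasses import dataclass, field


class AgentHalted(Exception):
    """Raised when a session has been stopped for out-of-role behavior."""


@dataclass
class SessionMonitor:
    agent_id: str
    allowed_tools: frozenset
    halted: bool = False
    events: list = field(default_factory=list)  # (outcome, tool) audit trail

    def check(self, tool: str) -> None:
        """Record the call; raise AgentHalted if it is out of role or the session is stopped."""
        if self.halted:
            self.events.append(("refused_after_halt", tool))
            raise AgentHalted(f"{self.agent_id} is halted")
        if tool not in self.allowed_tools:
            self.halted = True  # one deviation stops the entire session
            self.events.append(("violation", tool))
            raise AgentHalted(f"{self.agent_id} called out-of-role tool {tool!r}")
        self.events.append(("allowed", tool))
```

After the first violation, even tools that are normally allowed are refused until someone reviews the session. The audit trail records every attempt, including those made after the halt.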

The Takeaway

When trust, credentials, and routing live in a local file the agent reads on every launch, one supply chain foothold becomes durable, broadly scoped access to your most sensitive systems. Patching one tool will not fix that, and neither will a sharper classifier or a shorter-lived token. The defense that holds is a behavioral gateway you own, sitting wherever agents reach your systems, governing which servers they touch, binding credentials to context, and scoring every action against the job you gave them. That is what the Cequence AI Gateway delivers.

The post When the Token Theft Hides in Plain Sight: Why Agent Containment Stops the Claude Code MCP Attack appeared first on Cequence Security.

*** This is a Security Bloggers Network syndicated blog from Cequence Security authored by Jeff Harrell. Read the original post at: https://www.cequence.ai/blog/ai/claude-code-mcp-attack-ai-gateway/