Google DeepMind Treats Advanced AI as ‘Insider Threats’ in New Cybersecurity Roadmap
Google DeepMind has introduced a security framework that treats artificial intelligence (AI) agents not just as software tools but as potential security threats.
The company’s newly published “AI Control Roadmap” outlines a proactive strategy to contain, monitor, and actively police AI systems that could bypass human oversight, exfiltrate data, or quietly sabotage their assigned tasks.
The 35-page technical report introduces a framework called TRAIT&R (Taxonomy of Rogue AI Tactics and Routines), modeled closely on the cybersecurity industry’s standard MITRE ATT&CK knowledge base. The move marks a departure from the AI safety community’s traditional, exclusive focus on the “alignment problem,” the effort to permanently program human ethics and intentions into AI.
DeepMind officials acknowledge that while alignment remains a priority, it may never be fully solved, necessitating a zero-trust, layered defense.
“The first line of defense is always to align the AI systems, but it’s always good to have multiple layers,” said Rohin Shah, lead of the AGI safety and alignment team at DeepMind. “We borrow a lot from security, which already deals with the threat of internal employees who might be malicious, and we can apply these to a new setting.”
The TRAIT&R taxonomy identifies three threat categories to defend against: an AI agent establishing a hidden, unmonitored internal deployment; agents subtly altering research results or persuading users toward incorrect conclusions; and agents exfiltrating critical model weights or destroying infrastructure.
To combat such risks, DeepMind is advocating dynamic, real-time access controls over traditional corporate permissions. John “Four” Flynn, DeepMind’s vice president of security and privacy, said standard role-based access fails with AI because automated workflows blur departmental lines. A single AI agent might perform tasks associated with multiple corporate roles within minutes, he said.
Instead, DeepMind is building monitoring systems designed to grant or revoke an agent’s access on a task-by-task basis, instantly cutting off permissions if the agent deviates from normal behavior.
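The task-by-task idea can be illustrated with a minimal Python sketch. It is not DeepMind’s implementation, which the report summary here does not detail. The class names, action strings, and the rule that any out-of-scope request counts as a deviation are all assumptions made for illustration; a real monitor would presumably weigh far richer behavioral signals.

```python
from dataclasses import dataclass


@dataclass
class TaskGrant:
    """Hypothetical record of what one agent may do for one task."""
    task_id: str
    allowed_actions: frozenset
    revoked: bool = False


class TaskScopedAccess:
    """Illustrative access controller: permissions follow the task, not a role."""

    def __init__(self):
        self._grants = {}  # agent_id -> TaskGrant

    def grant(self, agent_id, task_id, allowed_actions):
        # Starting a new task replaces whatever the agent could do before.
        self._grants[agent_id] = TaskGrant(task_id, frozenset(allowed_actions))

    def revoke(self, agent_id):
        # An external monitor could call this when it flags the agent.
        grant = self._grants.get(agent_id)
        if grant is not None:
            grant.revoked = True

    def request(self, agent_id, action):
        """Return True if the action is permitted for the agent's current task."""
        grant = self._grants.get(agent_id)
        if grant is None or grant.revoked:
            return False
        if action not in grant.allowed_actions:
            # Assumption: an out-of-scope request is treated as a deviation,
            # so all remaining permissions for the task are cut off.
            grant.revoked = True
            return False
        return True


acl = TaskScopedAccess()
acl.grant("agent-1", "fix-bug", {"read_repo", "run_tests"})
in_scope = acl.request("agent-1", "read_repo")         # permitted for this task
out_of_scope = acl.request("agent-1", "delete_bucket")  # denied, grant revoked
after_revoke = acl.request("agent-1", "read_repo")      # denied until a new grant
```

In this sketch the agent never holds a standing role. Each task carries its own short list of permitted actions, and a single out-of-scope request revokes the rest until a new grant is issued.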
Some of these defenses are already operational.
DeepMind said it has audited roughly one million coding-agent tasks to build a live monitor for its Gemini Spark agent, an effort that has caught unintentional data deletions.
While the company reports that the vast majority of current flags stem from agent misunderstanding or overeager goal pursuit rather than genuine malice, the infrastructure is being laid for far more capable future models.
However, the strategy has drawn skepticism from independent experts, particularly regarding DeepMind’s plan to use secondary AI models as supervisors to review an agent’s reasoning.
“Multi-agent systems where AI monitors AI can be problematic,” warned Dawn Song, a computer science professor at the University of California at Berkeley. “If the monitor model won’t flag failures because it’s protecting its peer, the entire oversight architecture breaks.”
DeepMind has labeled the roadmap v0.1 and views it as an evolving work in progress. As AI companies race to deploy increasingly autonomous systems, the industry is now watching to see if these digital containment fences can be built fast enough to keep pace with the models they are meant to restrain.

