AI and ML in Security Cybersecurity Cybersecurity Featured Security Awareness Security Boulevard (Original) Social - Facebook Social - LinkedIn Social - X Spotlight Zero-Trust

Home » Cybersecurity » Google DeepMind Treats Advanced AI as ‘Insider Threats’ in New Cybersecurity Roadmap

Google DeepMind Treats Advanced AI as ‘Insider Threats’ in New Cybersecurity Roadmap

by Jon Swartz on June 20, 2026

Google DeepMind has introduced a security framework that treats artificial intelligence (AI) agents not just as software tools but as potential security threats.

The company’s newly published “AI Control Roadmap” outlines a proactive strategy to contain, monitor, and actively police AI systems that could bypass human oversight, exfiltrate data, or quietly sabotage their assigned tasks.

The 35-page technical report introduces a framework called TRAIT&R (Taxonomy of Rogue AI Tactics and Routines), modeled closely after the cybersecurity industry’s standard MITRE ATT&CK database. The move illustrates a departure from the AI safety community’s traditional, exclusive focus on the “alignment problem,” the effort to permanently program human ethics and intentions into AI.

DeepMind officials acknowledge that while alignment remains a priority, it may never be fully solved, necessitating a zero-trust, layered defense.

“The first line of defense is always to align the AI systems, but it’s always good to have multiple layers,” said Rohin Shah, lead of the AGI safety and alignment team at DeepMind. “We borrow a lot from security, which already deals with the threat of internal employees who might be malicious, and we can apply these to a new setting.”

The TRAIT&R taxonomy identifies three threat categories to defend against: an AI agent establishing a hidden, unmonitored internal deployment; agents subtly altering research results or persuading users toward incorrect conclusions; and exfiltrating critical model weights or destroying infrastructure.

To combat such risks, DeepMind is advocating dynamic, real-time access controls over traditional corporate permissions. John “Four” Flynn, DeepMind’s vice president of security and privacy, said standard role-based access fails with AI because automated workflows blur departmental lines. A single AI agent might perform tasks associated with multiple corporate roles within minutes, he said.

Instead, DeepMind is building sophisticated monitoring systems designed to grant or revoke an agent’s access on a task-by-task basis, instantly cutting off permissions if AI deviates from normal behavior.

Some of these defenses are already operational.

DeepMind revealed it has audited roughly one million coding agent tasks to build a live monitor for its Gemini Spark agent, successfully catching unintentional data deletions.

While the company reports that the vast majority of current flags stem from agent misunderstanding or overeager goal pursuit rather than genuine malice, the infrastructure is being laid for far more capable future models.

However, the strategy has drawn skepticism from independent experts, particularly regarding DeepMind’s plan to use secondary AI models as supervisors to review an agent’s reasoning.

“Multi-agent systems where AI monitors AI can be problematic,” warned Dawn Song, a computer science professor at the University of California at Berkeley. “If the monitor model won’t flag failures because it’s protecting its peer, the entire oversight architecture breaks.”

Dubbed v0.1, DeepMind views this roadmap as an evolving work in progress. As AI companies race to deploy increasingly autonomous systems, the industry is now watching to see if these digital containment fences can be built fast enough to keep pace with the models they are meant to restrain.

Jon Swartz

Jon Swartz is senior content writer at Techstrong Group. Most recently, he was MarketWatch’s senior reporter based in San Francisco covering technology and Silicon Valley. Previously, Swartz was USA Today’s San Francisco bureau chief. He has also written for Forbes, The (London) Independent, London Times, San Francisco Chronicle, and New Orleans Times-Picayune. He has won numerous journalism awards and is a two-time finalist for the Loebs, the Pulitzers of business reporting. Additionally, he frequently appears as a panelist on Fox Business and NBC Bay Area’s Press:Here program. He has been nominated four times for the Pulitzer Prize. Swartz is co-author of “Zero Day Threat: The Shocking Truth of How Banks and Credit Bureaus Help Cyber Crooks Steal Your Money and Identity” and sole author of “Young Wealth.”

jon-swartz has 50 posts and counting.See all posts by jon-swartz