And Now, LLMs Don’t Need Human Intervention to Plan and Execute Large, Complex Attacks

by Teri Robinson on August 12, 2025

Uh oh. Large language models (LLMs) apparently have the capability to autonomously plan and execute very complex attacks against networks.

So say researchers at Carnegie Mellon University who found that “LLMs — when equipped with structured abstractions and integrated into a hierarchical system of agents — can function not merely as passive tools, but as active, autonomous red team agents capable of coordinating and executing multi-step cyberattacks without detailed human instruction.”

And that’s not so good news for cybersecurity. “This raises the stakes for defenders,” says Margaret Cunningham, director of security & AI strategy and field CTO at Darktrace. “Traditional detection methods won’t scale against systems that plan and adapt.”

Much of the earlier research on LLMs explored how they performed “in simplified capture-the-flag (CTF) environments.” But the Carnegie Mellon team took it to the next level, “evaluating LLMs in realistic enterprise network environments and considering sophisticated, multi-stage attack plans.”

The Carnegie Mellon researchers intended “to understand whether an LLM could perform the high-level planning required for real-world network exploitation, and we were surprised by how well it worked,” the report quoted PhD candidate Brian Singer, who led the project, as saying. “We found that by providing the model with an abstracted ‘mental model’ of network red teaming behavior and available actions, LLMs could effectively plan and initiate autonomous attacks through coordinated execution by sub-agents.”

To get there, the researchers used LLMs capable of reasoning and knowledgeable about security tools. Those LLMs came up empty during the challenges. But once Singer and team taught the LLMs (and some smaller ones) “a mental model and abstraction of security attack orchestration, they showed dramatic improvement.”

In large part, that’s because researchers removed a limiting factor — the requirement that LLMs execute raw shell commands. Instead, their system imbues the LLMs “with higher-level decision-making capabilities while delegating low-level tasks to a combination of LLM and non-LLM agents,” the study said.
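As a rough illustration of that division of labor, the Python sketch below has a stand-in planner emit abstract actions that a dispatch table routes to sub-agents. It is not the researchers' system: `Action`, `stub_planner`, `scan_agent` and `SUB_AGENTS` are hypothetical names, and the sub-agent only returns a description string rather than touching any network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Action:
    """A high-level step the planner may request (hypothetical schema)."""
    name: str
    target: str


def stub_planner(known_hosts: List[str]) -> List[Action]:
    # Stand-in for the planning LLM: it emits abstract actions,
    # never raw shell commands.
    return [Action("scan", host) for host in known_hosts]


def scan_agent(action: Action) -> str:
    # Stand-in for a non-LLM sub-agent. A real one would translate the
    # abstract action into tool invocations; this one only reports.
    return f"simulated scan of {action.target}"


# Dispatch table mapping abstract action names to sub-agents.
SUB_AGENTS: Dict[str, Callable[[Action], str]] = {"scan": scan_agent}


def run(known_hosts: List[str]) -> List[str]:
    results = []
    for action in stub_planner(known_hosts):
        handler = SUB_AGENTS.get(action.name)
        if handler is None:
            results.append(f"no sub-agent for {action.name}")
            continue
        results.append(handler(action))
    return results
```

The point of the design is that the planner reasons over a small vocabulary of actions, while the details of carrying each one out live in the sub-agents.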

“What’s important to understand is that the researchers didn’t train the LLM to be ‘smarter,’” says Cunningham. Instead, they equipped it with “better tools, clearer instructions, and a structured environment, which enabled it to perform the task incredibly effectively.” She compared it to “giving a kid a well-designed science kit instead of a pile of wires,” contending that the research provides “a clear example of how humans are getting even more sophisticated at ‘engineering’ the LLMs to accomplish complex tasks autonomously.”

To test their models, the team turned to the 2017 Equifax data breach, recreating the network environment in which that breach happened, complete with the same vulnerabilities and topology that characterized Equifax’s network. The LLM replicated the breach without human intervention.

“I think the Equifax simulation is a clear signal. When given the right conditions by humans, the system autonomously scanned, exploited, moved laterally, escalated privileges, and exfiltrated data across 48 databases in a realistic enterprise environment,” says Cunningham. “This was a full campaign.” 

She does not doubt that, given their existing modular tooling and operational workflows that align with the architecture, “nation-state actors are well-positioned to adopt this.” Cunningham expects integration in the next 6-12 months, with cybercriminals following “especially as open-source tooling lowers the barrier to entry.”

While Jeremy London, director of engineering, AI and threat analytics at Keeper Security, points out the same technology lets organizations “run continuous, cost-effective attack simulations – something that was once expensive and rare – making advanced security testing accessible to companies of all sizes,” he notes that “traditional reactive defenses are no longer enough.”

London advocates for security strategies that are adaptive and continuous and “grounded in principles like zero-trust and least privilege to reduce risk and enable faster response.”

He urges organizations to get on the ball and move quickly to get ahead of attackers. “Teams that embrace AI-powered security early will have a real advantage.” That means ramping up the skills needed to work with new AI tools “to keep pace with evolving threats.”

These advances in LLM capabilities will likely make supply chain security, already a tough nut to crack, even harder. “A single vendor without modern, AI-enhanced defenses can quickly become the weak link that attackers exploit.”

Cunningham is calling for a shift toward behavioral analytics, “specifically models that infer intent from sequences of actions, not just signatures or anomalies,” to reason “what an attacker is trying to achieve, not just what they’re doing in the moment.”
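One very simple way to picture sequence-based detection, offered here only as a sketch, is to check whether the stages of a campaign show up in order within an event stream, even when unrelated events sit between them. The stage names follow the steps Cunningham lists for the Equifax simulation; the event labels, the function names and the threshold of three are assumptions made for illustration, not anything Darktrace or the researchers describe.

```python
from typing import Iterable, Sequence

# Stage labels are assumed; they mirror the steps named in the article.
CAMPAIGN_STAGES = (
    "scan",
    "exploit",
    "lateral_movement",
    "privilege_escalation",
    "exfiltration",
)


def stages_matched(events: Iterable[str],
                   stages: Sequence[str] = CAMPAIGN_STAGES) -> int:
    """Count how many stages appear, in order, within the event stream."""
    idx = 0
    for event in events:
        if idx < len(stages) and event == stages[idx]:
            idx += 1
    return idx


def infer_intent(events: Iterable[str], threshold: int = 3) -> str:
    # Flag the sequence once enough stages have occurred in order.
    if stages_matched(events) >= threshold:
        return "possible multi-stage campaign"
    return "no campaign pattern"
```

A signature check would judge each event alone. This check flags the stream only when the events line up in campaign order, which is the distinction Cunningham draws between what an attacker is doing in the moment and what the attacker is trying to achieve.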

To overcome the risks, industry, government and academia must collaborate, but with fractures in alliances becoming more prominent, it is getting harder to have faith in future collaboration. “The real question now,” London says, “is how fast organizations will adopt these tools to build smarter, stronger defenses for the future.”

