SBN

Why the Security Controls Built Into LLMs Aren’t Enough

LLM vendors are increasingly building security features and guardrails into their models. However, the controls inside the model are designed for a contained, request-response world. A user sends a prompt, and the model returns a response. LLM security focuses on making that response safe.

Agentic AI shows us how insufficient those model-based controls are. Today, agents call tools, chain with other agents, read from external data sources, and take actions that can have real-world consequences. A model that refuses to write malware can still exfiltrate data through a compromised tool call. A model with perfect alignment can still get hijacked through an indirect prompt injection buried in a document it retrieved. The gap between what’s inside the model and what production security demands is wide.

What’s Inside the Model

The security properties that belong to the model itself fall into four categories.

Alignment training.

Model providers use techniques like reinforcement learning from human feedback (RLHF) and constitutional AI (having the AI critique itself) to shape model behavior at a fundamental level. Alignment training makes the model resistant to generating clearly harmful content and inclined to follow operator intent.

System prompt adherence.

Models give privileged weight to operator instructions in the system prompt, creating a rudimentary two-tier access model: operators set policy, users interact within it. This isn’t a true access control system, but it does give deployers meaningful leverage over model behavior.

Built-in content refusals.

Models carry trained refusal behaviors for a defined set of harmful output categories — weapons synthesis, illegal activity, and similar categories where the harm threshold is unambiguous.

Output self-censorship.

Beyond explicit refusals, models probabilistically suppress certain completions based on training, even without a specific filter triggering. This is a soft control, not a hard one, but it adds friction to attempts to extract harmful content.

These four categories are important, but they also have a hard ceiling. Alignment is probabilistic, not deterministic. System prompt adherence can be overridden by sufficiently adversarial inputs. Content refusals cover defined categories, not novel attacks, and self-censorship fails in ways that are difficult to predict.

Where the Controls Break Down

Prompt injection has no reliable fix.

Direct injection — where a user tries to override system instructions — is manageable with filters. Indirect injection is not. When a model retrieves a document, reads a web page, or processes a tool response containing embedded instructions, those instructions arrive as context, not user input. Filters that only inspect user-submitted text don’t catch payloads riding in on retrieved content.

Jailbreaks find the edges of alignment.

Adversarial prompts that reframe requests, use fictional contexts, or chain reasoning steps reliably expose the probabilistic nature of alignment training. No model is immune.

Agent-to-agent communication has no trust model.

When an orchestrator delegates to a sub-agent through an MCP server, there’s no cryptographic attestation and no chain of custody. A compromised orchestrator can instruct sub-agents to take actions the original task never authorized.

Models produce no behavioral telemetry.

Security teams can see inputs and outputs. They can’t see what happened in between — which tools were considered, what context shaped the decision, or why a particular action was taken. That opacity makes incident response forensic rather than preventive.

RAG pipelines treat unverified data as ground truth.

Poisoned embeddings, cross-tenant data leakage, and retrieval manipulation let adversaries influence model outputs without ever touching the model or the user-facing interface.

What Security Controls Are Required?

Real AI security requires controls the models were never designed to provide:

  • Policy enforcement at the traffic layer, applied before prompts reach the model and before responses reach users
  • Non-human identity infrastructure — workload identity and dynamic credentials tied to runtime context, not static keys embedded in environment variables
  • Behavioral monitoring that analyzes every agent action in context
  • Least-privilege tool access enforced externally, so that what tools an agent can call are limited to its job description
  • Unified visibility across all LLM traffic — inputs, outputs, and tool calls

Cequence Provides the Controls LLMs Lack

The Cequence AI Gateway sits in the request path between agents and the applications and data they interact with. It applies security controls independent of which LLM is in use, which provider hosts it, or what native controls that provider offers.

But the real differentiator is behavioral analysis, and it’s where Cequence’s history in network-based bot management and API security translates into a capability that other AI security tools can’t replicate. Cequence doesn’t analyze individual requests in isolation. It builds behavioral profiles across sessions, users, and agents over time. A single anomalous prompt looks like noise. The same prompt as step seven in a twelve-step jailbreak sequence looks like an attack, but only if you’re tracking the entire journey.

That behavioral intelligence catches what other products miss:

  • Slow-burn prompt injection sequences spread across multiple conversational turns, where no individual turn crosses a threshold but the cumulative pattern reveals an attack
  • Low-and-slow data extraction attempts, where an attacker systematically probes model behavior slowly enough to stay under rate limits
  • Anomalous agent behavior, where an agent suddenly calls tools outside its established pattern, token consumption spikes without a corresponding workload change, or a session deviates from its behavioral baseline in ways that signal compromise
  • Human vs. automated traffic distinction, enabling differentiated policy enforcement for human users versus agents and bots — a capability that matters as AI interfaces become primary attack surfaces

This behavioral foundation isn’t new. Cequence built it over years of defending enterprise applications, APIs, and data, and stopping bot attacks at scale. AI Gateway applies that same intelligence to a new and rapidly expanding traffic type.

AI Gateway also provides unified visibility across all AI traffic for an audit trail and model-agnostic policy enforcement that travels with the traffic. Organizations can switch providers, add new models, or expand to new use cases without resetting their security posture.

Want to see it in action? Let us show you in a personalized demo.

The post Why the Security Controls Built Into LLMs Aren’t Enough appeared first on Cequence Security.

*** This is a Security Bloggers Network syndicated blog from Cequence Security authored by Jeff Harrell. Read the original post at: https://www.cequence.ai/blog/ai/why-llm-security-controls-arent-enough/