Prompt Control is the New Front Door of Application Security
Application security has always been built around a simple assumption: there is a front door. Traffic enters through known interfaces, authentication establishes identity, authorization constrains behavior, and downstream controls enforce policy. That model still exists, but our most recent research shows it no longer captures where risk actually concentrates in AI-driven systems.
When we asked where security and delivery services have the greatest operational impact in AI architectures, the answer was clear. The inference layers dominate. Prompt, token, and output layers account for the majority of reported security and delivery headaches, outweighing concerns at the integration edge or even model routing.
This is not an abstract shift. In our research, the prompt layer alone was cited by 25% of respondents as the most impactful location for application security, and by 29% for application delivery. Token layers followed closely at 23% for both security and delivery. Output layers, while still significant, ranked lower at 19% and 14%, respectively. Taken together, the three inference layers add up to 67% for security and 66% for delivery.
That distribution matters because the inference layer is not an infrastructure boundary; it’s a behavioral one.
Why the Front Door Moved Upstream
Prompts are where intent enters the system. They define not only what a user is asking, but how the model should reason, what context it should retain, and, when the prompt is hostile, which safeguards it should attempt to bypass. That is why prompt layers now outrank traditional integration points as the most impactful area for both application security and delivery.
Injection attacks, context poisoning, and memory manipulation all occur before a model generates a single token. If controls are not applied here, downstream protections are already operating in recovery mode. Our research reflects that reality. Respondents are feeling the pressure where behavior is shaped, not where outputs are filtered.
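To make the idea of applying controls before the first token concrete, here is a minimal sketch of an ingress gate in Python. Everything in it is an assumption for illustration: the names screen_prompt and guarded_infer, the size limit, and the two deny patterns are hypothetical, and simple pattern matching is nowhere near a complete defense against injection. The point is only the ordering: the check runs before the model is called, so a blocked request never reaches inference.

```python
import re
from typing import Callable, NamedTuple


class Verdict(NamedTuple):
    allowed: bool
    reason: str


# Hypothetical deny patterns, for illustration only. Real prompt screening
# needs far more than a short list of regular expressions.
DENY_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the )?system prompt", re.IGNORECASE),
]
MAX_PROMPT_CHARS = 4000  # assumed size limit


def screen_prompt(prompt: str) -> Verdict:
    """Decide whether a prompt may proceed to the model."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return Verdict(False, "prompt too long")
    for pattern in DENY_PATTERNS:
        if pattern.search(prompt):
            return Verdict(False, "matched deny pattern")
    return Verdict(True, "ok")


def guarded_infer(prompt: str, model: Callable[[str], str]) -> str:
    """Call the model only if the prompt passes screening."""
    verdict = screen_prompt(prompt)
    if not verdict.allowed:
        # The model is never invoked, so no tokens are processed.
        return f"Request blocked: {verdict.reason}"
    return model(prompt)
```

Here model is any callable that takes a prompt and returns text, which keeps the sketch independent of a particular inference API.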
Token Control has Become a Security Primitive
The prominence of token layers in the data reinforces another shift. Token management is now about containment as well as efficiency.
Token limits, shaping, and streaming controls define how much damage a request, session, or user can inflict before controls intervene. Without them, attackers don’t need to exploit vulnerabilities. They can simply consume resources until cost, latency, or availability becomes the failure mode. It’s a denial-of-service attack surface that requires attention because the cost of failing to prevent it is now measured in both downtime and (many) dollars.
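One way to picture token limits as containment is a per-session budget over a sliding time window. The Python sketch below is an illustration under stated assumptions, not a reference implementation: the TokenBudget class, its field names, and the limits used in the example are all hypothetical, and a production system would also need shared state across instances and an estimate of output tokens before generation starts.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TokenBudget:
    """Sliding-window token budget for one session (hypothetical policy)."""
    max_tokens: int          # tokens allowed within one window
    window_seconds: float    # length of the window
    _events: List[Tuple[float, int]] = field(default_factory=list)

    def _used(self, now: float) -> int:
        # Forget spend that has aged out of the window, then total the rest.
        cutoff = now - self.window_seconds
        self._events = [(t, n) for t, n in self._events if t > cutoff]
        return sum(n for _, n in self._events)

    def try_spend(self, tokens: int, now: Optional[float] = None) -> bool:
        """Record the spend and return True if it fits; otherwise refuse."""
        now = time.monotonic() if now is None else now
        if self._used(now) + tokens > self.max_tokens:
            return False
        self._events.append((now, tokens))
        return True


# Example: 100 tokens per 60-second window (assumed values).
budget = TokenBudget(max_tokens=100, window_seconds=60)
first = budget.try_spend(60, now=0.0)    # fits within the budget
second = budget.try_spend(50, now=1.0)   # refused: would exceed 100
```

A refused request can be rejected, queued, or routed to a cheaper model. Whichever is chosen, the damage any one session can do is bounded before cost or latency becomes the failure mode.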
That is why nearly a quarter of respondents identified token layers as the most impactful location for securing and delivering AI inference. Tokens represent cost, capacity, and abuse surface simultaneously.
Output Controls are Necessary, but They are Not the Front Door
Output moderation still matters, and our research shows it remains a meaningful concern. But its lower ranking is telling. Output controls catch problems after the system has already behaved badly. They are essential guardrails, not primary defenses. It’s always more efficient to stop the thief on the way in than to try to catch him after the fact, and in the case of inference, it’s also less costly, because stopping a request at ingress means no token processing costs are incurred.
Organizations relying predominantly on output filtering are implicitly accepting that policy violations, hallucinations, and data leakage will occur, and that the goal is to detect them quickly rather than prevent them entirely. “Assume breach” is a first principle of zero trust, so employing output filters is not only reasonable but a necessary precaution.
Authentication and Observability Remain Foundational
Our second set of findings reinforces this point. Authentication and observability lead the methods organizations use to secure and deliver AI inference services, cited by 55% and 54% of respondents, respectively. This holds true across roles, with the exception of developers, who more often prioritize protection against sensitive data leaks.
This is not a contradiction. It is a reminder that inference security still depends on knowing who is making requests and understanding how the system is behaving. What has changed is what must be authenticated and observed. It is no longer sufficient to focus on APIs and services alone. Prompts, sessions, token flows, and routing decisions now demand the same level of scrutiny.
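As a rough illustration of what observing prompts, sessions, token flows, and routing decisions might look like, the Python sketch below emits one structured audit record per inference request. The record fields, the build_record and emit helpers, and the logger name are assumptions made for this example. The prompt is stored as a SHA-256 digest rather than in plain text, which is one possible design choice for correlating requests without copying sensitive content into logs.

```python
import hashlib
import json
import logging
from dataclasses import asdict, dataclass

logger = logging.getLogger("inference.audit")  # hypothetical logger name


@dataclass(frozen=True)
class InferenceAuditRecord:
    principal: str          # authenticated identity making the request
    session_id: str         # session the prompt belongs to
    prompt_sha256: str      # digest of the prompt, not the prompt itself
    prompt_tokens: int      # tokens consumed by the input
    completion_tokens: int  # tokens produced by the model
    route: str              # which model or backend handled the request


def build_record(principal: str, session_id: str, prompt: str,
                 prompt_tokens: int, completion_tokens: int,
                 route: str) -> InferenceAuditRecord:
    """Build an audit record, hashing the prompt instead of storing it."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return InferenceAuditRecord(principal, session_id, digest,
                                prompt_tokens, completion_tokens, route)


def emit(record: InferenceAuditRecord) -> str:
    """Serialize the record as one JSON line, log it, and return the line."""
    line = json.dumps(asdict(record), sort_keys=True)
    logger.info(line)
    return line
```

Because every record carries both an identity and token counts, the same stream can feed authentication audits and cost or abuse monitoring.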
The Security Takeaway
Our most recent research does not suggest abandoning traditional application security controls. It shows where their center of gravity must shift.
Inference layers are where intent is expressed, costs accumulate, and behavior diverges. That makes prompt control the functional equivalent of the application front door in AI-driven systems.
Security teams that continue to concentrate controls exclusively at the network edge or model back end will find themselves enforcing policy too late in the process. The organizations experiencing the least friction are the ones treating prompt handling, token governance, and inference routing as first-class security domains.
The time to develop a strategy to secure inference is now, because when AI agents arrive and begin to multiply themselves, it may be too late to regain control.

