Anthropic’s New AI Permission System with Fable 5
Anthropic did something this week that felt a little bit like science fiction. It released its most powerful model yet, Claude Fable 5, and then announced another model called Mythos 5. When the news first broke, many people assumed these were two different models and immediately began comparing their capabilities.
But they aren’t. They share the same underlying weights, the same core capabilities, and the same price. The real difference isn’t the model itself. The difference lies in how much of the model’s capability you’re allowed to access, and which version of that capability you’re allowed to use. In other words, the same model now has two identities.
As someone who has spent years working in security governance and risk management, I find this design more interesting the more I think about it. In my view, the most important part of this release isn’t the model. It’s the governance philosophy behind it.
Why Split One Model Into Two Identities?
First, some context. Models in the Mythos class possess a capability that is both exciting and concerning: they are exceptionally good at offensive cybersecurity tasks. This is not an exaggeration.
These models can identify software vulnerabilities, analyze system weaknesses, move laterally through networks, and connect reconnaissance, exploitation, and privilege escalation into coherent attack chains. Anthropic describes this capability as “agentic hacking”: the ability of a model to autonomously perform large portions of the offensive security lifecycle. This kind of capability is the definition of a dual-use technology.
For security researchers and defenders, it could be an extraordinary productivity tool. For attackers, it could be an extraordinary weapon. That creates a difficult question. If a model naturally possesses the skills of a highly capable security researcher, or even an attacker, how do you safely make it available to the world?
Over the past year, much of the AI safety conversation has been built around an implicit assumption: if a capability is dangerous enough, it should either not be released or should be heavily restricted.
Anthropic appears to be approaching the problem differently. Instead of asking how to lock capabilities away, it is asking a more interesting question: If these capabilities are going to exist anyway, can we allow different people, in different contexts, to access different levels of capability? At its core, that is no longer just an AI problem. It is a governance problem.
Anthropic’s Answer: Don’t Block the User. Switch the Identity.
Traditional AI safety mechanisms are straightforward. A user asks a dangerous question. The model refuses and you hit a wall. Fable 5 takes a completely different approach. When a user asks about high-risk topics such as cybersecurity, biology, or chemistry, a set of classifiers first evaluates the request. You can think of these classifiers as AI security guards standing in front of the model.
If the request exceeds a certain risk threshold, the system doesn’t simply refuse to answer. Instead, it quietly routes the request to an older, less capable model, Claude Opus 4.8, and informs the user that a downgraded model is being used to respond. In other words, it doesn’t say no. It simply switches engines.
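The routing idea described above can be sketched in a few lines of Python. This is only an illustration of "score the request, then pick an engine", not Anthropic's implementation: the threshold value, the model identifier strings, the notice text, and the keyword-based score_risk function are all assumptions made up for this sketch, and real classifiers would be trained models rather than a keyword list.

```python
from dataclasses import dataclass
from typing import Optional

# All names and values below are hypothetical, chosen only for illustration.
RISK_THRESHOLD = 0.8
FRONTIER_MODEL = "claude-fable-5"   # assumed identifier
FALLBACK_MODEL = "claude-opus-4.8"  # assumed identifier


@dataclass
class RoutingDecision:
    model: str
    downgraded: bool
    notice: Optional[str]


def score_risk(prompt: str) -> float:
    """Toy stand-in for the classifiers: counts risky phrases in the prompt."""
    risky_terms = ("exploit", "privilege escalation", "lateral movement")
    hits = sum(term in prompt.lower() for term in risky_terms)
    return min(1.0, hits / 2)


def route(prompt: str, threshold: float = RISK_THRESHOLD) -> RoutingDecision:
    """Switch engines instead of refusing when the risk score exceeds the threshold."""
    if score_risk(prompt) > threshold:
        return RoutingDecision(
            model=FALLBACK_MODEL,
            downgraded=True,
            notice="A less capable model is being used for this response.",
        )
    return RoutingDecision(model=FRONTIER_MODEL, downgraded=False, notice=None)
```

The point of the sketch is the shape of the decision: there is no refusal branch at all. Every request gets an answer, and the only thing that changes is which model produces it and whether the user is told about the switch.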
What makes this design particularly interesting is how rarely the downgrade occurs. According to Anthropic, more than 95% of conversations are handled directly by the full-capability frontier model. Most users, most of the time, interact with the strongest version available. Many may never even realize the wall exists.
How Hard Is It to Break?
This is where the story becomes especially interesting. Before release, Anthropic launched an external red-teaming program and invited researchers and security experts to attack the system.
According to public information, the testing effort accumulated more than 1,000 hours of adversarial evaluation, yet no universally reliable jailbreak technique was discovered.
One external evaluation partner reportedly used around thirty publicly known jailbreak methods to persuade Fable 5 to assist with cyberattack planning, exploit development, and other high-risk activities.
The result was simple: zero successful bypasses. Not a single method worked consistently. The evaluator even described it as one of the most resistant frontier models they had tested when it came to cybersecurity misuse.
But the Story Doesn’t End There
Just as Anthropic might have been ready to celebrate, the story took an interesting turn. The UK AI Safety Institute reported that, within a relatively short evaluation period, researchers had already made early progress toward bypassing portions of the safety system.
More importantly, this wasn’t exposed by journalists or leaked by external researchers. Anthropic disclosed it voluntarily. As someone who has spent years working in security governance, I appreciate that detail.
The disclosure acknowledges a truth that every security practitioner already understands: one thousand hours without a compromise does not mean a system is unbreakable. There is an old saying in security: given enough time, enough resources, and enough motivation, attackers will eventually find something. Anthropic appears to understand this reality as well.
The goal is not to build a wall that can never be breached. The goal is to make attacks sufficiently difficult, sufficiently expensive, and sufficiently slow that defenders can detect and stop malicious activity before meaningful harm occurs.
In reality, that is how most security controls work. We never assume a control is perfect. We assume it will be attacked, and we design it so that attacking it becomes expensive enough to discourage abuse.
Why This Matters
If we step back from the technical details, what makes Fable 5 interesting is that it changes the way we think about AI safety.
For the past few years, AI safety has largely operated on a binary model. Allow or deny. Answer or refuse. At its core, it has been a wall.
The problem with walls is that every refusal sends a signal. To an attacker, a refusal often means there is something valuable on the other side worth continuing to probe. Fable 5 introduces a different idea. Instead of blocking capability, it manages capability.
Biology researchers can still receive assistance. Chemistry students can still receive answers. Ordinary users can still access the vast majority of the model’s capabilities. What gets restricted is only a narrow slice of frontier capability that could realistically transform knowledge into real-world harm.
Anyone who has worked in access management will find this logic familiar. Organizations do not give administrator privileges to every employee. Mature security programs do not operate through blanket permission or blanket denial. They assign different levels of access based on role, risk, and sensitivity. An intern and a database administrator may interact with the same system, but they are not allowed to do the same things.
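For readers who have not worked in access management, the intern and database administrator example can be written as a minimal role-based permission check. The role names, the action names, and the PERMISSIONS table are hypothetical and exist only to show the pattern: same system, different allowed actions.

```python
from enum import Enum


class Role(Enum):
    INTERN = "intern"
    DBA = "database_administrator"


# Hypothetical permission table: each role maps to the actions it may perform.
PERMISSIONS = {
    Role.INTERN: {"read"},
    Role.DBA: {"read", "write", "alter_schema"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True only if the action is explicitly granted to the role."""
    return action in PERMISSIONS.get(role, set())
```

Here is_allowed(Role.INTERN, "alter_schema") is False while is_allowed(Role.DBA, "alter_schema") is True, even though both roles talk to the same database. Anything not explicitly granted is denied, which is the usual default in mature access control designs.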
What Fable 5 is doing feels remarkably similar. It is applying the logic of access control to AI capabilities themselves. Most of the time, users receive the full-strength model. Only in a small number of high-risk scenarios does the system automatically switch to a version whose capabilities have been intentionally constrained. That is why I find this release so interesting.
For years, AI safety discussions have focused on how to control models. Fable 5 suggests a different possibility: perhaps the thing that ultimately needs governance is not the model, but the capability itself. As AI systems become increasingly powerful, the central question may no longer be whether a model can do something, but who should be allowed to do it, under what circumstances, and to what extent. Fable 5 looks like a model release. But it may actually be the first permission management system for AI.
*** This is a Security Bloggers Network syndicated blog from Chasing Polaris - Wickey's blog authored by Wickey Wang. Read the original post at: https://wickey.substack.com/p/anthropics-new-ai-permission-system


