Anthropic Withholds New AI Model, Citing Safety Risks

April 9, 2026

Anthropic has decided not to release a newly developed AI model, citing safety concerns serious enough to keep it out of public hands. The announcement is one of the clearest examples yet of a frontier AI lab actually acting on its stated safety commitments rather than just publishing policy documents.

Anthropic's Responsible Scaling Policy (RSP) - a framework the company adopted in 2023 - sets specific capability thresholds that trigger a restricted release or no release at all. The policy defines escalating AI Safety Levels, from ASL-2 (current deployed models) upward, with ASL-4 representing capabilities that could pose catastrophic risks. If a model crosses one of those thresholds and the company cannot demonstrate adequate safety mitigations, the RSP requires it to halt deployment. This appears to be that policy in action.

The practical question for anyone building with Anthropic's tools is what this means for the model roadmap. Anthropic offers Claude 3.5 and Claude 3.7 through its API and consumer products. A withheld model doesn't disappear from the lab's research - it typically means the capability exists but deployment is paused pending safety work, or indefinitely if mitigations aren't achievable.

What "Dangerous" Actually Means Here

When AI safety researchers talk about dangerous capabilities, they're usually referring to a specific set of risks: whether a model could meaningfully help someone with no prior knowledge synthesize a biological or chemical weapon, whether it can autonomously conduct cyberattacks, or whether it demonstrates a capacity for deception sophisticated enough to undermine human oversight. These aren't theoretical concerns - they're the actual criteria Anthropic uses in its capability evaluations.

The distinction matters because "dangerous" in AI safety discourse is not the same as "generates offensive content" or "can be jailbroken." Those are real problems, but they're table stakes. A model withheld at this level has been flagged for something closer to weapons-enabling potential or catastrophic misuse.

Anthropic's decision will invite scrutiny - and some skepticism. Critics have argued that safety-oriented labs still race to ship, and that RSP commitments are harder to hold when competitive pressure builds. Sitting on a finished model is a meaningful test of whether that criticism holds. This time, at least, the company appears to have held the line.
