
Anthropic's Restricted Security Model Gets a Real-World Test at Cloudflare

May 18, 2026

What happens when an AI vulnerability scanner is effective enough to scare its own creator?

Last month, Anthropic announced Project Glasswing alongside a model called Mythos Preview, a security-focused AI that autonomously found thousands of high-severity vulnerabilities across every major operating system and web browser. The results were alarming enough that Anthropic decided not to release the model publicly. Instead, roughly 40 organizations were given restricted access to run it defensively against their own code.

Cloudflare was one of those organizations. It has now published a detailed breakdown of what happened when it ran Mythos Preview against more than 50 of its internal repositories: one of the first candid, production-scale accounts of how the model actually performs.

The Logic Behind the Restricted Rollout

Anthropic's decision to keep Mythos Preview out of public hands is easy to understand once you see what the model does. A tool that can autonomously surface thousands of previously unknown critical vulnerabilities in widely deployed software (Chrome, Windows, macOS, Linux) is also a tool that attackers could use to find those same vulnerabilities in other people's infrastructure.

The controlled program, limited to about 40 organizations, threads a real needle: it gathers performance data across diverse production environments without releasing a general-purpose attack capability. Cloudflare's run against more than 50 internal repositories gives Anthropic evidence of how Mythos Preview handles a large, heterogeneous production codebase rather than a curated benchmark.

What Makes This Disclosure Significant

Most AI security product case studies are sanitized. Cloudflare's published breakdown is notable because it comes from an organization running infrastructure at global scale with no reason to flatter the results.

The capability Anthropic highlighted in its own documentation is the model's ability to reason through multi-step attack chains: not just flagging that a function has a bug, but tracing how that bug could be combined with other issues to cause real damage. That is a meaningfully different class of analysis from conventional static analysis tools, which typically flag suspicious patterns in isolation without modeling how they connect.

For security teams, the practical question is always the false positive rate. Scanners that generate thousands of alerts create noise, and engineers learn to ignore them. How directly Cloudflare's breakdown addresses this will largely determine how useful it is as a reference for anyone considering AI-assisted vulnerability discovery.

The broader point is this: autonomous, context-aware security analysis across an entire codebase is no longer a research prototype. Cloudflare's report is the kind of documented real-world feedback that will shape how these tools develop from here.

