AI Code Doesn't Need to Be Perfect - Just Better Than What You'd Write

April 9, 2026 2 min read

What happens when you stop asking "is this AI code correct?" and start asking "is this AI code better than what I'd write?" That's the core of the Waymo Rule, a framework for evaluating AI-generated code that's worth taking seriously.

The analogy is simple: Waymo's self-driving cars aren't crash-free. They just crash less often than human drivers do. That's enough to make them trustworthy and commercially viable. Developer and writer Randy Ng applies the same logic to AI code generation. The bar isn't perfection - it's whether the AI output is more reliable than the human alternative.

This sounds obvious, but it cuts against how a lot of developers actually approach AI-generated code in practice. The instinct is to treat any AI mistake as evidence that AI can't be trusted, then fall back to writing everything by hand. The Waymo Rule says that's the wrong comparison. The question isn't "did the AI get this wrong?" It's "would I have gotten this wrong too, or gotten something worse?"

What This Looks Like in Practice

Ng isn't arguing for blindly accepting AI output. He still advocates for concrete quality checks: static analysis tools, complexity limits measured via tools like Lizard, and property-based testing (a technique where you define rules about what the code should always do, then automatically generate hundreds of test cases to try to break those rules). These aren't new ideas, but the Waymo Rule reframes why you run them. You're not checking whether the AI met an absolute standard. You're checking whether it beat the baseline.
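To make the "beat the baseline" idea concrete, here is a minimal sketch of a property-based check written with only Python's standard library. It is not taken from Ng's post: the two `dedupe_*` functions, the chosen properties, and the input generator are all hypothetical, and the baseline is deliberately flawed so there is something to compare against. A dedicated library such as Hypothesis would normally replace the hand-rolled generator.

```python
import random
from typing import Callable, Iterable, List


def dedupe_baseline(items: List[int]) -> List[int]:
    """Hypothetical baseline version (deliberately flawed): sorting loses input order."""
    return sorted(set(items))


def dedupe_candidate(items: List[int]) -> List[int]:
    """Hypothetical candidate version: keeps the first occurrence of each value."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def is_subsequence(sub: Iterable[int], seq: Iterable[int]) -> bool:
    """True if the elements of `sub` appear in `seq` in the same relative order."""
    it = iter(seq)
    # `x in it` advances the iterator, so each match must come after the previous one.
    return all(x in it for x in sub)


def properties_hold(fn: Callable[[List[int]], List[int]], items: List[int]) -> bool:
    """Rules the output should always satisfy, whatever the input is."""
    result = fn(list(items))
    no_duplicates = len(result) == len(set(result))
    same_values = set(result) == set(items)
    order_kept = is_subsequence(result, items)
    return no_duplicates and same_values and order_kept


def count_failures(fn: Callable[[List[int]], List[int]],
                   trials: int = 500, seed: int = 0) -> int:
    """Generate random inputs and count how many of them break the properties."""
    rng = random.Random(seed)  # fixed seed so both versions see the same inputs
    failures = 0
    for _ in range(trials):
        items = [rng.randint(-5, 5) for _ in range(rng.randint(0, 10))]
        if not properties_hold(fn, items):
            failures += 1
    return failures


if __name__ == "__main__":
    baseline = count_failures(dedupe_baseline)
    candidate = count_failures(dedupe_candidate)
    print(f"baseline failures:  {baseline}")
    print(f"candidate failures: {candidate}")
    # The relative question: did the candidate do at least as well as the baseline?
    print("candidate beats or matches baseline:", candidate <= baseline)
```

The point of the sketch is the last line: both versions face the same generated inputs and the same rules, and the verdict is relative rather than absolute. Which version plays the "human" role and which the "AI" role is up to the reader; the comparison only means something if the properties capture what the code is actually supposed to do.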

Ng's earlier work focused on formal verification methods - mathematically proving that code behaves correctly. He's since dropped that approach. His reasoning: AI models have improved to the point where trust-based evaluation is more practical than proof-based verification. He specifically mentions trusting GPT-5.4 to get things right most of the time.

The Shift Worth Paying Attention To

The broader implication here is about how practitioners calibrate their confidence in AI tools. Most discussions about AI code quality focus on catching errors. This one focuses on setting the right reference point for what counts as an error worth worrying about.

For developers already using AI coding tools daily, this framing probably matches their intuition. You're not asking "did Cursor write flawless code?" You're asking "is this better than what I'd have spent two hours writing?" Most of the time, it is. The Waymo Rule just makes that calculus explicit.

The weakness in the argument is that "better than human" is a moving target that's hard to measure without actually shipping both versions and comparing outcomes. Static analysis catches some categories of error, but not the ones that emerge from misunderstood requirements or subtle logic bugs. Still, as a mental model for calibrating trust rather than demanding perfection, it's a useful frame for any developer trying to figure out how much to lean on AI-generated code in production.
