The Models Are Good. The Apps Built on Them Frequently Are Not.

Last year, the bottleneck was the model. Now, for a growing number of daily AI users, the bottleneck is the app sitting on top of it.

Claude Opus handles complex reasoning at a level that routinely surprises people who use it through API access. The Claude.ai web interface - two years after launch - still has a limited integration list, and some of the integrations that exist reportedly don't function reliably. ChatGPT, despite its dominant market position, still lacks Gmail integration, one of the most obvious productivity use cases anyone could name. Grok handles social media search reasonably well and struggles to go much further.

The models have outpaced the products.

Where Investment Goes

AI labs are structured to push model performance. Research on model capability is tractable: there are benchmarks, clear metrics, and a competitive landscape that rewards wins. Building reliable third-party integrations is different work entirely - slow, dependent on external API stability, requiring ongoing maintenance every time a third party changes something. It generates no benchmark scores.

The result is that practitioners doing serious work have started routing around official interfaces. They use API access directly, build lightweight wrappers, or wire tools together through automation platforms. The web app becomes a demo layer - good enough to show someone what the model can do, not good enough to run a workflow on.

The Token Overhead Problem

There's a separate complaint folded into this: the official chat interfaces consume more tokens than necessary for basic tasks. A token is roughly one word or word-piece - the unit large language models use to process and generate text. When an interface adds heavy system instructions, redundant context, or other overhead before reaching your actual request, you're using up tokens without proportional benefit. On capped plans, that overhead hits limits faster. On consumption-based plans, it raises costs.
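To put rough numbers on that, here is a minimal Python sketch of the arithmetic. Everything in it is an assumption for illustration: the four-characters-per-token rule is a crude heuristic for English text rather than a real tokenizer, the function names are hypothetical, and the overhead and cap figures are invented, not measurements of any actual product.

```python
import math

# Assumption: ~4 characters per token is a rough heuristic for English text.
# A real tokenizer will give different counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count alone."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def overhead_share(overhead_tokens: int, request_tokens: int) -> float:
    """Fraction of the input tokens spent on overhead rather than the request."""
    total = overhead_tokens + request_tokens
    return overhead_tokens / total if total else 0.0


def requests_within_cap(token_cap: int, tokens_per_request: int) -> int:
    """How many requests of a given size fit under a fixed token cap."""
    return token_cap // tokens_per_request


if __name__ == "__main__":
    # Hypothetical figures: 1,950 tokens of system instructions and injected
    # context sitting in front of a 50-token user request.
    overhead, request = 1950, 50
    cap = 100_000  # hypothetical plan allowance, in input tokens

    print(f"{overhead_share(overhead, request):.1%}")    # 97.5%
    print(requests_within_cap(cap, overhead + request))  # 50
    print(requests_within_cap(cap, request))             # 2000
```

Under these made-up figures, the same allowance covers 50 requests through the heavy interface and 2,000 through a bare call. Real ratios depend on the actual system prompt, the tokenizer, and how each plan counts usage, so treat this as a way to frame the complaint, not as evidence for it.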

The counterargument - that consumer AI products are genuinely hard to build and the integrations that do work represent real engineering effort - is fair. But two years in, with the revenue these companies are generating, the bar should be higher than integrations that go missing or fail silently.