Four generations of custom AI silicon in two years. That's the pace Meta has set with its MTIA (Meta Training and Inference Accelerator) program, which the company detailed in a recent blog post. Most chip programs take three to five years between major generations. At roughly one generation every six months, Meta is running several times faster than that.
The problem driving this investment is real: serving AI across Facebook, Instagram, and WhatsApp means running inference - the process of using a trained AI model to generate a result, like a feed ranking or a content recommendation - billions of times per day. At that volume, every fraction of a cent saved per query adds up. Meta describes it as one of the most demanding infrastructure challenges in computing, and at that scale the description isn't just marketing language.
This is the same logic that pushed Google toward TPUs and Amazon toward its Inferentia and Trainium chips: build hardware optimized for your specific workloads instead of paying for general-purpose NVIDIA GPU capabilities you don't need. The difference with Meta is the pace - four chips in two years signals the company has moved past experimentation and into genuine dependence on its own silicon.
What This Means Beyond Meta's Walls
Meta's chip program is one concrete reason Llama can be released as a family of free, open-weight models while still being economically sustainable. When your inference costs are substantially lower than those of competitors who pay full cloud GPU rates, you have room to give away things that others can't afford to.
For the AI tools industry more broadly, this reinforces an infrastructure gap between the large platform companies and everyone else. OpenAI and Anthropic still price their APIs around cloud GPU costs. Meta is systematically reducing its exposure to that cost structure. Four chips in two years suggests the gap will keep widening.