A 2020 MacBook Air running Google's Gemma 4 model locally. That sentence would have been absurd two years ago.
Gemma 4 is Google's latest open-weight model, meaning anyone can download and run it on their own hardware instead of paying for API calls. The 2020 MacBook Air shipped with either an Intel chip or Apple's first M1 processor and maxed out at 16GB of RAM. Neither configuration was designed for running AI models that typically demand high-end GPUs and 32GB or more of memory. Yet here we are: quantized versions of Gemma 4 (compressed versions that trade a small amount of accuracy for dramatically lower hardware requirements) can squeeze onto machines that most people consider due for replacement.
The practical experience is rough. Expect generation speeds measured in seconds per word rather than words per second. You are not going to use this setup for real work. But the fact that it runs at all shows how quickly the optimization side of local AI has progressed. Tools like llama.cpp and Ollama have made it possible to run surprisingly capable models on consumer hardware by using aggressive quantization: storing each weight in roughly 4 bits instead of the original 16 or 32, which typically shrinks a model by 4-8x.
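To see where a 4-8x figure comes from, here is a back-of-the-envelope sketch. The 4-billion-parameter count is a hypothetical stand-in, not a published Gemma 4 size, and the estimate covers the weights only. Real quantized files carry some extra overhead, and inference needs additional memory for context.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def compression_ratio(original_bits: float, quantized_bits: float) -> float:
    """How many times smaller the weights get after quantization."""
    return original_bits / quantized_bits


if __name__ == "__main__":
    # Hypothetical model size, chosen only for illustration.
    params = 4e9
    for bits in (32, 16, 8, 4):
        size = weight_memory_gb(params, bits)
        print(f"{bits:>2}-bit weights: about {size:.1f} GB")
    print(f"16-bit to 4-bit: {compression_ratio(16, 4):.0f}x smaller")
    print(f"32-bit to 4-bit: {compression_ratio(32, 4):.0f}x smaller")
```

Under these assumptions, the same weights that would fill half of a 16GB machine at 16 bits take about a quarter of that at 4 bits, which is the difference between not fitting comfortably and fitting with room for the operating system.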
For anyone sitting on older hardware and curious about local AI, the takeaway is simple: you can experiment without buying new equipment. Just do not expect the snappy response times you get from ChatGPT or Claude. The real story here is not one MacBook Air struggling through inference. It is that open-weight models are getting efficient enough that the minimum hardware bar keeps dropping. A year from now, a capable model running on a five-year-old laptop might actually be usable.