Gemma Runs Real-Time Audio and Video AI on a MacBook M3 Pro Locally

April 5, 2026

A developer has demonstrated real-time multimodal AI running entirely on an Apple M3 Pro laptop - taking live audio and video as input and producing spoken voice responses, all without sending any data to the cloud.

The demo uses Gemma E2B, a small open-source model from Google, running locally via a custom pipeline that captures audio and video frames in real time and feeds them to the model. The model's output is then converted to speech and played back with latency low enough to feel like a natural conversation.
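The article does not publish the developer's code, so the following is only a minimal sketch of the general shape such a pipeline could take, not the demo's actual implementation. The names `Chunk`, `run_model`, `synthesize_speech` and `pipeline` are hypothetical, and the model and text-to-speech steps are stand-in stubs rather than real Gemma or speech APIs. It shows a capture thread feeding a bounded queue, so that capture is slowed down instead of piling up input when the model falls behind.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Chunk:
    """One slice of captured input: raw audio plus the most recent video frame."""
    audio: bytes
    frame: bytes


def run_model(chunk: Chunk) -> str:
    # Stand-in for local model inference (hypothetical stub, not a real Gemma call).
    return f"heard {len(chunk.audio)} audio bytes, saw {len(chunk.frame)} frame bytes"


def synthesize_speech(text: str) -> bytes:
    # Stand-in for a local text-to-speech step (hypothetical stub).
    return text.encode("utf-8")


def pipeline(chunks, max_pending: int = 2) -> list[bytes]:
    """Capture -> model -> speech, with a bounded queue between capture and inference."""
    pending: queue.Queue = queue.Queue(maxsize=max_pending)
    spoken: list[bytes] = []

    def capture() -> None:
        for chunk in chunks:
            pending.put(chunk)  # blocks while the queue is full (backpressure)
        pending.put(None)       # sentinel: no more input

    producer = threading.Thread(target=capture, daemon=True)
    producer.start()

    while True:
        chunk = pending.get()
        if chunk is None:
            break
        spoken.append(synthesize_speech(run_model(chunk)))

    producer.join()
    return spoken


if __name__ == "__main__":
    demo_input = [Chunk(audio=b"\x00" * 4, frame=b"\x01" * 2)]
    for clip in pipeline(demo_input):
        print(clip.decode("utf-8"))
```

In a real system the stubs would be replaced by microphone and camera capture, a locally loaded model, and an audio output device; the bounded queue is one common way to keep a slow stage from accumulating stale input.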

What makes this notable is the hardware. The M3 Pro is a mid-range MacBook chip, not an AI workstation. Running a continuous real-time pipeline on it - taking audio and video in, generating responses, speaking them back out - depends on Gemma's small size: inference (the process of a model generating a response) is fast enough on Apple Silicon to avoid noticeable lag.
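One common way to make "fast enough" concrete is the real-time factor: average processing time per chunk divided by the duration of input that chunk covers. A value below 1.0 means processing keeps pace with live input. The sketch below is illustrative only; the article reports no timings, and `real_time_factor`, `keeps_up` and the 0.5-second chunk length are assumptions chosen for the example.

```python
import time


def real_time_factor(process, chunk_seconds: float, n_chunks: int = 5) -> float:
    """Average wall-clock time per call to `process`, divided by the chunk duration."""
    if chunk_seconds <= 0 or n_chunks <= 0:
        raise ValueError("chunk_seconds and n_chunks must be positive")
    start = time.perf_counter()
    for _ in range(n_chunks):
        process()
    elapsed = time.perf_counter() - start
    return (elapsed / n_chunks) / chunk_seconds


def keeps_up(rtf: float) -> bool:
    # Below 1.0, each chunk is processed faster than new input arrives.
    return rtf < 1.0


if __name__ == "__main__":
    # `process` here is a trivial placeholder for one inference step (assumption).
    rtf = real_time_factor(lambda: sum(range(1000)), chunk_seconds=0.5)
    print(f"real-time factor: {rtf:.4f}, keeps up: {keeps_up(rtf)}")
```

Staying under 1.0 only shows that the pipeline does not fall behind; conversational feel also depends on the delay before the first spoken word, which this sketch does not measure.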

The practical ceiling is real: Gemma at this scale is not a frontier model. For tasks requiring nuanced reasoning, broad knowledge, or complex multi-step work, it doesn't compare to GPT-4o or Claude 3.5 Sonnet. But for specific, narrow use cases - a local voice assistant, an always-on accessibility tool, a private transcription pipeline - this architecture is functional today.

The approach is community-built rather than a commercial product. Privacy is the main selling point: no API calls means nothing leaves the machine. For anyone handling sensitive audio - medical conversations, legal discussions, client calls - that distinction matters more than benchmark scores.
