The gap between closed-source frontier models and open-weight alternatives is shrinking rapidly. We now have open-weight models that come close to frontier performance and run directly on our own hardware.

One of the most impressive combinations I’ve found is gemma-4-31B-it-MLX-8bit paired with oMLX and OpenCode.

The Hardware Advantage

Running a 31B-parameter model typically requires a lot of VRAM. Thanks to Apple’s unified memory architecture, however, Gemma 4 runs comfortably on a Mac mini with 64 GB of RAM. Because the GPU reads directly from the same memory pool as the CPU, the model is not capped by the fixed VRAM of a consumer graphics card.
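To see why 64 GB is enough, here is a rough back-of-the-envelope estimate of weight memory at different precisions. It is a sketch under a simplifying assumption (memory ≈ parameters × bits per weight ÷ 8). Real quantized checkpoints also store per-group scale data, and inference needs extra room for the KV cache and activations, so treat these numbers as lower bounds.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumption: weight memory ~= parameters * bits_per_weight / 8 bytes.
# Quantization scales, the KV cache and activations are NOT included,
# so the result is a lower bound on what inference actually needs.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Return approximate weight memory in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 31e9  # 31B parameters, taken from the model name

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
# 16-bit: ~62.0 GB
#  8-bit: ~31.0 GB
#  4-bit: ~15.5 GB
```

At 8 bits the weights alone take roughly 31 GB, which leaves headroom on a 64 GB machine for the KV cache, the operating system and other apps. An unquantized 16-bit copy (about 62 GB) would leave almost nothing.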

Why oMLX?

While tools like LM Studio and Ollama are fantastic for general use and easy to set up, oMLX has been faster in my own tests. It is built on MLX, Apple’s machine learning framework designed for Apple Silicon, which lets it make fuller use of the hardware: faster token generation and more efficient resource use.

Elevating the Experience with OpenCode

Having a powerful model is one thing; interacting with it effectively is another. OpenCode is an excellent tool for working with LLMs. Instead of a simple chat interface, it turns the model into a proactive software engineering partner that can navigate codebases and execute tasks with precision.

Tracking Progress

If you want to see how these open models stack up against the giants, I highly recommend following the LMSYS Chatbot Arena Leaderboard. It ranks models by crowdsourced human votes in blind head-to-head comparisons, which makes it the gold standard for tracking real-world model performance.

The era of “local-first” high-performance AI is here, and the combination of Gemma 4, oMLX, and OpenCode makes it a reality.
