Google DeepMind released Gemma 4 in early 2026, and the 31B instruction-tuned variant hits a sweet spot: big enough to compete with proprietary models on reasoning benchmarks, small enough to run on a decent consumer GPU. It scores 85.2% on MMLU-Pro and 89.2% on AIME 2026 without tools, which puts it in the same conversation as models twice its size.
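To get a feel for what "small enough to run on a consumer GPU" means in practice, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. It assumes a nominal 31 billion parameters (read off the model name) and a few illustrative precisions; the helper name `weight_memory_gib` and the precision list are hypothetical choices for this example, and the estimate ignores the KV cache, activations, and any quantization overhead.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


# Assumption: exactly 31e9 parameters, taken from the "31B" in the model name.
PARAMS = 31e9

# Illustrative precisions only; not a statement about which formats are released.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>6}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions, 16-bit weights come to roughly 58 GiB, more than typical consumer cards offer, while a 4-bit copy is around 14 GiB. Real usage will be higher once context length and runtime buffers are counted.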