Tagged

#Benchmark

1 post

April 11, 2026
Gemma 4 E2B vs E4B Benchmark: The Hidden Thinking Mode That Makes the Smaller Model 20× Slower

A hands-on RTX 3070 benchmark of Gemma 4 E2B and E4B covering TPS, TTFT, and quality on hard and practical tasks, plus a deep dive that traces E2B's mysterious slowdown to an <|think|> token that Ollama's gemma4 renderer injects by default, contrary to what the docs say. The post ends with best-practice presets and a note on running Claude Code locally.
- AI
- LLM
- Gemma
- Ollama