Tagged

#LLM

3 posts

April 11, 2026
Gemma 4 E2B vs E4B Benchmark: The Hidden Thinking Mode That Makes the Smaller Model 20× Slower

A hands-on RTX 3070 benchmark of Gemma 4 E2B and E4B: tokens per second (TPS), time to first token (TTFT), and quality on hard and practical tasks, plus a deep dive that traces E2B's mysterious slowdown to an <|think|> token that Ollama's gemma4 renderer injects by default, contradicting the docs. Ends with best-practice presets and a note on running Claude Code locally.
- AI
- LLM
- Gemma
- Ollama
April 3, 2026
Google Gemma 4: Open-Source, Multimodal, Apache 2.0 — and 1.5GB to Run on a Phone

Google DeepMind's Gemma 4 is an open-weight model family under Apache 2.0, with four sizes (31B Dense, 26B MoE, E4B, E2B), native multimodal input, and edge deployment down to 1.5GB of memory. A hands-on tour of the benchmarks, the new Agent Skills story, and every way to run it — from your phone to a workstation.
September 12, 2025
Ollama Tutorial: Run Local LLMs on Windows, Linux, and macOS

A beginner-friendly walkthrough of Ollama — the easiest way to run open-source large language models on your own machine. Covers installation on Windows, Linux, and macOS, running your first model, choosing a model size for your hardware, and calling Ollama from Python or any REST client.
