Gemma 4 VRAM Requirements

Hardware reference for running Gemma 4 locally. Sourced from Gemma4Guide.com (April 2026). The short answer: E4B needs 4–6 GB, 26B A4B needs 10–14 GB, 31B needs 20–24 GB — but the specific GPU matters more than the tier.

Model weight sizes

| Model | BF16 | Q8_0 | Q5_K_M | Q4_K_M | Minimum GPU |
|---|---|---|---|---|---|
| E2B | ~2 GB | ~1.5 GB | ~1.2 GB | ~1 GB | Any — phones, Raspberry Pi |
| E4B | ~5 GB | ~3.5 GB | ~2.8 GB | ~2.4 GB | 6 GB GPU (RTX 3060, RX 6600) |
| 26B A4B (MoE) | ~14 GB | ~10 GB | ~8 GB | ~7 GB | 12 GB GPU at Q5 (RTX 3080 12GB / 4070) |
| 31B dense | ~24 GB | ~17 GB | ~13 GB | ~11 GB | 16 GB GPU at Q4 (RTX 4080 / 3090) |

Add 1–3 GB overhead for runtime, KV cache, and context. Right at the boundary? Size down or quantize harder.
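
To sanity-check a setup before downloading anything, add that overhead to the weight size from the table and compare against your card. A minimal sketch; the sizes are the approximate figures above, not exact file sizes:

```python
# Approximate weight sizes in GB, taken from the table above (not exact file sizes).
WEIGHT_GB = {
    ("E4B", "Q4_K_M"): 2.4,
    ("E4B", "Q5_K_M"): 2.8,
    ("26B A4B", "Q4_K_M"): 7.0,
    ("26B A4B", "Q5_K_M"): 8.0,
    ("31B", "Q4_K_M"): 11.0,
    ("31B", "Q5_K_M"): 13.0,
}

def fits(model: str, quant: str, vram_gb: float, overhead_gb: float = 3.0) -> bool:
    """True if weights plus runtime/KV-cache overhead fit in VRAM.

    overhead_gb defaults to the pessimistic end of the 1-3 GB range.
    """
    return WEIGHT_GB[(model, quant)] + overhead_gb <= vram_gb

print(fits("26B A4B", "Q5_K_M", 12))  # True: 8 + 3 = 11 GB <= 12 GB
print(fits("31B", "Q4_K_M", 12))      # False: 11 + 3 = 14 GB > 12 GB
```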

Quantization quick reference

| Format | Quality vs BF16 | Memory savings | Use when |
|---|---|---|---|
| BF16 | Reference | — | Datacenter / 80 GB H100 |
| Q8_0 | Nearly identical | ~30% less | Maximum quality, can spare ~70% of BF16 VRAM |
| Q5_K_M | Very close, small gaps on hard reasoning | ~45% less | Best default for most people |
| Q4_K_M | Noticeable on complex tasks, fine for chat | ~55% less | Fitting a tight VRAM budget |
| Q3_K_M | Meaningful degradation | ~65% less | Last resort |

GGUF files for every quantization level: Unsloth on HuggingFace. With Ollama, pick by tag, e.g. `ollama run gemma4:26b-q5_K_M`.
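
The savings column doubles as a size estimator: multiply the BF16 size by one minus the savings, then walk down the quality ladder until the result plus overhead fits. A minimal sketch using the approximate percentages above; real GGUF files vary by a gigabyte or so:

```python
# Memory savings vs BF16, from the table above (approximate).
SAVINGS = {"Q8_0": 0.30, "Q5_K_M": 0.45, "Q4_K_M": 0.55, "Q3_K_M": 0.65}

def quant_size_gb(bf16_gb: float, fmt: str) -> float:
    """Estimated weight size after quantization."""
    return bf16_gb * (1.0 - SAVINGS[fmt])

def best_quant(bf16_gb: float, vram_gb: float, overhead_gb: float = 3.0) -> str | None:
    """Highest-quality format whose weights plus overhead fit in VRAM."""
    for fmt in ("Q8_0", "Q5_K_M", "Q4_K_M", "Q3_K_M"):  # best quality first
        if quant_size_gb(bf16_gb, fmt) + overhead_gb <= vram_gb:
            return fmt
    return None  # doesn't fit even at Q3; size down instead

print(best_quant(14.0, 12.0))  # 26B A4B (~14 GB BF16) on a 12 GB card -> Q5_K_M
```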

GPU lookup table

| GPU | VRAM | E4B | 26B A4B | 31B |
|---|---|---|---|---|
| RTX 3060 / 4060 | 8–12 GB | ✅ | ⚠️ Q4 only, tight | ❌ |
| RTX 3060 Ti / 4060 Ti | 8–16 GB | ✅ | ✅ Q5 at 16 GB | ⚠️ Q4 at 16 GB, tight |
| RTX 3070 / 4070 | 8–12 GB | ✅ | ⚠️ Q5, 12 GB tight | ❌ |
| RTX 3080 (10 GB) | 10 GB | ✅ | ⚠️ Q4 only, tight | ❌ |
| RTX 3080 12GB / 4070 Super | 12 GB | ✅ | ✅ Q5 comfortable | ❌ |
| RTX 3080 Ti / 4080 | 12–16 GB | ✅ | ✅ Q5–Q8 | ⚠️ Q4 at 16 GB |
| RTX 3090 / 4090 | 24 GB | ✅ | ✅ BF16 | ✅ Q5–Q8 |
| RX 7900 XTX (ROCm/Linux) | 24 GB | ✅ | ✅ Q5 | ✅ Q4–Q5 |

Apple Silicon

On Mac, GPU and CPU share memory. macOS reserves ~4–6 GB for the OS — subtract that from your RAM total before choosing a model.

| Mac | RAM | Recommended | Notes |
|---|---|---|---|
| M1 / M2 base | 8 GB | E2B or E4B at Q4 | Very tight; close other apps |
| M1 / M2 base | 16 GB | E4B comfortably; 26B A4B at Q4 | Fine for everyday use |
| M2/M3 Pro | 18–36 GB | 26B A4B at Q5 or Q8 | Sweet spot for Mac local inference |
| M2/M3 Max | 32–96 GB | 31B at Q5; 26B A4B in BF16 | Workstation-class |
| M4 Max / M4 Ultra | 64–192 GB | 31B in BF16; multiple models | No compromises |

Ollama uses Metal automatically on Mac — no extra config.
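
To apply the table yourself, subtract the OS reserve first and then run the same weights-plus-overhead check as on a discrete GPU. A minimal sketch, using the pessimistic end of the ~4–6 GB rule of thumb:

```python
def mac_usable_gb(ram_gb: float, os_reserve_gb: float = 6.0) -> float:
    """Unified memory left for model weights plus overhead on macOS.

    os_reserve_gb defaults to the high end of the ~4-6 GB rule of thumb.
    """
    return max(ram_gb - os_reserve_gb, 0.0)

print(mac_usable_gb(16))  # 10.0 -> E4B easily; 26B A4B only at Q4, tight
print(mac_usable_gb(36))  # 30.0 -> 26B A4B at Q8 with room to spare
```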

Decision rule

| Your situation | Start with | Reason |
|---|---|---|
| Any GPU under 8 GB | E4B | Only realistic local option — and genuinely good |
| 8–12 GB GPU | E4B; or 26B A4B at Q4 to experiment | 26B A4B at Q4 leaves little headroom |
| 12–16 GB GPU | 26B A4B at Q5 | Sweet spot — big-model reasoning at a fraction of the memory |
| 24 GB GPU | 31B at Q5; or 26B A4B in BF16 | Quality-first becomes realistic |
| Mac 16 GB | E4B | Leave headroom for macOS |
| Mac 32–36 GB | 26B A4B at Q5 or Q8 | Best Mac experience for serious use |
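
Expressed as code, the table is a simple cascade on available memory. A sketch mirroring the rows above; the thresholds come straight from the table, and multi-GPU or CPU-offload setups are out of scope:

```python
def starting_point(vram_gb: float, is_mac: bool = False) -> str:
    """Suggested first model, mirroring the decision table above."""
    if is_mac:
        if vram_gb >= 32:
            return "26B A4B at Q5 or Q8"
        return "E4B"  # leave headroom for macOS
    if vram_gb >= 24:
        return "31B at Q5, or 26B A4B in BF16"
    if vram_gb >= 12:
        return "26B A4B at Q5"
    if vram_gb >= 8:
        return "E4B, or 26B A4B at Q4 to experiment"
    return "E4B"

print(starting_point(12))               # 26B A4B at Q5
print(starting_point(16, is_mac=True))  # E4B
```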

Why 26B A4B is the standout

Mixture-of-Experts: all 26B parameters must sit in memory, but only ~4B are active per token, so it generates at roughly the speed of a 4B model while delivering reasoning quality close to a 26B dense model. Quantized to Q5–Q8, it fits in 12–14 GB of VRAM, making it the most efficient Gemma 4 option for users who want more than E4B but can't run 31B.
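
A back-of-envelope sketch of the speed side of that trade-off, assuming token generation is memory-bandwidth-bound and ignoring router and attention overhead:

```python
# Generation speed scales with how many weight bytes must be read per token,
# and an MoE reads only its active experts rather than the full parameter set.
def moe_speedup(active_params_b: float, total_params_b: float) -> float:
    """Rough generation speedup of an MoE vs a dense model of the same total size."""
    return total_params_b / active_params_b

print(moe_speedup(4, 26))  # ~6.5x fewer weights read per token than 26B dense
```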
