LLM VRAM Calculator

Calculate how much GPU memory you need to run any language model locally.


Compatible Hardware

Apple Mac mini (M4, 16GB): 16 GB VRAM

Apple Mac mini (M4, 24GB): 24 GB VRAM

Apple Mac mini (M4 Pro, 24GB): 24 GB VRAM

Apple Mac mini (M4 Pro, 48GB): 48 GB VRAM

Apple Mac Studio (M4 Max, 64GB): 64 GB VRAM

Apple Mac Studio (M4 Max, 128GB): 128 GB VRAM

NVIDIA GeForce RTX 4060 (8GB): 8 GB VRAM

NVIDIA GeForce RTX 4070 (12GB): 12 GB VRAM

NVIDIA GeForce RTX 4090 (24GB): 24 GB VRAM

NVIDIA GeForce RTX 3090 (24GB): 24 GB VRAM

NVIDIA GeForce RTX 5090 (32GB): 32 GB VRAM

Intel Arc B580 (12GB): 12 GB VRAM

AMD Radeon RX 7900 XTX (24GB): 24 GB VRAM

NVIDIA RTX 4080 (16GB): 16 GB VRAM

NVIDIA RTX 4060 Ti (16GB): 16 GB VRAM

NVIDIA RTX 5080 (16GB): 16 GB VRAM

NVIDIA RTX 5070 (12GB): 12 GB VRAM

NVIDIA RTX 5070 Ti (16GB): 16 GB VRAM

NVIDIA RTX 4070 Ti Super (16GB): 16 GB VRAM

NVIDIA DGX Spark: 128 GB VRAM

Beelink SEi12 + eGPU RTX 4090 (24GB): 24 GB VRAM

How Much VRAM Do You Need for LLMs?

GPU VRAM (or Apple Silicon unified memory) is the primary bottleneck when running large language models locally. For efficient inference the model weights must fit entirely in VRAM; otherwise part of the model is offloaded to system RAM, which is much slower.

Weight memory scales linearly with both parameter count and quantization bit-width, with a roughly fixed overhead added on top:

VRAM Requirements by Model Size

| Model Size | Q4_K_M | Q8 | FP16 | Example Models |
|---|---|---|---|---|
| 1B | 2 GB | 3 GB | 4 GB | Qwen3-0.6B, SmolLM2 |
| 3B | 3 GB | 5 GB | 8 GB | Qwen3 4B, Phi-3.5 Mini, Llama 3.2 3B |
| 8B | 5 GB | 9 GB | 17 GB | Qwen3 8B, Llama 3.1 8B, DeepSeek-R1-Distill-8B |
| 14B | 8 GB | 15 GB | 29 GB | Qwen3 14B, Phi-4 14B, DeepSeek-R1-Distill-14B |
| 32B | 18 GB | 34 GB | 66 GB | Qwen3 32B, DeepSeek-R1-Distill-32B |
| 70B | 37 GB | 72 GB | 142 GB | Llama 3.3 70B, Qwen2.5 72B, DeepSeek-R1-Distill-70B |

The Formula

VRAM (GB) = ⌈ parameters_B × bytes_per_param + context_overhead ⌉

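Here parameters_B is the parameter count in billions, and bytes_per_param is 2 for FP16, about 1 for Q8, and roughly 0.5 to 0.6 for Q4_K_M. The context_overhead term starts at a base of 1.5 GB, which covers the KV cache at 4K context, activation buffers, and framework runtime. Longer context windows increase it significantly: a 128K context window adds roughly 20–25 GB, depending on model architecture.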
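The formula can be sketched in a few lines of Python. The bytes-per-parameter constants (0.5 for Q4_K_M, 1.0 for Q8, 2.0 for FP16) are assumptions chosen for illustration, not values published by the calculator, and `estimate_vram_gb` is a hypothetical helper.

```python
import math

# Assumed bytes per parameter (illustrative, not the calculator's own constants).
# Real GGUF quantizations use a little more: Q4_K_M averages somewhat above
# 4 bits per weight and Q8_0 somewhat above 8.
BYTES_PER_PARAM = {"Q4_K_M": 0.5, "Q8": 1.0, "FP16": 2.0}

# Base overhead from the text: KV cache at 4K context, activation buffers, runtime.
BASE_OVERHEAD_GB = 1.5


def estimate_vram_gb(params_b: float, fmt: str,
                     context_overhead_gb: float = BASE_OVERHEAD_GB) -> int:
    """VRAM (GB) = ceil(params_b * bytes_per_param + context_overhead)."""
    # Billions of parameters times bytes per parameter gives decimal gigabytes.
    weights_gb = params_b * BYTES_PER_PARAM[fmt]
    return math.ceil(weights_gb + context_overhead_gb)


for fmt in ("Q4_K_M", "Q8", "FP16"):
    print(f"70B @ {fmt}: {estimate_vram_gb(70, fmt)} GB")
```

With these constants the sketch reproduces the 1B, 3B, 32B and 70B rows of the table exactly (for example 37, 72 and 142 GB for 70B); for the 8B and 14B rows it comes out 1 GB higher, so treat the constants as approximate.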

For Mixture-of-Experts models (Llama 4 Scout, Mixtral), enter the total parameter count in the calculator: all expert weights must be loaded into VRAM, even though only a fraction of them are active for each token. For example, Llama 4 Scout is billed as a 17B-active model but has ~109B total parameters, so it needs ~60 GB at Q4_K_M. See the Llama 4 guide for details.
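A quick comparison shows how far off an active-parameter estimate would be for Llama 4 Scout. This sketch uses a hypothetical `estimate_vram_gb` helper with an assumed 0.5 bytes per parameter at Q4_K_M and the 1.5 GB base overhead from the formula.

```python
import math

# Assumptions for illustration: ~4 bits (0.5 bytes) per weight at Q4_K_M and
# the 1.5 GB base overhead from the formula.
BYTES_PER_PARAM_Q4 = 0.5
BASE_OVERHEAD_GB = 1.5


def estimate_vram_gb(params_b: float,
                     bytes_per_param: float = BYTES_PER_PARAM_Q4,
                     overhead_gb: float = BASE_OVERHEAD_GB) -> int:
    """Hypothetical helper: ceil(params_b * bytes_per_param + overhead_gb)."""
    return math.ceil(params_b * bytes_per_param + overhead_gb)


# Llama 4 Scout figures from the text: ~17B active, ~109B total parameters.
ACTIVE_PARAMS_B = 17
TOTAL_PARAMS_B = 109

# Undercounts: only the experts used for a single token are counted.
active_only_gb = estimate_vram_gb(ACTIVE_PARAMS_B)
# Correct input for MoE: every expert's weights must be resident in VRAM.
total_gb = estimate_vram_gb(TOTAL_PARAMS_B)

print(f"active-only estimate: {active_only_gb} GB")
print(f"total-parameter estimate: {total_gb} GB")
```

The total-parameter estimate of 56 GB lands a little under the ~60 GB quoted above, likely because real Q4_K_M files use somewhat more than 4 bits per weight; the active-only figure of 10 GB would badly undersize the hardware.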

Frequently Asked Questions

How much VRAM does Llama 3.3 70B need?

Llama 3.3 70B needs approximately 26 GB at Q2_K, 43 GB at Q4_K_M, and 75 GB at Q8. Q4_K_M is the recommended quality level; running it requires a Mac Studio with M4 Max and 64 GB, or two RTX 4090 GPUs on NVIDIA hardware.

How much VRAM does Gemma 3 need?

Gemma 3 1B needs ~1 GB and can run on CPU. Gemma 3 4B needs ~3 GB and runs on any GPU. Gemma 3 12B needs ~7.5 GB, which fits an 8 GB GPU at Q4. Gemma 3 27B needs ~16 GB at Q4_K_M, a tight fit on a 16 GB GPU; grouped query attention helps by keeping its KV cache small.

How much VRAM does Phi-4 need?

Phi-4 (14B) needs approximately 9 GB at Q4_K_M and 15 GB at Q8, so any 12 GB GPU, such as the Intel Arc B580 or RTX 4070, can run it at Q4_K_M. Phi-4-mini (3.8B) needs only about 2.5 GB and runs on almost any hardware.

How much VRAM does Qwen3 need?

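Qwen3 8B needs ~5.5 GB and fits any 8 GB GPU. Qwen3 14B needs ~9 GB, so plan on 12 GB or more. Qwen3 32B needs ~18 GB and fits 24 GB GPUs at Q4_K_M. Qwen3 30B-A3B (MoE) needs ~20 GB, because all of its weights must be loaded even though only about 3B parameters are active per token.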

Can I run a 13B model on 8 GB VRAM?

No. A 13B model needs about 8.5 GB at Q4_K_M, which exceeds 8 GB, so you need at least 12 GB VRAM for 13B models. With 8 GB you are limited to models of up to about 8B parameters at Q4_K_M.

Does context length affect VRAM?

Yes. The KV cache grows with context length: a 4K context adds ~1.5 GB, 32K adds ~8–10 GB, and 128K adds ~20–25 GB, with the exact figures depending on model architecture. Use the calculator's context overhead slider to account for your target window.
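KV cache growth can be estimated with the standard per-token formula for a transformer decoder: two tensors (keys and values) per layer, each holding `n_kv_heads × head_dim` values per token. The model shape below (32 layers, 8 KV heads, head dimension 128, FP16 cache) is an assumption modeled on Llama 3.1 8B, and `kv_cache_bytes` is a hypothetical helper; it counts only the cache, not weights or activation buffers.

```python
# Standard KV cache size for a decoder-only transformer:
#   2 (K and V) x layers x kv_heads x head_dim x tokens x bytes_per_element
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


GIB = 1024 ** 3

# Assumed Llama-3.1-8B-style shape: 32 layers, 8 KV heads (grouped query
# attention), head_dim 128, FP16 cache (2 bytes per element).
for ctx in (4_096, 32_768, 131_072):
    size = kv_cache_bytes(32, 8, 128, ctx)
    print(f"{ctx:>7} tokens: {size / GIB:.1f} GiB")
```

For this shape the cache costs 128 KiB per token, so it grows from 0.5 GiB at 4K to 16 GiB at 128K. A model with more layers or more KV heads scales proportionally, which is why the figures above are given as ranges.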

How much VRAM for a 7B model?

A 7B model needs approximately 5 GB at Q4_K_M, 8 GB at Q8, and 15 GB at FP16. Any 8 GB GPU like the RTX 4060 can run it at Q4_K_M with good performance.

How much VRAM for a 70B model?

A 70B model needs approximately 37 GB at Q4_K_M, 72 GB at Q8, and 142 GB at FP16. On consumer hardware you need a Mac Studio M4 Max 64 GB or dual RTX 4090 (48 GB combined).

Embed This Calculator

Add the VRAM calculator to your blog or documentation site. Copy the snippet below — no signup required.

<iframe src="https://llmhardware.io/embed/vram-calculator" width="100%" height="420" style="border:none;border-radius:10px;" loading="lazy" title="LLM VRAM Calculator"></iframe>

Related Guides