Question 1

How do I calculate VRAM for an LLM?

Accepted Answer

Use the formula: VRAM (GB) = ceil(parameters_billions × bytes_per_param + 1.5). Use 0.5 bytes per parameter for Q4_K_M, 1.0 for Q8, and 2.0 for FP16. The 1.5 GB term covers overhead at a short (about 4K) context window; add more for longer context windows.
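As a minimal sketch, the formula above can be written as a small Python function. The function name and the `extra_context_gb` parameter are illustrative assumptions, not part of the original calculator; the bytes-per-parameter values and the 1.5 GB overhead are taken directly from the answer.

```python
import math

# Bytes per parameter for each format, as given in the answer above.
BYTES_PER_PARAM = {"Q4_K_M": 0.5, "Q8": 1.0, "FP16": 2.0}

# Fixed overhead term (GB) from the formula.
BASE_OVERHEAD_GB = 1.5


def estimate_vram_gb(params_billions: float, quant: str,
                     extra_context_gb: float = 0.0) -> int:
    """Estimate VRAM in GB: ceil(params * bytes_per_param + 1.5).

    extra_context_gb is a hypothetical knob for the "add more for longer
    context windows" advice; it defaults to 0.
    """
    weights_gb = params_billions * BYTES_PER_PARAM[quant]
    return math.ceil(weights_gb + BASE_OVERHEAD_GB + extra_context_gb)


if __name__ == "__main__":
    for quant in BYTES_PER_PARAM:
        print(f"70B at {quant}: {estimate_vram_gb(70, quant)} GB")
```

For a 70B model this gives 37, 72, and 142 GB for Q4_K_M, Q8, and FP16 respectively.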

Question 2

How much VRAM for a 7B model?

Accepted Answer

A 7B model needs approximately 5 GB at Q4_K_M, 8.5 GB at Q8, and 15.5 GB at FP16. An RTX 4060 (8 GB) or any other GPU with 8 GB or more can run it at Q4_K_M.

Question 3

How much VRAM for a 70B model?

Accepted Answer

A 70B model needs approximately 37 GB at Q4_K_M, 72 GB at Q8, and 142 GB at FP16. On consumer hardware, only a Mac Studio M4 Max with 64 GB or dual RTX 4090s (48 GB combined) can run it, and only at Q4_K_M.
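The comparison above is simple arithmetic, sketched below. The requirement and memory figures are copied from the answer; the helper name `fitting_quants` is hypothetical, and the sketch assumes the full stated memory is usable for the model, which may be optimistic in practice.

```python
# Requirements (GB) for a 70B model, copied from the answer above.
REQUIRED_GB = {"Q4_K_M": 37, "Q8": 72, "FP16": 142}

# Memory (GB) of the two setups named in the answer.
# Assumption: the full capacity is available to the model.
HARDWARE_GB = {"Mac Studio M4 Max 64 GB": 64, "Dual RTX 4090": 48}


def fitting_quants(memory_gb: float) -> list[str]:
    """Return the formats whose requirement fits in memory_gb."""
    return [quant for quant, need in REQUIRED_GB.items() if need <= memory_gb]


if __name__ == "__main__":
    for name, memory_gb in HARDWARE_GB.items():
        print(f"{name}: {fitting_quants(memory_gb) or 'nothing fits'}")
```

Under these assumptions only Q4_K_M fits on either setup.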

Question 4

How much VRAM does Llama 3.3 70B need?

Accepted Answer

Llama 3.3 70B needs approximately 26 GB at Q2_K, 43 GB at Q4_K_M, and 75 GB at Q8. Q4_K_M is the recommended quality level, which means you need a Mac Studio M4 Max 64 GB (which fits it via unified memory) or dual RTX 4090 GPUs for NVIDIA hardware.

Question 5

How much VRAM does Gemma 3 need?

Accepted Answer

Gemma 3 VRAM requirements by size: 1B needs ~1 GB (runs on CPU), 4B needs ~3 GB (any GPU), 12B needs ~7.5 GB (8 GB GPU at Q4), 27B needs ~16 GB (16 GB GPU at Q4_K_M). Gemma 3 27B is unusually efficient: it fits in 16 GB at Q4_K_M thanks to its grouped query attention architecture.

Question 6

How much VRAM does Phi-4 need?

Accepted Answer

Phi-4 (14B) needs approximately 9 GB at Q4_K_M, 15 GB at Q8, and 29 GB at FP16. Phi-4-mini (3.8B) needs only 2.5 GB at Q4_K_M. Any 12 GB GPU (Intel Arc B580, RTX 4070) can run Phi-4 at Q4_K_M.

Question 7

How much VRAM does Qwen3 need?

Accepted Answer

Qwen3 VRAM requirements vary by size. Qwen3 0.6B: ~1 GB. Qwen3 1.7B: ~1.5 GB. Qwen3 4B: ~3 GB. Qwen3 8B: ~5.5 GB. Qwen3 14B: ~9 GB. Qwen3 32B: ~18 GB (fits in 24 GB at Q4_K_M). Qwen3 30B-A3B MoE: ~20 GB, because all expert weights must be loaded even though only about 3B parameters are active per token.

Question 8

Can I run a 13B model on 8 GB VRAM?

Accepted Answer

No. A 13B model needs approximately 8.5 GB at Q4_K_M, which exceeds 8 GB VRAM. You need at least 12 GB VRAM to run 13B models. With 8 GB you are limited to models up to 8B parameters at Q4_K_M, or smaller models at higher precision.

Question 9

Does context length affect VRAM?

Accepted Answer

Yes. The KV cache grows with context length. At 4K context the overhead is about 1.5 GB. At 32K context it adds roughly 8–10 GB. At 128K context it can add 20–25 GB. Use the context overhead slider in the VRAM calculator to account for your target context window.
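To illustrate why the cache scales with context, here is a rough sketch of the usual KV cache size calculation: one key and one value vector per layer, per KV head, per token. The layer count, KV head count, and head dimension below are illustrative assumptions for an 8B-class model with grouped query attention, not values from this page. The result covers the cache alone, so it will not match the overhead figures above, which depend on the model and runtime.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes an FP16 cache.
    """
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / 1024**3


if __name__ == "__main__":
    # Hypothetical config: 32 layers, 8 KV heads, head dimension 128.
    for context in (4096, 32768, 131072):
        size = kv_cache_gib(32, 8, 128, context)
        print(f"{context} tokens: {size:.1f} GiB")
```

The size is linear in the token count, so going from 4K to 32K context multiplies the cache by eight under these assumptions.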

LLM VRAM Calculator

How Much VRAM Do You Need for LLMs?

VRAM Requirements by Model Size

Model Size	Q4_K_M	Q8	FP16	Example Models
1B	2 GB	3 GB	4 GB	Qwen3-0.6B, SmolLM2
3B	3 GB	5 GB	8 GB	Qwen3 4B, Phi-3.5 Mini, Llama 3.2 3B
8B	5 GB	9 GB	17 GB	Qwen3 8B, Llama 3.1 8B, DeepSeek-R1-Distill-8B
14B	8 GB	15 GB	29 GB	Qwen3 14B, Phi-4 14B, DeepSeek-R1-Distill-14B
32B	18 GB	34 GB	66 GB	Qwen3 32B, DeepSeek-R1-Distill-32B
70B	37 GB	72 GB	142 GB	Llama 3.3 70B, Qwen2.5 72B, DeepSeek-R1-Distill-70B