Hardware Recommender
Select a model or filter by your needs. We'll recommend the best hardware at every price point.
Frequently Asked Questions
How much RAM do I need to run a 70B parameter model?
For a 70B model at Q4 quantization (roughly 4.25 bits per weight), the weights alone take approximately 37GB, plus a few GB for the KV cache and runtime buffers. A Mac Mini M4 Pro with 48GB of unified memory, or an RTX 5090 with 32GB of VRAM (offloading the remainder to system RAM), would work.
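To see where the 37GB figure comes from, here is a minimal back-of-the-envelope sketch in Python. The 4.25 bits/weight average (typical of common GGUF Q4 variants) is an assumption; KV cache overhead depends on context length and is not included in the weights figure.

```python
# Back-of-the-envelope memory estimate for a quantized model.
# Assumption (illustrative): Q4 averages ~4.25 bits per weight,
# as in common GGUF Q4 variants. KV cache is context-dependent
# and adds a few GB on top of the weights.

def weights_gb(params_billion: float, bits_per_weight: float = 4.25) -> float:
    """Memory for the quantized weights alone, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

w = weights_gb(70)  # ~37.2 GB for a 70B model at Q4
print(f"weights: ~{w:.1f} GB, plus a few GB for KV cache and buffers")
```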
Is Apple Silicon good for running local LLMs?
Yes. Apple Silicon's unified memory architecture lets the GPU address most of system memory for model inference; by default macOS caps GPU allocations at roughly 75% of RAM, and the cap can be raised. This makes it well suited to larger models: an M4 Pro with 48GB can run a 70B model at Q4.
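A quick way to sanity-check whether a model fits on a given Mac is to compare the estimate above against the GPU-allocatable share of unified memory. This is a sketch, not an exact rule: the 75% default cap below is an approximation of macOS's default behavior and varies by configuration.

```python
# Sketch: does a quantized model fit in a Mac's GPU-allocatable memory?
# Assumption: macOS caps GPU allocations near 75% of unified memory by
# default; the exact cap varies and can be raised via sysctl.

def fits_on_mac(unified_gb: float, model_gb: float,
                gpu_share: float = 0.75) -> bool:
    return model_gb <= unified_gb * gpu_share

print(fits_on_mac(48, 37.2))  # False: ~36 GB default cap is just short,
                              # so raise the cap or use a smaller quant
print(fits_on_mac(64, 37.2))  # True, with comfortable headroom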
What is the best GPU for local LLM inference?
The NVIDIA RTX 5090, with 32GB of GDDR7 at 1792 GB/s of memory bandwidth, offers the best consumer-grade performance. The RTX 4090 remains excellent value with 24GB of VRAM at 1008 GB/s.
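Memory bandwidth matters because generating each token requires streaming the full set of model weights from memory, so a rough upper bound on generation speed is bandwidth divided by model size. The sketch below uses that simplification; real throughput is lower due to compute, KV cache reads, and overhead.

```python
# Rough ceiling on tokens/sec: each generated token streams the full
# weights from memory, so speed <= bandwidth / model size in bytes.
# Real-world throughput is lower; and a 37 GB model does not fully fit
# in 32 GB of VRAM, so offloading lowers the 5090 figure further.

def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

print(f"RTX 5090: ~{max_tokens_per_sec(1792, 37.2):.0f} tok/s ceiling")
print(f"RTX 4090: ~{max_tokens_per_sec(1008, 37.2):.0f} tok/s ceiling")
```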