GPU Computing Hardware Planning
Llama 3 Inference Speed on GPUs (tokens/second)

| GPU | Memory (VRAM) | 8B Q4_K_M | 8B F16 | 70B Q4_K_M | 70B F16 |
|---|---|---|---|---|---|
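Throughput figures like those above can be sanity-checked with a back-of-the-envelope model: single-stream decoding is roughly memory-bandwidth bound, since generating each token requires streaming approximately the full weight set from VRAM once. A minimal sketch (the bandwidth and model-size numbers in the usage line are illustrative assumptions, not measurements):

```python
def estimate_decode_tps(vram_bandwidth_gbs: float, model_size_gb: float) -> float:
    """Upper-bound tokens/second for single-stream decoding.

    Each generated token reads (approximately) every weight once, so
    throughput is capped at bandwidth / model size. Measured numbers land
    below this bound due to kernel overhead and KV-cache traffic.
    """
    return vram_bandwidth_gbs / model_size_gb

# Illustrative: a GPU with ~1000 GB/s bandwidth serving a ~5 GB quantized model
print(estimate_decode_tps(1000, 5))  # prints the upper bound in tokens/second
```

This explains why quantization (Q4_K_M vs. F16) raises tokens/second even on the same GPU: a smaller weight file means more full passes over the weights per second.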
VRAM Requirements for Llama 3 Models
| Model | Q4_K_M (Quantized) | F16 (Original) |
|---|---|---|
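VRAM requirements follow a simple rule of thumb: weight memory is parameter count times bits per weight, plus headroom for the KV cache and activations. A hedged sketch of that arithmetic (the ~4.85 bits/weight average for Q4_K_M and the flat overhead allowance are assumptions):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: weight memory plus a flat allowance for the
    KV cache and activations (the overhead value is an assumption)."""
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ≈ 1 GB
    return weights_gb + overhead_gb

# F16 stores 16 bits/weight; Q4_K_M averages roughly 4.85 bits/weight (assumed)
print(estimate_vram_gb(8, 16))     # 8B F16
print(estimate_vram_gb(70, 4.85))  # 70B Q4_K_M
```

The same formula makes the planning trade-off concrete: an F16 model needs roughly 3x the VRAM of its Q4_K_M counterpart, which is often the difference between fitting on one GPU and needing several.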
Hardware Configuration Recommendations
