GPU Computing Hardware Planning

Llama3 Inference Speed on GPUs (tokens/second)

Performance Comparison of Mainstream GPUs on Llama3 8B / 70B

| GPU | Memory (VRAM) | 8B Q4_K_M | 8B F16 | 70B Q4_K_M | 70B F16 |
| --- | --- | --- | --- | --- | --- |
| RTX 4090 | 24 GB | 127.74 | 54.34 | Out of Memory | Out of Memory |
| RTX A6000 | 48 GB | 102.22 | 40.25 | 14.58 | Out of Memory |
| L40S | 48 GB | 113.60 | 43.42 | 15.31 | Out of Memory |
| RTX 6000 Ada | 48 GB | 130.99 | 51.97 | 18.36 | Out of Memory |
| A100 | 80 GB | 138.31 | 54.56 | 22.11 | Out of Memory |
| H100 | 80 GB | 144.49 | 67.79 | 25.01 | Out of Memory |
| M2 Ultra | 192 GB | 76.28 | 36.25 | 12.13 | 4.71 |
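As a quick way to query the table above, the sketch below encodes the measured throughputs and lists which GPUs can run a given model configuration, fastest first. The data comes straight from the table; the dictionary layout and the `gpus_that_fit` helper are illustrative names of our own, not part of any published tool.

```python
# Throughput figures (tokens/second) copied from the table above.
# None marks an "Out of Memory" result.
THROUGHPUT = {
    "RTX 4090":     {"8B Q4_K_M": 127.74, "8B F16": 54.34, "70B Q4_K_M": None,  "70B F16": None},
    "RTX A6000":    {"8B Q4_K_M": 102.22, "8B F16": 40.25, "70B Q4_K_M": 14.58, "70B F16": None},
    "L40S":         {"8B Q4_K_M": 113.60, "8B F16": 43.42, "70B Q4_K_M": 15.31, "70B F16": None},
    "RTX 6000 Ada": {"8B Q4_K_M": 130.99, "8B F16": 51.97, "70B Q4_K_M": 18.36, "70B F16": None},
    "A100":         {"8B Q4_K_M": 138.31, "8B F16": 54.56, "70B Q4_K_M": 22.11, "70B F16": None},
    "H100":         {"8B Q4_K_M": 144.49, "8B F16": 67.79, "70B Q4_K_M": 25.01, "70B F16": None},
    "M2 Ultra":     {"8B Q4_K_M": 76.28,  "8B F16": 36.25, "70B Q4_K_M": 12.13, "70B F16": 4.71},
}

def gpus_that_fit(config: str) -> list[tuple[str, float]]:
    """Return (gpu, tokens/sec) pairs that can run `config`, fastest first."""
    rows = [(gpu, tps[config]) for gpu, tps in THROUGHPUT.items() if tps[config] is not None]
    return sorted(rows, key=lambda r: r[1], reverse=True)

print(gpus_that_fit("70B Q4_K_M"))
```

For example, `gpus_that_fit("70B F16")` returns only the M2 Ultra, matching the table: every discrete GPU listed runs out of memory on the unquantized 70B model.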


VRAM Requirements for Llama3 Models

| Model | Q4_K_M (Quantized) | F16 (Original) |
| --- | --- | --- |
| Llama3 8B | 4.58 GB | 14.96 GB |
| Llama3 70B | 39.59 GB | 131.42 GB |
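These figures line up with a simple weight-only estimate: parameter count × bits per weight ÷ 8, converted to GiB. The sketch below assumes roughly 8.03B and 70.6B parameters for the two models and about 4.85-4.9 effective bits per weight for Q4_K_M (both assumptions of ours, not figures from the source); it also ignores KV cache and activation memory, so real usage at inference time is higher.

```python
GIB = 1024 ** 3

def model_vram_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only VRAM estimate in GiB (excludes KV cache and activations)."""
    return n_params * bits_per_weight / 8 / GIB

# Assumed parameter counts: Llama3 8B ~8.03e9, 70B ~70.6e9.
# F16 is 16 bits/weight; Q4_K_M averages ~4.85-4.9 bits/weight.
print(f"8B  F16    = {model_vram_gib(8.03e9, 16):.2f} GiB")    # table: 14.96 GB
print(f"70B F16    = {model_vram_gib(70.6e9, 16):.2f} GiB")    # table: 131.42 GB
print(f"8B  Q4_K_M = {model_vram_gib(8.03e9, 4.9):.2f} GiB")   # table: 4.58 GB
print(f"70B Q4_K_M = {model_vram_gib(70.6e9, 4.85):.2f} GiB")  # table: 39.59 GB
```

The estimates land within a fraction of a GiB of the table values, which suggests the table reports weight storage alone.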


Hardware Configuration Recommendations

MaiAgent recommends two configurations suited to different budgets:

  1. Two H100 (80GB) cards: higher budget, prioritizing quality and performance

  2. L40S (48GB) and RTX 6000 Ada (48GB): standard budget, focusing on cost-effectiveness
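A rough sanity check of these recommendations against the VRAM table above. The 20% headroom factor for KV cache and activations is an assumption of ours, not a MaiAgent figure, and pooling memory across two H100s assumes tensor or pipeline parallelism:

```python
# Weight VRAM requirements (GB) from the table above.
VRAM_REQUIRED_GB = {
    ("8B", "Q4_K_M"): 4.58,  ("8B", "F16"): 14.96,
    ("70B", "Q4_K_M"): 39.59, ("70B", "F16"): 131.42,
}

def fits(total_vram_gb: float, model: str, quant: str, headroom: float = 1.2) -> bool:
    """True if the weights plus ~20% headroom (assumed) fit in total VRAM."""
    return VRAM_REQUIRED_GB[(model, quant)] * headroom <= total_vram_gb

# Two H100s: 2 x 80 GB = 160 GB pooled across cards.
print(fits(160, "70B", "F16"))    # True: 131.42 * 1.2 ~ 157.7 <= 160
# A single 48 GB card (L40S or RTX 6000 Ada):
print(fits(48, "70B", "Q4_K_M"))  # True: 39.59 * 1.2 ~ 47.5 <= 48
print(fits(48, "70B", "F16"))     # False
```

This matches the benchmark table: a single 48GB card handles 70B only in Q4_K_M, while running 70B F16 on discrete GPUs requires pooling multiple high-memory cards.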

For more detailed information, please contact MaiAgent's professional consultants at [email protected]
