# GPU Compute Hardware Planning

## Llama3 Inference Speed on GPUs (tokens/second)

<figure><img src="/files/eAxbqpxG69z0LBBjNAZZ" alt=""><figcaption><p>Performance comparison of mainstream GPUs running Llama3 8B / 70B</p></figcaption></figure>

<table><thead><tr><th width="166">GPU</th><th width="145">Memory (VRAM)</th><th width="125">8B Q4_K_M</th><th width="89">8B F16</th><th width="129">70B Q4_K_M</th><th width="159">70B F16</th></tr></thead><tbody><tr><td>RTX 4090</td><td>24 GB</td><td>127.74</td><td>54.34</td><td>Out of memory</td><td>Out of memory</td></tr><tr><td>RTX A6000</td><td>48 GB</td><td>102.22</td><td>40.25</td><td>14.58</td><td>Out of memory</td></tr><tr><td>L40S</td><td>48 GB</td><td>113.60</td><td>43.42</td><td>15.31</td><td>Out of memory</td></tr><tr><td>RTX 6000 Ada</td><td>48 GB</td><td>130.99</td><td>51.97</td><td>18.36</td><td>Out of memory</td></tr><tr><td>A100</td><td>80 GB</td><td>138.31</td><td>54.56</td><td>22.11</td><td>Out of memory</td></tr><tr><td>H100</td><td>80 GB</td><td>144.49</td><td>67.79</td><td>25.01</td><td>Out of memory</td></tr><tr><td>M2 Ultra</td><td>192 GB</td><td>76.28</td><td>36.25</td><td>12.13</td><td>4.71</td></tr></tbody></table>
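As a rough way to read the table, a throughput figure can be converted into an approximate wait time for a response of a given length. The sketch below copies three rows from the table; the function name `generation_seconds` and the 500-token response length are illustrative assumptions, and the estimate ignores prompt processing, batching, and concurrent users.

```python
from typing import Optional

# Tokens/second copied from the table above; None marks "Out of memory".
TOKENS_PER_SEC = {
    "RTX 4090": {"8B Q4_K_M": 127.74, "8B F16": 54.34, "70B Q4_K_M": None, "70B F16": None},
    "H100": {"8B Q4_K_M": 144.49, "8B F16": 67.79, "70B Q4_K_M": 25.01, "70B F16": None},
    "M2 Ultra": {"8B Q4_K_M": 76.28, "8B F16": 36.25, "70B Q4_K_M": 12.13, "70B F16": 4.71},
}


def generation_seconds(gpu: str, config: str, n_tokens: int) -> Optional[float]:
    """Approximate seconds to generate n_tokens, or None if the model did not fit."""
    rate = TOKENS_PER_SEC[gpu][config]
    if rate is None:
        return None
    return n_tokens / rate


if __name__ == "__main__":
    # Hypothetical 500-token response on the 70B quantized model.
    for gpu in TOKENS_PER_SEC:
        secs = generation_seconds(gpu, "70B Q4_K_M", 500)
        label = "out of memory" if secs is None else f"{secs:.1f} s"
        print(f"{gpu}: {label}")
```

Under these assumptions, a 500-token reply from the 70B Q4_K_M model takes roughly 20 seconds on an H100 and about twice that on an M2 Ultra, while the RTX 4090 cannot load the model at all.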

***

## VRAM Required by Llama3 Models

| Model      | Q4\_K\_M (quantized) | F16 (original) |
| ---------- | -------------------- | -------------- |
| Llama3 8B  | 4.58 GB              | 14.96 GB       |
| Llama3 70B | 39.59 GB             | 131.42 GB      |

Source:

{% embed url="<https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference>" %}

***

## Recommended Hardware Configurations

MaiAgent supports a range of NVIDIA GPUs.

<table><thead><tr><th width="330.84375">Name</th><th>VRAM</th></tr></thead><tbody><tr><td>NVIDIA H200</td><td>141 GB</td></tr><tr><td>NVIDIA RTX PRO 6000 Blackwell</td><td>96 GB</td></tr><tr><td>NVIDIA H100</td><td>80 GB</td></tr><tr><td>RTX 6000 Ada</td><td>48 GB</td></tr><tr><td>NVIDIA A100</td><td>80 GB</td></tr><tr><td>NVIDIA L40S</td><td>48 GB</td></tr></tbody></table>
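A first-pass sizing check compares a model's weight size, taken from the model-size table earlier on this page, with each card's VRAM. The sketch below does this for a single GPU only. The function name `gpus_that_fit` is illustrative, and `headroom_gb` is a hypothetical allowance for the KV cache and runtime overhead, which this page does not quantify, so treat the result as a lower bound rather than a guarantee.

```python
# Weight sizes in GB, from the model-size table earlier on this page.
MODEL_SIZE_GB = {
    ("Llama3 8B", "Q4_K_M"): 4.58,
    ("Llama3 8B", "F16"): 14.96,
    ("Llama3 70B", "Q4_K_M"): 39.59,
    ("Llama3 70B", "F16"): 131.42,
}

# VRAM in GB, from the supported-GPU table above.
GPU_VRAM_GB = {
    "NVIDIA H200": 141,
    "NVIDIA RTX PRO 6000 Blackwell": 96,
    "NVIDIA H100": 80,
    "RTX 6000 Ada": 48,
    "NVIDIA A100": 80,
    "NVIDIA L40S": 48,
}


def gpus_that_fit(model: str, precision: str, headroom_gb: float = 0.0) -> list:
    """Return GPUs whose VRAM covers the weights plus an assumed headroom."""
    need = MODEL_SIZE_GB[(model, precision)] + headroom_gb
    return [name for name, vram in GPU_VRAM_GB.items() if vram >= need]


if __name__ == "__main__":
    # headroom_gb=10.0 is an arbitrary example value, not a measured figure.
    for (model, precision) in MODEL_SIZE_GB:
        print(model, precision, gpus_that_fit(model, precision, headroom_gb=10.0))
```

With no headroom, only the H200 can hold Llama3 70B in F16 on a single card, which is consistent with the out-of-memory entries for the 80 GB cards in the benchmark table.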

For more detailed information, contact MaiAgent's consultants at <mark style="color:blue;"><sales@maiagent.ai></mark>.


