# Large Language Models (LLM)

## **Key Selection Considerations**

When selecting a large language model, consider the following key factors:

1. **Environment**: Determine whether to use cloud-based or on-premise models based on internet connectivity.
2. **Quality**: The quality of the model's generated responses and how closely it follows instructions.
3. **Speed**: Text generation speed and latency requirements to ensure timely model responses.
4. **Pricing**: Consider the model's usage cost and select an appropriate model based on usage requirements. (No need to consider model pricing on MaiAgent)
5. **Other**: Whether it supports multimodality and function calling.

***

## Large Language Models Supported on MaiAgent

### Closed-Source Models

<table><thead><tr><th>Model Name</th><th>Description</th><th width="125.28125">Is Reasoning Model</th><th>Use Cases</th></tr></thead><tbody><tr><td>o4-mini</td><td>Faster than o3-mini-high, slightly lower quality than o3-mini-high</td><td>Yes</td><td>High-quality, fast option</td></tr><tr><td>o3-mini-high</td><td>High quality, medium speed, uses chain-of-thought reasoning for multi-layer computation before answering to provide more complete and accurate responses</td><td>Yes</td><td>High-difficulty tasks requiring deep reasoning and creativity</td></tr><tr><td>o3-mini-medium</td><td>Fast speed, medium quality</td><td>Yes</td><td>Most business applications, simple creative tasks, or regular Q&#x26;A</td></tr><tr><td>o3-mini-low</td><td><mark style="color:red;"><strong>Fastest speed</strong></mark>, basic quality, lacks deep reasoning</td><td>Yes</td><td>Simple tasks prioritizing speed over depth</td></tr><tr><td>o1-mini 2024-09-12</td><td>o1 series large language models are trained with reinforcement learning to perform complex reasoning. o1 models think before answering, generating a long internal chain of thought before responding to users. Slowest speed, <mark style="color:red;"><strong>good quality</strong></mark>.</td><td>Yes</td><td>Extremely difficult problems when other LLMs fail</td></tr><tr><td>GPT-4o 2024-08-06</td><td>Above-average quality and speed</td><td>No</td><td>Instruction following and logic slightly inferior to Claude 3.5 Sonnet, but faster than Claude 3.5 Sonnet. A commonly used choice👍</td></tr><tr><td>GPT-4o mini 2024-07-18</td><td>Fast speed, medium quality. Quality slightly lower than Gemini 2.0 Flash</td><td>No</td><td>Simple tasks, alternative when Gemini 2.0 Flash is unavailable</td></tr><tr><td>Claude 4 Sonnet</td><td>Medium-to-slow speed, <mark style="color:red;">strong structured data generation/extraction capabilities</mark>, particularly excels at <mark style="color:red;">Tool Calling</mark>; logical reasoning and coding performance surpass Claude 3.7 Sonnet, with further reduced hallucination rate.</td><td>Hybrid reasoning model</td><td>First choice for Agent mode👍 Suitable for highly complex tasks, professional domain applications, and extended conversations.</td></tr><tr><td>Claude 3.7 Sonnet</td><td>Medium-to-slow speed, <mark style="color:red;"><strong>excels at generating structured</strong></mark> data, has stronger logical reasoning than Claude 3.5 Sonnet. <mark style="color:red;"><strong>Low hallucination rate</strong></mark></td><td>Hybrid reasoning model</td><td>First choice for most situations👍 Suitable for high-complexity tasks, professional domains, and long conversations</td></tr><tr><td>Claude 3.5 Sonnet</td><td>Follows role instructions well, weaker logical reasoning compared to Claude 3.7 Sonnet, but <mark style="color:red;"><strong>faster speed</strong></mark>. 
<mark style="color:red;"><strong>Low hallucination rate</strong></mark></td><td>No</td><td>Can switch to Gemini 2.0 Flash when speed feels too slow</td></tr><tr><td>Gemini 2.5 Pro</td><td>Better quality than Claude 3.7 Sonnet for longer conversations and code generation, but slightly inferior in Agent mode and tool calling</td><td>No</td><td>Can be used interchangeably with Claude 3.7 Sonnet</td></tr><tr><td>Gemini 2.0 Pro</td><td>Similar quality to Claude 3.5 Sonnet, but slower speed</td><td>No</td><td>Alternative to Claude 3.5 Sonnet</td></tr><tr><td>Gemini 2.5 Flash</td><td><mark style="color:red;"><strong>Fast speed</strong></mark>, good multimodal capabilities</td><td>No</td><td></td></tr><tr><td>Gemini 2.0 Flash</td><td><mark style="color:red;"><strong>Fast speed</strong></mark>, medium quality</td><td>No</td><td>First choice for simple tasks👍</td></tr><tr><td>DeepSeek V3</td><td>Fast speed, high quality</td><td>Yes</td><td>Suitable for document retrieval and large-scale database query tasks</td></tr><tr><td>DeepSeek R1 Distill Llama 70B</td><td>High response quality, medium speed (slower than DeepSeek V3)</td><td>Yes</td><td>Suitable for tasks requiring multi-step reasoning and background knowledge</td></tr><tr><td>DeepSeek R1</td><td>Slower response speed, but <mark style="color:red;"><strong>strong Chinese language understanding</strong></mark>, <mark style="color:red;"><strong>high-quality response content</strong></mark>. Deep thinking and truly adjusts based on role instruction content</td><td>Yes</td><td>Complex multi-turn Chinese conversations. Handling complex role instructions👍</td></tr></tbody></table>

### Open-Source Models

Below is a comparison table of mainstream open-source models. For hardware requirements, refer to the [GPU](https://docs.maiagent.ai/tech/maiagent-tech-en/platform-development/gpu) section.

| **Model Name**          | **Description**                                                                          | **Agent Support** | **Use Cases**                                   |
| ----------------------- | ---------------------------------------------------------------------------------------- | ----------------- | ----------------------------------------------- |
| GPT-OSS-120B            | Excels at organizing information in table format                                         | Yes               | Data analysis, content creation                 |
| Gemma3 27B              | Good image OCR performance                                                               | No                | Invoice recognition, image analysis             |
| Meta Llama3.3 70B       | Highly cost-effective general-purpose model with strong instruction following            | No                | Voice customer service, RAG Q\&A assistant      |
| Meta Llama3.2 90B       | Vision capabilities, can process both images and text                                    | No                | Professional domain Q\&A, high-precision tasks  |
| Meta Llama3.1 405B      | Extremely broad knowledge and high reasoning ability                                     | Yes               | Customer service Q\&A, knowledge Q\&A           |
| Llama3 Taiwan 70B       | Fine-tuned for Traditional Chinese and Taiwanese culture, accurate localized terminology | No                | Customer service Q\&A, RAG Q\&A assistant       |
| DeepSeek R1             | Reasoning-enhanced model, excels at solving complex problems through chain of thought    | No                | Scientific logical reasoning                    |
| DeepSeek V3.2           | MoE architecture, fast inference speed and extremely low cost                            | No                | Large-scale text summarization                  |
| Qwen3 235B              | Code and math specialized, deep logical understanding, suitable for hardcore tasks       | Yes               | Academic paper assistance                       |
| Qwen3 32B               | Dynamic balance between high-quality answers and efficient processing                    | Yes               | Multilingual dialogue and instruction following |
| Qwen3 8B                | Medium quality, balanced performance and cost                                            | Yes               | Chat assistant, customer service FAQ            |
| Qwen2.5 VL 72B instruct | Vision-language model, accurate image detail recognition                                 | Yes               | Multimodal chat assistant                       |
| Mistral Large (24.07)   | Medium quality, lacks deep reasoning ability, fast speed                                 | No                | Customer service Q\&A, simple text generation   |

## Is Fine-tuning Always Necessary?

With the rapid development of artificial intelligence technology, language models have acquired powerful language understanding and generation capabilities and are widely applied in many fields. For example, pre-trained language models can easily handle daily conversations, article generation, and simple Q\&A tasks.

However, when models face more challenging professional domain tasks, such as healthcare, legal, or technical support, relying solely on pre-trained models may not provide optimal performance. Some developers choose to perform **Fine-tuning**, which involves additional training for specific domains to enhance the model's professional knowledge.

Fine-tuning is not the only solution, however. Two other effective methods can serve the same goal: **Prompt Engineering** and **RAG (Retrieval-Augmented Generation)**.

### 1. Prompt Engineering: Optimizing Model Performance Through Precise Prompts

Prompt Engineering involves designing precise prompts that guide the model toward the desired results. The core of this approach is to craft detailed instructions based on the task requirements, helping the model narrow the scope of its answers and understand the context and required output format.

Suppose a language model's purpose is to recommend suitable products based on user needs. If the user simply says "I want to buy a high-performance phone," the model may generate imprecise answers, because "high-performance" has many possible interpretations: it could refer to processing speed, camera performance, battery life, and so on.

To improve recommendation accuracy, we can apply **Prompt Engineering**: design a more specific question or provide additional context so the model better understands the user's needs. For example, rewrite the prompt as follows:

> "I need a phone with **long battery life** and a **powerful processor**, priced between **$500 and $800**. Please recommend several phones that meet these criteria."

### 2. RAG: Combining External Knowledge to Enhance Generation Capabilities

**RAG** enhances model performance by combining external knowledge retrieval with the generation process.

In traditional generation tasks, models rely only on knowledge learned during pre-training. RAG uses a retrieval system to obtain external data in real time and combines that data with the generation model to answer questions or generate text more accurately.

For example, in healthcare, when a model is asked about a rare disease, RAG can first retrieve relevant information from professional medical databases, then generate a more accurate answer based on that information. The advantage of this method is that even if the model never encountered certain data during training, it can still produce high-quality responses by retrieving existing knowledge. This makes it particularly suitable for scenarios requiring real-time knowledge updates and greatly expands the model's effective knowledge scope.
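
The sketch below shows this retrieve-then-generate flow in miniature. It is an assumption-heavy illustration, not MaiAgent's implementation: the in-memory document list stands in for a real knowledge base, cosine similarity over embeddings stands in for a vector database, and the embedding and chat calls assume the OpenAI Python SDK with placeholder model names.

```python
# Simplified RAG flow: retrieve the most relevant documents, then generate an
# answer grounded in them. Document contents and model names are illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()

documents = [
    "Disease X is a rare autoimmune disorder first described in 1998.",
    "Standard treatment for Disease X combines immunosuppressants with physiotherapy.",
    "Disease X affects roughly 1 in 500,000 people worldwide.",
]

def embed(texts: list[str]) -> np.ndarray:
    """Embed texts with an embedding model (model name is an assumption)."""
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in result.data])

doc_vectors = embed(documents)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the question (cosine similarity)."""
    q = embed([question])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(scores)[::-1][:top_k]]

def answer(question: str) -> str:
    """Generate an answer that is grounded in the retrieved context."""
    context = "\n".join(retrieve(question))
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("How is Disease X treated?"))
```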

{% hint style="success" %}
For a more detailed introduction to RAG, see the next chapter ["RAG Knowledge Base Retrieval System Explanation"](https://docs.maiagent.ai/tech/maiagent-tech-en/quickstart/rag).
{% endhint %}

In summary, Fine-tuning, Prompt Engineering, and RAG each have their own advantages and applicable scenarios. You can choose the most suitable strategy based on your application scenario and needs, without relying solely on Fine-tuning.

**Prompt Engineering** provides a low-cost and flexible solution by designing precise prompts to guide the model to produce high-quality results.

**RAG** provides a method combining external knowledge with generation capabilities, offering more accurate answers when dynamic knowledge retrieval is needed.

**Fine-tuning** can significantly improve model performance in specific domains, but it requires substantial specialized data and consumes additional resources. Treat it as a last resort for when both **Prompt Engineering** and **RAG** fall short.
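
To make that cost concrete, here is a rough sketch of what a Fine-tuning run involves, assuming the Hugging Face `transformers` and `datasets` libraries, an example base model, and a hypothetical `medical_qa.jsonl` file of curated domain examples; none of this is required on MaiAgent. The point is that you must prepare labeled domain data yourself and budget GPU time for training.

```python
# Rough Fine-tuning sketch (Hugging Face Transformers is an assumption used
# only for illustration). The base model name and data file are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "meta-llama/Llama-3.1-8B"  # example base model (assumption)
# Hypothetical file of curated, domain-specific examples with a "text" field per line.
dataset = load_dataset("json", data_files="medical_qa.jsonl")

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers define no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama-medical-ft",
        per_device_train_batch_size=1,   # even tiny batches need a large GPU
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    # mlm=False puts the collator in causal-LM mode: labels are the input ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```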
