Large Language Models (LLMs)
Key Selection Criteria
When choosing a large language model, consider the following key factors:
- Environment: Whether the usage environment has internet access determines the choice between cloud-based and local models.
- Quality: The model's ability to generate responses and follow instructions.
- Speed: Text generation speed and latency, to meet response-time requirements.
- Pricing: The model's usage cost relative to your requirements. (Model pricing does not need to be considered on MaiAgent.)
- Others: Whether the model supports multimodal input and function calling.

Large Language Models Supported on MaiAgent
Cloud Models (Closed Source)
| Model | Characteristics | Reasoning model | Recommended use cases |
| --- | --- | --- | --- |
| o4-mini | Faster than o3-mini-high, with slightly lower quality | Yes | A choice when both high quality and speed are needed |
| o3-mini-high | High quality, medium speed; uses chain-of-thought reasoning for more complete, precise answers | Yes | High-difficulty tasks requiring deep reasoning and creativity |
| o3-mini-medium | Fast speed, medium quality | Yes | Most business applications, simple creation, or regular Q&A |
| o3-mini-low | Fastest speed, basic quality, lacks deep reasoning | Yes | Quick, simple tasks that do not require depth |
| o1-mini 2024-09-12 | The o1 series is trained with reinforcement learning for complex reasoning; it thinks before answering, generating a long internal chain of thought. Slowest speed, high quality | Yes | Very difficult problems where other LLMs fail |
| GPT-4o 2024-08-06 | High quality and speed overall | No | Slightly weaker instruction following and logic than Claude 3.5 Sonnet, but faster. A common choice👍 |
| GPT-4o mini 2024-07-18 | Fast speed, medium quality; slightly lower quality than Gemini 2.0 Flash | No | An alternative for simple tasks when Gemini 2.0 Flash is unavailable |
| Claude 4 Sonnet | Medium-slow speed; strong at structured data generation/extraction and especially good at tool calling; better logical reasoning and coding than Claude 3.7 Sonnet, with further reduced hallucination | Hybrid reasoning model | First choice for Agent mode👍 Suitable for high-complexity tasks, professional domains, and long conversations |
| Claude 3.7 Sonnet | Medium-slow speed; excels at structured data generation; stronger logical reasoning than Claude 3.5 Sonnet; low hallucination rate | Hybrid reasoning model | First choice for most cases👍 Suitable for complex tasks, professional domains, and long conversations |
| Claude 3.5 Sonnet | Follows role instructions well; weaker logical reasoning than Claude 3.7 Sonnet but faster; low hallucination rate | No | Switch to Gemini 2.0 Flash if its speed is too slow |
| Gemini 2.5 Pro | Better quality than Claude 3.7 Sonnet for longer conversations and code generation, but slightly weaker in Agent mode and tool calling | No | Can be used interchangeably with Claude 3.7 Sonnet |
| Gemini 2.0 Pro | Similar quality to Claude 3.5 Sonnet but slower | No | An alternative to Claude 3.5 Sonnet |
| Gemini 2.5 Flash | Fast speed, good multimodal capabilities | No |  |
| Gemini 2.0 Flash | Fast speed, medium quality | No | First choice for simple tasks👍 |
| DeepSeek V3 | Fast speed, high quality | Yes | Document retrieval and large database query tasks |
| DeepSeek R1 Distill Llama 70B | High response quality, medium speed (slower than DeepSeek V3) | Yes | Tasks requiring multi-step reasoning and background knowledge |
| DeepSeek R1 | Slower response speed, but strong Chinese comprehension and high-quality responses; thinks deeply and adapts accurately to role instructions | Yes | Complex multi-turn Chinese conversations and complex role instructions👍 |
Local Models (Open Source)
Below is a comparison of mainstream open-source models. For their hardware requirements, please refer to the GPU chapter.
| Model | Characteristics | Recommended use cases |
| --- | --- | --- |
| Meta Llama 3.3 70B | High quality, medium speed | Data analysis, content creation |
| Meta Llama 3.3 70B Instruct (M2 Ultra) | High quality, fast speed | Voice customer service |
| Meta Llama 3.2 90B | Very high quality, medium speed | Professional Q&A, high-precision tasks |
| Llama3-TAIDE-LX-70B-Chat (NCHC) | High quality, strong Chinese generation, medium speed | Customer service Q&A, knowledge Q&A |
| TAIDE-LX-70B-Chat (NCHC) | High quality, medium speed | Customer service Q&A, knowledge Q&A |
| Mistral Large (24.07) | Medium quality, lacks deep reasoning, fast speed | Customer service Q&A, simple text generation |
| Meta Llama 3.1 70B | Medium quality, medium compute requirements | Customer service, knowledge Q&A, advanced translation and summarization |
| Meta Llama 3.1 8B | Acceptable quality, low compute requirements | Translation, summarization |
| Mistral Large 2 | High quality, high hardware requirements | Customer service, knowledge Q&A, advanced translation and summarization |
| Mixtral 8x7B | Low quality, fastest speed | Translation, summarization |
| Gemma 3 27B (M2 Ultra) | High quality, high hardware requirements | Professional knowledge Q&A, data analysis, complex content generation |
Do Models Always Need Fine-tuning?
With the rapid development of artificial intelligence technology, language models have acquired powerful language understanding and generation capabilities, and are widely used in many fields. For example, pre-trained language models can easily handle daily conversations, article generation, and simple Q&A tasks.
However, when models face more challenging professional domain tasks, such as medical, legal, or technical support, relying solely on pre-trained models may not provide optimal performance. Some developers choose to perform Fine-tuning, which involves additional training for specific domains to improve the model's professional knowledge.
Fine-tuning, however, is not the only solution. Two other effective methods can achieve the same goal: Prompt Engineering and RAG (Retrieval-Augmented Generation).
1. Prompt Engineering: Optimizing Model Performance Through Precise Prompts
Prompt Engineering involves designing precise prompt statements to guide the model toward the desired results. The core of this method is to write detailed instructions based on the task requirements, helping the model narrow the answer scope and understand the context and the required output format.
Suppose we have a language model intended to recommend suitable products based on user needs. If the user simply says "I want to buy a high-performance phone," the model may generate imprecise answers, because "high-performance" has many possible interpretations: processing speed, camera performance, battery life, and so on.
To improve recommendation accuracy, we can perform Prompt Engineering by designing more specific questions or providing additional context to help the model better understand user needs. For example, changing to the following prompt:
"I need a phone with long battery life and a high-efficiency processor, priced between $500 to $800, please recommend several phones that meet these criteria."
2. RAG: Enhancing Generation Capabilities by Combining External Knowledge
RAG improves model performance by combining external knowledge retrieval with the generation process.
In traditional generation tasks, models rely only on knowledge learned during pre-training. RAG, however, uses a retrieval system to obtain external data in real-time and combines this data with the generation model to provide more accurate answers to questions or generate text.
For example, in the medical field, when a model is asked about a rare disease, RAG can first retrieve relevant information from professional medical databases, then generate more accurate answers based on this information. The advantage of this method is that even if the model hasn't encountered certain information during training, it can still provide high-quality responses by retrieving existing knowledge. This is particularly suitable for scenarios requiring real-time knowledge updates and can greatly expand the model's knowledge range.
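As a rough illustration of this retrieve-then-generate flow, the sketch below pairs a naive keyword retriever over an in-memory document list with a chat-completion call. The document strings, retrieval logic, and model name are placeholder assumptions for demonstration only; a production RAG system would typically use an embedding model and a vector database instead.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Toy in-memory "knowledge base"; a real deployment would query a vector store.
documents = [
    "Disease X is a rare disorder first described in ...",
    "Standard treatment for Disease X combines drug A with physiotherapy ...",
    "Disease Y is an unrelated condition that ...",
]

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval, standing in for embedding search."""
    query_terms = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(query_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

question = "What is the recommended treatment for Disease X?"
context = "\n".join(retrieve(question, documents))

# Generation step: the retrieved passages are injected into the prompt so the
# model answers from them rather than relying only on pre-trained knowledge.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```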
For a more detailed introduction to RAG, please refer to the next chapter "RAG Knowledge Base Retrieval System Guide".
In conclusion, Fine-tuning, Prompt Engineering, and RAG each have their own strengths and applicable scenarios. Choose the strategy that best fits your application and requirements rather than relying solely on Fine-tuning.
- Prompt Engineering provides a low-cost and flexible solution, using precisely designed prompts to guide models toward high-quality results.
- RAG combines external knowledge with generation capabilities, offering more precise answers when dynamic knowledge acquisition is needed.
- Fine-tuning can significantly improve model performance in specific domains but requires large amounts of professional data and consumes additional resources. It should be considered a last resort when both Prompt Engineering and RAG fall short.