Large Language Models (LLMs)
Key Selection Considerations
When selecting a large language model, consider the following key factors:
- Environment: whether to use a cloud-based or on-premise model, based on internet connectivity.
- Quality: the quality of the model's responses and how well it follows instructions.
- Speed: text-generation speed and latency requirements, to ensure the model responds in time.
- Pricing: usage cost relative to your usage requirements. (Model pricing does not need to be considered on MaiAgent.)
- Other: whether the model supports multimodality and function calling.
Large Language Models Supported on MaiAgent
Closed-Source Models
| Model Name | Description | Reasoning Model | Use Cases |
| --- | --- | --- | --- |
| o4-mini | Faster than o3-mini-high, with slightly lower quality | Yes | High-quality, fast option |
| o3-mini-high | High quality, medium speed; uses chain-of-thought reasoning for multi-layer computation before answering, producing more complete and accurate responses | Yes | High-difficulty tasks requiring deep reasoning and creativity |
| o3-mini-medium | Fast speed, medium quality | Yes | Most business applications, simple creative tasks, or routine Q&A |
| o3-mini-low | Fastest speed, basic quality, lacks deep reasoning | Yes | Simple tasks that prioritize speed over depth |
| o1-mini 2024-09-12 | The o1 series is trained with reinforcement learning to perform complex reasoning: the model thinks before answering, generating a long internal chain of thought before responding. Slowest speed, good quality | Yes | Extremely difficult problems where other LLMs fail |
| GPT-4o 2024-08-06 | Above-average quality and speed | No | Instruction following and logic slightly weaker than Claude 3.5 Sonnet, but faster. A commonly used choice👍 |
| GPT-4o mini 2024-07-18 | Fast speed, medium quality; slightly lower quality than Gemini 2.0 Flash | No | Simple tasks; an alternative when Gemini 2.0 Flash is unavailable |
| Claude 4 Sonnet | Medium-to-slow speed; strong structured-data generation and extraction, particularly strong at tool calling; logical reasoning and coding surpass Claude 3.7 Sonnet, with a further reduced hallucination rate | Hybrid reasoning model | First choice for Agent mode👍 Suited to highly complex tasks, professional-domain applications, and extended conversations |
| Claude 3.7 Sonnet | Medium-to-slow speed; excels at generating structured data; stronger logical reasoning than Claude 3.5 Sonnet; low hallucination rate | Hybrid reasoning model | First choice for most situations👍 Suited to high-complexity tasks, professional domains, and long conversations |
| Claude 3.5 Sonnet | Follows role instructions well; weaker logical reasoning than Claude 3.7 Sonnet, but faster; low hallucination rate | No | Switch to Gemini 2.0 Flash when its speed feels too slow |
| Gemini 2.5 Pro | Better than Claude 3.7 Sonnet for longer conversations and code generation, but slightly weaker in Agent mode and tool calling | No | Interchangeable with Claude 3.7 Sonnet |
| Gemini 2.0 Pro | Quality similar to Claude 3.5 Sonnet, but slower | No | Alternative to Claude 3.5 Sonnet |
| Gemini 2.5 Flash | Fast speed, good multimodal capabilities | No | |
| Gemini 2.0 Flash | Fast speed, medium quality | No | First choice for simple tasks👍 |
| DeepSeek V3 | Fast speed, high quality | Yes | Document retrieval and large-scale database query tasks |
| DeepSeek R1 Distill Llama 70B | High response quality, medium speed (slower than DeepSeek V3) | Yes | Tasks requiring multi-step reasoning and background knowledge |
| DeepSeek R1 | Slower responses, but strong Chinese-language understanding and high-quality output; thinks deeply and genuinely adapts to role instructions | Yes | Complex multi-turn Chinese conversations and complex role instructions👍 |
Open-Source Models
Below is a comparison table of mainstream open-source models. For hardware requirements, refer to the GPU section.
| Model Name | Description | Agent Support | Use Cases |
| --- | --- | --- | --- |
| GPT-OSS-120B | Excels at organizing information in table format | Yes | Data analysis, content creation |
| Gemma3 27B | Good image OCR performance | No | Invoice recognition, image analysis |
| Meta Llama3.3 70B | Highly cost-effective general-purpose model with strong instruction following | No | Voice customer service, RAG Q&A assistant |
| Meta Llama3.2 90B | Vision capabilities; processes both images and text | No | Professional-domain Q&A, high-precision tasks |
| Meta Llama3.1 405B | Extremely broad knowledge and high reasoning ability | Yes | Customer service Q&A, knowledge Q&A |
| Llama3 Taiwan 70B | Fine-tuned for Traditional Chinese and Taiwanese culture; accurate localized terminology | No | Customer service Q&A, RAG Q&A assistant |
| DeepSeek R1 | Reasoning-enhanced model; excels at solving complex problems through chain of thought | No | Scientific and logical reasoning |
| DeepSeek V3.2 | MoE architecture; fast inference and extremely low cost | No | Large-scale text summarization |
| Qwen3 235B | Specialized in code and math with deep logical understanding; suited to demanding technical tasks | Yes | Academic paper assistance |
| Qwen3 32B | Dynamically balances high-quality answers with efficient processing | Yes | Multilingual dialogue and instruction following |
| Qwen3 8B | Medium quality; balanced performance and cost | Yes | Chat assistant, customer service FAQ |
| Qwen2.5 VL 72B Instruct | Vision-language model with accurate image-detail recognition | Yes | Multimodal chat assistant |
| Mistral Large (24.07) | Medium quality, fast speed; lacks deep reasoning ability | No | Customer service Q&A, simple text generation |
Is Fine-tuning Always Necessary?
As artificial intelligence has advanced rapidly, language models have gained powerful language understanding and generation capabilities and are now applied across many fields. Pre-trained language models, for example, can easily handle daily conversations, article generation, and simple Q&A tasks.
However, when a model faces more challenging professional-domain tasks, such as healthcare, legal, or technical support, relying solely on pre-training may not deliver optimal performance. Some developers therefore perform Fine-tuning: additional training on domain-specific data to strengthen the model's professional knowledge.
Fine-tuning is not the only option, however. Two other effective methods can adapt a model to specialized tasks: Prompt Engineering and RAG (Retrieval-Augmented Generation).
1. Prompt Engineering: Optimizing Model Performance Through Precise Prompts
Prompt Engineering involves designing precise prompt statements to guide the model in generating desired results. The core of this approach is to design detailed expressions based on task requirements, helping the model narrow the scope of answers and understand the context and required output format.
Suppose a language model's purpose is to recommend suitable products based on user needs. A request phrased as "I want to buy a high-performance phone" may lead the model to generate an insufficiently precise answer, because "high-performance" has many possible interpretations: processing speed, camera performance, battery life, and so on.
To improve recommendation accuracy, we can apply Prompt Engineering: design a more specific question or provide additional context so the model better understands the user's needs. For example, change the prompt to:
"I need a phone with long battery life and a powerful processor, priced between $500 and $800. Please recommend several phones that meet these criteria."
2. RAG: Combining External Knowledge to Enhance Generation Capabilities
RAG enhances model performance by combining external knowledge retrieval with the generation process.
In traditional generation tasks, a model relies only on knowledge learned during pre-training. RAG instead uses a retrieval system to fetch external data in real time and feeds that data to the generation model, so it can answer questions or generate text more accurately.
For example, in healthcare, when a model is asked about a rare disease, RAG can first retrieve relevant information from professional medical databases, then generate more accurate answers based on this information. The advantage of this method is that even if the model hasn't encountered certain data during training, it can still make high-quality responses by retrieving existing knowledge. This is particularly suitable for scenarios requiring real-time knowledge updates and can greatly expand the model's knowledge scope.
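To make the retrieve-then-generate flow concrete, here is a minimal, self-contained sketch. The document snippets, function names, and keyword-overlap scoring are illustrative assumptions; a production system would typically use embedding search over a vector database instead.

```python
# A minimal sketch of retrieve-then-generate (RAG): rank an in-memory
# document list by naive keyword overlap, then assemble an augmented prompt.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())

    def score(doc: str) -> int:
        return sum(term in doc.lower() for term in query_terms)

    return sorted(documents, key=score, reverse=True)[:top_k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Combine the retrieved passages with the user's question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Illustrative stand-in for a professional medical database.
documents = [
    "Fabry disease is a rare genetic disorder caused by GLA gene mutations.",
    "Influenza is a common seasonal viral infection.",
    "Fabry disease symptoms include burning pain in the hands and feet.",
]

query = "What are the symptoms of Fabry disease?"
prompt = build_prompt(query, retrieve(query, documents))
print(prompt)  # Send this augmented prompt to any chat-completion endpoint.
```

The key design point is that the model answers from the retrieved context rather than from its pre-trained weights alone, which is what lets RAG cover knowledge the model never saw during training.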
For a more detailed introduction to RAG, see the next chapter "RAG Knowledge Base Retrieval System Explanation".
In summary, Fine-tuning, Prompt Engineering, and RAG each have their advantages and applicable scopes. You can choose the most suitable strategy based on application scenarios and needs, without relying solely on Fine-tuning.
Prompt Engineering provides a low-cost and flexible solution by designing precise prompts to guide the model to produce high-quality results.
RAG provides a method combining external knowledge with generation capabilities, offering more accurate answers when dynamic knowledge retrieval is needed.
Fine-tuning can significantly improve model performance in specific domains, but it requires substantial specialized data and consumes additional resources. Treat it as a last resort for cases where both Prompt Engineering and RAG fall short.