Large Language Models (LLMs)

Key Selection Considerations

When selecting a large language model, consider the following key factors:

  1. Environment: Determine whether to use a cloud-based or on-premises model, based on internet connectivity.

  2. Quality: The quality of the model's generated responses and how well it follows instructions.

  3. Speed: Text generation speed and latency, to ensure the model responds within your timing requirements.

  4. Pricing: Consider the model's usage cost and select an appropriate model for your usage volume. (Model pricing does not need to be considered on MaiAgent.)

  5. Other: Whether the model supports multimodality and function calling (see the sketch after this list).
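
As one example of the "Other" criterion, the sketch below shows what function calling looks like through the OpenAI Python SDK. The `get_order_status` tool, its schema, and the model name are hypothetical placeholders for illustration; they are not part of MaiAgent's interface.

```python
# Minimal function-calling sketch (OpenAI-style chat API).
# The `get_order_status` tool and the model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical tool
        "description": "Look up the shipping status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # any model that supports function calling
    messages=[{"role": "user", "content": "Where is order #12345?"}],
    tools=tools,
)

# If the model chose to call the tool, it returns structured JSON arguments
# instead of a plain-text answer; your code then executes the real lookup.
calls = response.choices[0].message.tool_calls
if calls:
    print(calls[0].function.name, calls[0].function.arguments)
```

A model with strong function-calling support will reliably emit well-formed arguments like these, which is why the tables below call it out for Agent-style use cases.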


Large Language Models Supported on MaiAgent

Closed-Source Models

| Model Name | Description | Reasoning Model | Use Cases |
| --- | --- | --- | --- |
| o4-mini | Faster than o3-mini-high, with slightly lower quality | Yes | High-quality, fast option |
| o3-mini-high | High quality, medium speed; uses chain-of-thought reasoning for multi-layer computation before answering, giving more complete and accurate responses | Yes | High-difficulty tasks requiring deep reasoning and creativity |
| o3-mini-medium | Fast speed, medium quality | Yes | Most business applications, simple creative tasks, or regular Q&A |
| o3-mini-low | Fastest speed, basic quality, lacks deep reasoning | Yes | Simple tasks prioritizing speed over depth |
| o1-mini 2024-09-12 | The o1 series is trained with reinforcement learning to perform complex reasoning: the model generates a long internal chain of thought before responding. Slowest speed, good quality | Yes | Extremely difficult problems when other LLMs fail |
| GPT-4o 2024-08-06 | Above-average quality and speed | No | Instruction following and logic slightly inferior to Claude 3.5 Sonnet, but faster. A commonly used choice 👍 |
| GPT-4o mini 2024-07-18 | Fast speed, medium quality; quality slightly lower than Gemini 2.0 Flash | No | Simple tasks; alternative when Gemini 2.0 Flash is unavailable |
| Claude 4 Sonnet | Medium-to-slow speed; strong structured data generation/extraction, particularly tool calling; logical reasoning and coding surpass Claude 3.7 Sonnet, with a further reduced hallucination rate | Hybrid reasoning model | First choice for Agent mode 👍 Suitable for highly complex tasks, professional domains, and extended conversations |
| Claude 3.7 Sonnet | Medium-to-slow speed; excels at generating structured data; stronger logical reasoning than Claude 3.5 Sonnet; low hallucination rate | Hybrid reasoning model | First choice for most situations 👍 Suitable for high-complexity tasks, professional domains, and long conversations |
| Claude 3.5 Sonnet | Follows role instructions well; weaker logical reasoning than Claude 3.7 Sonnet, but faster; low hallucination rate | No | Switch to Gemini 2.0 Flash if the speed feels too slow |
| Gemini 2.5 Pro | Better quality than Claude 3.7 Sonnet for longer conversations and code generation, but slightly inferior in Agent mode and tool calling | No | Can be used interchangeably with Claude 3.7 Sonnet |
| Gemini 2.0 Pro | Similar quality to Claude 3.5 Sonnet, but slower | No | Alternative to Claude 3.5 Sonnet |
| Gemini 2.5 Flash | Fast speed, good multimodal capabilities | No |  |
| Gemini 2.0 Flash | Fast speed, medium quality | No | First choice for simple tasks 👍 |
| DeepSeek V3 | Fast speed, high quality | Yes | Suitable for document retrieval and large-scale database query tasks |
| DeepSeek R1 Distill Llama 70B | High response quality, medium speed (slower than DeepSeek V3) | Yes | Suitable for tasks requiring multi-step reasoning and background knowledge |
| DeepSeek R1 | Slower responses, but strong Chinese-language understanding and high-quality output; thinks deeply and genuinely adapts to role instructions | Yes | Complex multi-turn Chinese conversations; handling complex role instructions 👍 |

Open-Source Models

Below is a comparison table of mainstream open-source models. For hardware requirements, refer to the GPU section.

| Model Name | Description | Agent Support | Use Cases |
| --- | --- | --- | --- |
| GPT-OSS-120B | Excels at organizing information in table format | Yes | Data analysis, content creation |
| Gemma3 27B | Good image OCR performance | No | Invoice recognition, image analysis |
| Meta Llama3.3 70B | Highly cost-effective general-purpose model with strong instruction following | No | Voice customer service, RAG Q&A assistant |
| Meta Llama3.2 90B | Vision capabilities; can process both images and text | No | Professional domain Q&A, high-precision tasks |
| Meta Llama3.1 405B | Extremely broad knowledge and high reasoning ability | Yes | Customer service Q&A, knowledge Q&A |
| Llama3 Taiwan 70B | Fine-tuned for Traditional Chinese and Taiwanese culture; accurate localized terminology | No | Customer service Q&A, RAG Q&A assistant |
| DeepSeek R1 | Reasoning-enhanced model; excels at solving complex problems through chain of thought | No | Scientific logical reasoning |
| DeepSeek V3.2 | MoE architecture; fast inference and extremely low cost | No | Large-scale text summarization |
| Qwen3 235B | Specialized in code and math; deep logical understanding, suitable for demanding tasks | Yes | Academic paper assistance |
| Qwen3 32B | Dynamic balance between high-quality answers and efficient processing | Yes | Multilingual dialogue and instruction following |
| Qwen3 8B | Medium quality; balanced performance and cost | Yes | Chat assistant, customer service FAQ |
| Qwen2.5 VL 72B Instruct | Vision-language model; accurate image detail recognition | Yes | Multimodal chat assistant |
| Mistral Large (24.07) | Medium quality, fast speed; lacks deep reasoning ability | No | Customer service Q&A, simple text generation |

Is Fine-tuning Always Necessary?

As artificial intelligence has advanced rapidly, language models have acquired powerful language understanding and generation capabilities and are now widely applied across many fields. For example, pre-trained language models can easily handle daily conversations, article generation, and simple Q&A tasks.

However, when models face more challenging professional-domain tasks, such as healthcare, legal, or technical support, relying solely on a pre-trained model may not deliver optimal performance. Some developers therefore choose Fine-tuning: additional training on domain-specific data to strengthen the model's professional knowledge.

Fine-tuning is not the only option, though. Two other effective methods can achieve the same goal: Prompt Engineering and RAG (Retrieval-Augmented Generation).

1. Prompt Engineering: Optimizing Model Performance Through Precise Prompts

Prompt Engineering involves designing precise prompts that guide the model toward the desired results. The core of the approach is to phrase the task in detail, helping the model narrow the scope of its answers and understand the context and required output format.

Suppose a language model is meant to recommend suitable products based on user needs. A request phrased as "I want to buy a high-performance phone" may lead the model to generate imprecise answers, because "high-performance" has many possible interpretations: it might refer to processing speed, camera performance, battery life, and so on.

To improve recommendation accuracy, we can apply Prompt Engineering: design a more specific question or provide additional context to help the model understand the user's needs. For example, rewrite the request as the following prompt:

"I need a phone with long battery life and a powerful processor, priced between $500 and $800. Please recommend several phones that meet these criteria."

2. RAG: Combining External Knowledge to Enhance Generation Capabilities

RAG enhances model performance by combining external knowledge retrieval with the generation process.

In traditional generation tasks, models rely only on knowledge learned during pre-training. RAG instead uses a retrieval system to obtain external data in real time and feeds that data to the generation model, so questions are answered and text is generated more accurately.

For example, in healthcare, when a model is asked about a rare disease, RAG can first retrieve relevant information from professional medical databases and then generate a more accurate answer based on that information. The advantage of this method is that even if the model never encountered certain data during training, it can still produce high-quality responses by retrieving existing knowledge. This makes it particularly suitable for scenarios requiring real-time knowledge updates, and it can greatly expand the model's knowledge scope.
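
As a rough illustration of this retrieve-then-generate flow, here is a minimal sketch under stated assumptions: retrieval is naive keyword overlap over an in-memory list, the documents are invented, and the model name is a placeholder. A production RAG system would use embeddings and a vector store for retrieval instead.

```python
# Minimal RAG sketch: naive keyword retrieval + grounded generation.
# Documents, scoring, and model name are illustrative assumptions only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

documents = [
    "Disease X is a rare disorder first described in 1998 ...",
    "Standard treatment for Disease X includes ...",
    "Unrelated note about hospital billing codes ...",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Score each document by word overlap with the question; keep the top k."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

question = "What is the standard treatment for Disease X?"
context = "\n".join(retrieve(question, documents))

# Stuff the retrieved passages into the prompt so the model answers
# from the supplied knowledge rather than from pre-training alone.
answer = client.chat.completions.create(
    model="gpt-4o",  # placeholder
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    }],
)
print(answer.choices[0].message.content)
```

Because the knowledge lives in the document store rather than in the model weights, updating the store updates the model's answers immediately, with no retraining.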


In summary, Fine-tuning, Prompt Engineering, and RAG each have their advantages and applicable scopes. You can choose the most suitable strategy based on application scenarios and needs, without relying solely on Fine-tuning.

Prompt Engineering provides a low-cost and flexible solution by designing precise prompts to guide the model to produce high-quality results.

RAG provides a method combining external knowledge with generation capabilities, offering more accurate answers when dynamic knowledge retrieval is needed.

Fine-tuning can significantly improve model performance in specific domains, but it requires substantial specialized data and consumes additional resources. It should be considered a last resort, for when both Prompt Engineering and RAG fail to deliver the required results.
