Large Language Models (LLM)

Key Selection Criteria

When choosing a large language model, consider the following key factors:

  1. Environment: Whether the deployment environment has internet access determines the choice between cloud-based and local models.

  2. Quality: The model's ability to generate responses and follow instructions.

  3. Speed: Text generation speed and latency, i.e., whether the model's response time meets your requirements.

  4. Pricing: The model's usage cost relative to your requirements. (Model pricing does not need to be considered on MaiAgent.)

  5. Others: Whether the model supports multimodal capabilities and function calling.

Large Language Model Analysis - Artificial Analysis

Large Language Models Supported on MaiAgent

Cloud Models (Closed Source)

| Model Name | Description | Is Reasoning Model | Use Cases |
| --- | --- | --- | --- |
| o4-mini | Faster than o3-mini-high, slightly lower quality than o3-mini-high | Yes | Choice when both high quality and speed are needed |
| o3-mini-high | High quality, medium speed; uses chain-of-thought reasoning for more complete, precise answers | Yes | High-difficulty tasks requiring deep reasoning and creativity |
| o3-mini-medium | Fast speed, medium quality | Yes | Most business applications, simple creation, or regular Q&A |
| o3-mini-low | Fastest speed, basic quality, lacks deep reasoning | Yes | Quick, simple tasks not requiring depth |
| o1-mini 2024-09-12 | The o1 series is trained with reinforcement learning for complex reasoning; it thinks before answering, generating a long internal chain of thought. Slowest speed, high quality | Yes | Very difficult problems when other LLMs fail |
| GPT-4o 2024-08-06 | High quality and speed overall | No | Slightly lower instruction following and logic than Claude 3.5 Sonnet, but faster. A common choice👍 |
| GPT-4o mini 2024-07-18 | Fast speed, medium quality; slightly lower quality than Gemini 2.0 Flash | No | Alternative for simple tasks when Gemini 2.0 Flash is unavailable |
| Claude 4 Sonnet | Medium-slow speed; strong structured data generation/extraction, especially good at tool calling; better logical reasoning and coding than Claude 3.7 Sonnet; further reduced hallucination | Hybrid reasoning model | First choice for Agent mode👍 Suitable for highly complex tasks, professional domains, and long conversations |
| Claude 3.7 Sonnet | Medium-slow speed; excels at structured data generation; stronger logical reasoning than Claude 3.5 Sonnet; low hallucination rate | Hybrid reasoning model | First choice for most cases👍 Suitable for complex tasks, professional domains, and long conversations |
| Claude 3.5 Sonnet | Follows role instructions well; weaker logical reasoning than Claude 3.7 Sonnet but faster; low hallucination rate | No | Switch to Gemini 2.0 Flash if speed is too slow |
| Gemini 2.5 Pro | Better quality than Claude 3.7 Sonnet for longer conversations and code generation, but slightly weaker in Agent mode and tool calling | No | Can be used interchangeably with Claude 3.7 Sonnet |
| Gemini 2.0 Pro | Similar quality to Claude 3.5 Sonnet, but slower | No | Alternative to Claude 3.5 Sonnet |
| Gemini 2.5 Flash | Fast speed, good multimodal capabilities | No | |
| Gemini 2.0 Flash | Fast speed, medium quality | No | First choice for simple tasks👍 |
| DeepSeek V3 | Fast speed, high quality | Yes | Document retrieval and large database query tasks |
| DeepSeek R1 Distill Llama 70B | High response quality, medium speed (slower than DeepSeek V3) | Yes | Tasks requiring multi-step reasoning and background knowledge |
| DeepSeek R1 | Slower response speed, but strong Chinese comprehension and high-quality responses; thinks deeply and adapts accurately to role instructions | Yes | Complex multi-turn Chinese conversations and handling complex role instructions👍 |

Local Models (Open Source)

Below is a comparison table of mainstream open source models. For hardware requirements, please refer to the GPU chapter.

| Model Name | Description | Use Cases |
| --- | --- | --- |
| Meta Llama3.3 70B | High quality, medium speed | Data analysis, content creation |
| Meta Llama3.3 70B instruct (M2Ultra) | High quality, fast speed | Voice customer service |
| Meta Llama3.2 90B | Very high quality, medium speed | Professional Q&A, high-precision tasks |
| Llama3-TAIDE-LX-70B-Chat (NCHC) | High quality, strong Chinese generation, medium speed | Customer service Q&A, knowledge Q&A |
| TAIDE-LX-70B-Chat (NCHC) | High quality, medium speed | Customer service Q&A, knowledge Q&A |
| Mistral Large (24.07) | Medium quality, lacks deep reasoning, fast speed | Customer service Q&A, simple text generation |
| Meta-Llama 3.1-70B | Medium quality, medium compute requirements | Customer service, knowledge Q&A, advanced translation and summarization |
| Meta-Llama 3.1-8B | Acceptable quality, low compute requirements | Translation, summarization |
| Mistral Large 2 | High quality, high hardware requirements | Customer service, knowledge Q&A, advanced translation and summarization |
| Mistral 8x7B | Low quality, fastest speed | Translation, summarization |
| Gemma3 27B (M2 Ultra) | High quality, high hardware requirements | Professional knowledge Q&A, data analysis, complex content generation |

Do Models Always Need Fine-tuning?

With the rapid development of artificial intelligence technology, language models have acquired powerful language understanding and generation capabilities, and are widely used in many fields. For example, pre-trained language models can easily handle daily conversations, article generation, and simple Q&A tasks.

However, when models face more challenging professional-domain tasks, such as medicine, law, or technical support, relying solely on a pre-trained model may not deliver optimal performance. Some developers therefore choose Fine-tuning, which means additional training on domain-specific data to improve the model's professional knowledge.

That said, Fine-tuning is not the only solution. Two other effective methods can achieve the same goal: Prompt Engineering and RAG (Retrieval-Augmented Generation).

1. Prompt Engineering: Optimizing Model Performance Through Precise Prompts

Prompt Engineering involves designing precise prompts to guide the model toward the desired results. The core of this method is to craft detailed instructions based on the task requirements, helping the model narrow the answer scope and understand the context and the required output format.

Suppose we have a language model intended to recommend suitable products based on user needs. If the user simply says "I want to buy a high-performance phone," the model might generate imprecise answers, because "high-performance" can be interpreted in many ways: processing speed, camera performance, battery life, and so on.

To improve recommendation accuracy, we can apply Prompt Engineering by asking a more specific question or providing additional context to help the model better understand the user's needs. For example, the request can be rewritten as:

"I need a phone with long battery life and a high-efficiency processor, priced between $500 to $800, please recommend several phones that meet these criteria."

2. RAG: Enhancing Generation Capabilities by Combining External Knowledge

RAG improves model performance by combining external knowledge retrieval with the generation process.

In traditional generation tasks, models rely only on knowledge learned during pre-training. RAG, however, uses a retrieval system to obtain external data in real-time and combines this data with the generation model to provide more accurate answers to questions or generate text.

For example, in the medical field, when a model is asked about a rare disease, RAG can first retrieve relevant information from professional medical databases, then generate more accurate answers based on this information. The advantage of this method is that even if the model hasn't encountered certain information during training, it can still provide high-quality responses by retrieving existing knowledge. This is particularly suitable for scenarios requiring real-time knowledge updates and can greatly expand the model's knowledge range.
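The sketch below illustrates the basic RAG flow under heavily simplified assumptions: the "retrieval system" is reduced to keyword overlap over a small in-memory list, and the final grounded prompt would be sent to any of the chat models listed above. A production system would instead use embeddings and a vector database; the knowledge-base contents and helper names here are purely illustrative.

```python
# Minimal RAG sketch: retrieve relevant snippets, then ground the prompt in them.
# The knowledge base, scoring method, and helper names are illustrative stand-ins.

KNOWLEDGE_BASE = [
    "Disease X is a rare autoimmune disorder first described in 2015.",
    "First-line treatment for Disease X is corticosteroid therapy.",
    "Disease X primarily affects adults between 30 and 50 years of age.",
    "Unrelated note: the clinic is open Monday through Friday.",
]

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query (embeddings in practice)."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved snippets with the question so the model answers from them."""
    context_block = "\n".join(f"- {snippet}" for snippet in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )

query = "What is the treatment for Disease X?"
prompt = build_prompt(query, retrieve(query, KNOWLEDGE_BASE))
print(prompt)  # send this grounded prompt to any chat model listed above
```

The key design point is that the model is asked to answer from the retrieved context rather than from its pre-trained knowledge alone, which is what allows RAG to stay current as the knowledge base is updated.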

In conclusion, Fine-tuning, Prompt Engineering, and RAG each have their advantages and applicable scenarios. The most suitable strategy can be chosen based on application scenarios and requirements, rather than relying solely on Fine-tuning.

Prompt Engineering provides a low-cost and flexible solution by designing precise prompts to guide models in producing high-quality results.

RAG provides a method combining external knowledge with generation capabilities, offering more precise answers when dynamic knowledge acquisition is needed.

Fine-tuning can significantly improve model performance in specific domains but requires large amounts of professional data and consumes additional resources. It should be considered as a last resort when both Prompt Engineering and RAG approaches fail.
