# Parser Tools

### What is RAG Parser?

The RAG Parser is a critical component of Retrieval-Augmented Generation (RAG) systems. It parses raw data and breaks it down as a preprocessing step before embedding vectorization. Because vectorization and semantic retrieval both build on the parsed output, parser choice has a decisive impact on overall data quality and retrieval effectiveness.

***

### 1. Document Parser Types

MaiAgent provides four document parsers, suitable for various document formats including PDF, Word, Excel, images, and more:

| Feature               | MaiAgent Parser (Default)   | MaiAgent Parser (Online)                   | MaiAgent Parser (Offline)                                                                              | Vision Parser                                                                    |
| --------------------- | --------------------------- | ------------------------------------------ | ------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------- |
| Cost                  | Low                         | High                                       | Medium                                                                                                 | High                                                                             |
| Image Content Parsing | Cannot parse text in images | OCR only, can parse text in images         | OCR + AI understands image semantics                                                                   | AI visual understanding, best quality                                            |
| Uses LLM              | No                          | Yes                                        | Yes                                                                                                    | Yes                                                                              |
| Text Parsing Quality  | Standard                    | Good                                       | Good (excellent structure preservation)                                                                | Good                                                                             |
| Table Parsing         | Native extraction           | AI extracts text from images               | Preserves native structure (best); includes static resources in images besides visual understanding    | AI visual recognition; generates text from visual understanding of images        |
| Parsing Speed         | Fast                        | Medium                                     | Slow                                                                                                   | Slow                                                                             |
| On-Premises (Offline) | Yes                         | No                                         | Yes, but requires VLM deployment                                                                       | No                                                                               |
| Supported Formats     | 22 types                    | 20 types                                   | 20 types (PDF only supports OCR)                                                                       | 7 types                                                                          |
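
One way to read the table is as a set of constraints to filter on. The sketch below is illustrative only: it transcribes a few rows of the table into a plain Python structure and filters by two requirements. The `DocumentParser` class, the `DOCUMENT_PARSERS` list, and the `candidate_parsers` function are hypothetical names introduced for this example and are not part of any MaiAgent API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentParser:
    """One column of the comparison table (hypothetical structure)."""
    name: str
    cost: str
    parses_image_text: bool
    uses_llm: bool
    speed: str
    on_premises: bool
    format_count: int


# Values transcribed from the comparison table; not fetched from any API.
DOCUMENT_PARSERS = [
    DocumentParser("MaiAgent Parser (Default)", "Low", False, False, "Fast", True, 22),
    DocumentParser("MaiAgent Parser (Online)", "High", True, True, "Medium", False, 20),
    DocumentParser("MaiAgent Parser (Offline)", "Medium", True, True, "Slow", True, 20),
    DocumentParser("Vision Parser", "High", True, True, "Slow", False, 7),
]


def candidate_parsers(need_image_text: bool = False,
                      need_on_premises: bool = False) -> list[str]:
    """Return the names of parsers that satisfy the given requirements."""
    return [
        p.name
        for p in DOCUMENT_PARSERS
        if (p.parses_image_text or not need_image_text)
        and (p.on_premises or not need_on_premises)
    ]


# Scanned documents that must stay on-premises:
print(candidate_parsers(need_image_text=True, need_on_premises=True))
# ['MaiAgent Parser (Offline)']
```

Under these assumptions, requiring both image text parsing and on-premises deployment leaves only the offline parser, which per the table also requires a VLM deployment.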

***

### 2. Speech-to-Text Parser Types

MaiAgent provides four speech-to-text parsers that can transcribe audio files into text for knowledge base integration:

| Feature                | Azure Speech  | Whisper (Groq)    | Whisper (OpenAI)  | Whisper (Offline)             |
| ---------------------- | ------------- | ----------------- | ----------------- | ----------------------------- |
| Cost                   | High          | Low               | Medium            | Free                          |
| Transcription Accuracy | High          | High (large-v3)   | High (whisper-1)  | High (depends on local model) |
| Uses LLM               | No            | No                | No                | No                            |
| Parsing Speed          | Real-time     | Fastest           | Medium            | Depends on hardware           |
| On-Premises (Offline)  | No            | No                | No                | Yes                           |
| Multi-language Support | Yes           | Yes (auto-detect) | Yes (auto-detect) | Yes (auto-detect)             |
| Custom Prompts         | No            | Yes               | Yes               | Yes                           |
| VAD Voice Detection    | —             | Yes               | Yes               | Yes                           |
| Data Privacy           | Cloud (Azure) | Cloud (Groq)      | Cloud (OpenAI)    | Fully local                   |
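
The trade-off between cost and deployment constraints can also be sketched in code. The example below is a minimal illustration, assuming the table's cost labels can be ordered as Free < Low < Medium < High. The `COST_RANK` mapping, the `SPEECH_PARSERS` dictionary, and the `rank_speech_parsers` function are hypothetical and exist only for this example.

```python
# Assumed ordering of the table's cost labels (cheapest first).
COST_RANK = {"Free": 0, "Low": 1, "Medium": 2, "High": 3}

# (cost, on_premises, custom_prompts), transcribed from the table.
SPEECH_PARSERS = {
    "Azure Speech": ("High", False, False),
    "Whisper (Groq)": ("Low", False, True),
    "Whisper (OpenAI)": ("Medium", False, True),
    "Whisper (Offline)": ("Free", True, True),
}


def rank_speech_parsers(need_on_premises: bool = False,
                        need_custom_prompts: bool = False) -> list[str]:
    """Return matching parser names, cheapest first."""
    matches = [
        name
        for name, (_cost, on_prem, prompts) in SPEECH_PARSERS.items()
        if (on_prem or not need_on_premises)
        and (prompts or not need_custom_prompts)
    ]
    return sorted(matches, key=lambda name: COST_RANK[SPEECH_PARSERS[name][0]])


# Cloud or local, but custom prompts are required:
print(rank_speech_parsers(need_custom_prompts=True))
# ['Whisper (Offline)', 'Whisper (Groq)', 'Whisper (OpenAI)']
```

Cost is only one axis here; the table's speed and data privacy rows may matter more for a given deployment.
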

After audio files are uploaded to the knowledge base, the system automatically performs speech-to-text parsing and provides transcript viewing and download functionality.


