# Parser Tools

### What is a RAG Parser?

A RAG Parser is a critical component of Retrieval-Augmented Generation (RAG) systems. It parses raw data and breaks it down as a preprocessing step before embedding vectorization. Because vectorization and semantic retrieval both build on its output, the parser has a decisive impact on overall data quality and retrieval effectiveness.

***

### 1. Document Parser Types

MaiAgent provides four document parsers that cover a range of document formats, including PDF, Word, Excel, and images:

| Feature               | MaiAgent Parser (Default)   | MaiAgent Parser (Online)                   | MaiAgent Parser (Offline)                                                                              | Vision Parser                                                                    |
| --------------------- | --------------------------- | ------------------------------------------ | ------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------- |
| Cost                  | Low                         | High                                       | Medium                                                                                                 | High                                                                             |
| Image Content Parsing | Cannot parse text in images | OCR only; parses text in images            | OCR plus AI understanding of image semantics                                                           | AI visual understanding (best quality)                                           |
| Uses LLM              | No                          | Yes                                        | Yes                                                                                                    | Yes                                                                              |
| Text Parsing Quality  | Standard                    | Good                                       | Good (excellent structure preservation)                                                                | Good                                                                             |
| Table Parsing         | Native extraction           | AI extracts text from images               | Native structure preservation (best), includes static resources in images besides visual understanding | AI visual recognition generates text from its visual understanding of images     |
| Parsing Speed         | Fast                        | Medium                                     | Slow                                                                                                   | Slow                                                                             |
| On-Premises (Offline) | Yes                         | No                                         | Yes, but requires VLM deployment                                                                       | No                                                                               |
| Supported Formats     | 22 types                    | 20 types                                   | 20 types (PDF only supports OCR)                                                                       | 7 types                                                                          |

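The table above can be read as a set of constraints on parser choice. As a rough illustration only, the sketch below encodes a few of its rows and filters them by two requirements. It is not part of MaiAgent's API: the `DocParser` class, the `PARSERS` list, and the `pick_document_parsers` function are hypothetical names introduced here, and the values are copied from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DocParser:
    """One column of the comparison table (hypothetical structure)."""
    name: str
    cost: str               # "Low", "Medium", or "High", as listed in the table
    reads_image_text: bool  # can it parse text inside images?
    on_premises: bool       # can it run on-premises (offline)?


# Values transcribed from the table; nothing here calls a real service.
PARSERS = [
    DocParser("MaiAgent Parser (Default)", "Low", False, True),
    DocParser("MaiAgent Parser (Online)", "High", True, False),
    # On-premises use of this parser requires a VLM deployment.
    DocParser("MaiAgent Parser (Offline)", "Medium", True, True),
    DocParser("Vision Parser", "High", True, False),
]

COST_RANK = {"Low": 0, "Medium": 1, "High": 2}


def pick_document_parsers(need_image_text: bool = False,
                          need_on_premises: bool = False) -> List[DocParser]:
    """Return the parsers that satisfy the requirements, cheapest first."""
    matches = [
        p for p in PARSERS
        if (p.reads_image_text or not need_image_text)
        and (p.on_premises or not need_on_premises)
    ]
    return sorted(matches, key=lambda p: COST_RANK[p.cost])
```

For example, `pick_document_parsers(need_image_text=True, need_on_premises=True)` returns only MaiAgent Parser (Offline), the one parser in the table that both reads text in images and runs on-premises.
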
***

### 2. Speech-to-Text Parser Types

MaiAgent provides four speech-to-text parsers that can transcribe audio files into text for knowledge base integration:

| Feature                | Azure Speech  | Whisper (Groq)    | Whisper (OpenAI)  | Whisper (Offline)             |
| ---------------------- | ------------- | ----------------- | ----------------- | ----------------------------- |
| Cost                   | High          | Low               | Medium            | Free                          |
| Transcription Accuracy | High          | High (large-v3)   | High (whisper-1)  | High (depends on local model) |
| Uses LLM               | No            | No                | No                | No                            |
| Parsing Speed          | Real-time     | Fastest           | Medium            | Depends on hardware           |
| On-Premises (Offline)  | No            | No                | No                | Yes                           |
| Multi-language Support | Yes           | Yes (auto-detect) | Yes (auto-detect) | Yes (auto-detect)             |
| Custom Prompts         | No            | Yes               | Yes               | Yes                           |
| VAD Voice Detection    | —             | Yes               | Yes               | Yes                           |
| Data Privacy           | Cloud (Azure) | Cloud (Groq)      | Cloud (OpenAI)    | Fully local                   |

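The same kind of filtering applies to the speech-to-text table. The sketch below is a minimal illustration under stated assumptions: `SPEECH_PARSERS` and `pick_speech_parsers` are hypothetical names, not MaiAgent identifiers, and the dictionary only restates the Cost, On-Premises, and Custom Prompts rows. Calling `pick_speech_parsers(need_on_premises=True)` leaves only Whisper (Offline), matching the single "Yes" in the On-Premises row.

```python
from typing import Dict, List

# Values transcribed from the table above; this does not call any speech service.
SPEECH_PARSERS: Dict[str, Dict[str, object]] = {
    "Azure Speech":      {"cost": "High",   "on_premises": False, "custom_prompts": False},
    "Whisper (Groq)":    {"cost": "Low",    "on_premises": False, "custom_prompts": True},
    "Whisper (OpenAI)":  {"cost": "Medium", "on_premises": False, "custom_prompts": True},
    "Whisper (Offline)": {"cost": "Free",   "on_premises": True,  "custom_prompts": True},
}

COST_ORDER = {"Free": 0, "Low": 1, "Medium": 2, "High": 3}


def pick_speech_parsers(need_on_premises: bool = False,
                        need_custom_prompts: bool = False) -> List[str]:
    """Return the names of parsers meeting the requirements, cheapest first."""
    names = [
        name for name, row in SPEECH_PARSERS.items()
        if (row["on_premises"] or not need_on_premises)
        and (row["custom_prompts"] or not need_custom_prompts)
    ]
    return sorted(names, key=lambda n: COST_ORDER[str(SPEECH_PARSERS[n]["cost"])])
```
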
After audio files are uploaded to the knowledge base, the system automatically transcribes them and lets you view and download the resulting transcripts.
