# LiveKit Voice Agent Integration Architecture

> This document explains how the MaiAgent platform integrates LiveKit to implement real-time voice call functionality, including WebRTC connection management, Socket.IO namespace isolation, and audio data streaming.

## 1. What is LiveKit?

LiveKit is an open-source real-time communication infrastructure that provides low-latency, high-quality audio and video communication. MaiAgent uses LiveKit to implement its AI Voice Agent feature, which lets users hold natural voice conversations with AI assistants.

### 1.1 Core Technical Advantages

| Aspect            | Traditional Phone Systems                   | LiveKit + MaiAgent                                                |
| ----------------- | ------------------------------------------- | ----------------------------------------------------------------- |
| **Latency**       | 200-500ms                                   | < 100ms, approaching natural human conversation                   |
| **Audio Quality** | Limited by telephone network encoding       | Supports Opus high-quality encoding with adjustable bitrate       |
| **Scalability**   | Requires dedicated voice gateways           | Based on WebRTC, easily scalable horizontally                     |
| **Cost**          | Requires phone system rental fees           | Internet-based, lower cost                                        |
| **Integration**   | Difficult to integrate with digital systems | Natively integrates with web applications, embeddable in any page |

### 1.2 Common Use Cases

* **AI Voice Customer Service**: Customers have voice conversations with AI assistants directly through web pages or apps
* **Voice Assistants**: Internal enterprise use of voice commands to operate systems, such as querying data or creating tickets
* **Remote Assistance**: AI assistants guide users through operations via voice, such as equipment installation or troubleshooting
* **Multilingual Support**: Combining speech recognition and translation for real-time cross-language communication

***

## 2. MaiAgent LiveKit Integration Architecture

{% @mermaid/diagram content="flowchart TB
  subgraph Client["Browser/Client"]
    Web["Web UI"]
    WebRTC["WebRTC SDK"]
  end
  subgraph MaiAgent["MaiAgent Platform"]
    Django["Django Backend"]
    SIO_Main["Socket.IO Server<br/>(Main Namespace)"]
    SIO_Voice["Socket.IO Server<br/>(Voice Agent Dedicated)"]
    VoiceService["Voice Agent Service"]
  end
  subgraph LiveKit["LiveKit Infrastructure"]
    SFU["LiveKit SFU<br/>(Selective Forwarding Unit)"]
    Room["Virtual Room"]
  end
  subgraph AI["AI Processing Layer"]
    STT["Speech Recognition<br/>(Speech-to-Text)"]
    LLM["Language Model"]
    TTS["Speech Synthesis<br/>(Text-to-Speech)"]
  end
  Web <--> WebRTC
  WebRTC <-. "WebRTC Media Stream" .-> SFU
  Web <-. "Socket.IO Control" .-> SIO_Voice
  SIO_Voice <--> VoiceService
  VoiceService <--> SFU
  VoiceService --> STT
  STT --> LLM
  LLM --> TTS
  TTS -.-> SFU
  Django --> SIO_Main
  Django --> SIO_Voice" %}

### 2.1 Core Component Overview

* **LiveKit SFU (Selective Forwarding Unit)**: Handles audio/video data routing and forwarding, supporting multiple simultaneous calls
* **Socket.IO Server**: Handles control signals such as call start/end and state synchronization
* **Voice Agent Service**: Coordinates the complete flow of speech recognition, LLM inference, and speech synthesis
* **WebRTC**: Establishes the media connection between the browser and the LiveKit SFU for audio data transmission

### 2.2 Connection Establishment Flow

{% @mermaid/diagram content="sequenceDiagram
  participant User as User
  participant Browser as Browser
  participant Django as Backend
  participant SocketIO as Voice Agent Socket
  participant LiveKit as LiveKit SFU
  User->>Browser: Click "Start Call"
  Browser->>Django: Request to create voice session
  Django->>LiveKit: Create virtual room
  LiveKit-->>Django: Return room information and Token
  Django-->>Browser: Return connection parameters
  Browser->>SocketIO: Establish Socket.IO connection
  SocketIO-->>Browser: Connection confirmed
  Browser->>LiveKit: WebRTC connection request
  LiveKit-->>Browser: ICE Candidates exchange
  Browser->>LiveKit: Establish DTLS/SRTP encrypted channel
  LiveKit-->>Browser: Encrypted channel established
  Note over Browser,LiveKit: Audio streaming begins
  Browser->>SocketIO: Send PCM audio data
  SocketIO->>LiveKit: Forward audio to AI processing" %}

***

## 3. Socket.IO Namespace Isolation

### 3.1 Why a Dedicated Namespace?

MaiAgent's original Socket.IO server was primarily used for real-time delivery of general chat messages.
Voice call requirements have the following special characteristics:

* **High-frequency Data Transmission**: Audio data must be transmitted continuously at 20-60ms intervals
* **Low Latency Requirements**: Any delay noticeably affects the call experience
* **Independent Error Handling**: Voice connection failures should not affect general chat functionality
* **Different CORS Policies**: Voice features may need to allow more origin domains

### 3.2 Namespace Isolation Architecture

MaiAgent has created a dedicated Socket.IO server for the Voice Agent, completely separated from the Socket.IO server that carries general chat messages. The Voice Agent uses a dedicated namespace with an independently configured CORS policy.

**Benefits of isolation**:

1. **Performance Isolation**: High-frequency voice data transmission does not affect general chat message processing
2. **Independent Scaling**: Server resources can be added specifically for voice traffic
3. **Enhanced Security**: Different features use different CORS policies and authentication mechanisms
4. **Fault Isolation**: Voice Agent issues do not affect other features

### 3.3 Cross-Origin Resource Sharing (CORS) Configuration

The Voice Agent's CORS settings are independent of the general chat functionality. Only authorized origin domains are allowed to connect, which protects voice calls from unauthorized sites.

***

## 4. Audio Data Streaming

### 4.1 Audio Data Format

Audio data transmitted via LiveKit uses the following specifications:

* **Format**: PCM (Pulse-Code Modulation)
* **Sample Rate**: 16 kHz or 48 kHz
* **Bit Depth**: 16-bit
* **Channels**: Mono
* **Data Transmission**: Base64 encoded and transmitted via Socket.IO

### 4.2 audioData Event Handling

{% @mermaid/diagram content="sequenceDiagram
  participant Browser as Browser
  participant SocketIO as Socket.IO Server
  participant Queue as Audio Queue
  participant STT as Speech Recognition
  loop Continuous Transmission
    Browser->>SocketIO: Send audio data
    Note over SocketIO: Priority processing, bypasses general message queue
    SocketIO->>Queue: Add to voice processing queue
  end
  Queue->>STT: Batch process audio segments
  STT-->>Queue: Return recognized text
  Queue->>SocketIO: Return recognition results
  SocketIO->>Browser: Display real-time captions" %}

**Audio Priority Processing Mechanism**:

MaiAgent has optimized audio data transmission to ensure audio data:

* **Does not enter the general message queue**: Avoids being blocked by other messages
* **Has priority processing**: Audio data can jump the queue to reduce latency
* **Has independent routing**: Forwarded directly to the voice processing module

### 4.3 Disconnection Handling

MaiAgent implements graceful disconnection handling:

{% @mermaid/diagram content="flowchart TD
  A["Disconnection detected"] --> B{"Disconnect reason?"}
  B -- "User hung up" --> C["Normal resource cleanup"]
  B -- "Network issue" --> D["Attempt reconnection"]
  B -- "Server error" --> E["Log error"]
  C --> F["Release LiveKit room"]
  D --> G{"Reconnection successful?"}
  E --> F
  G -- "Yes" --> H["Resume call"]
  G -- "No" --> I["Notify user<br/>and clean up resources"]
  F --> J["Close Socket.IO connection"]
  H --> K["Continue call"]
  I --> J
  J --> L["Complete"]" %}

**Disconnection handling mechanisms**:

* **Heartbeat Detection**: Periodic ping/pong messages to check connection status
* **Automatic Reconnection**: Automatic reconnection attempts during brief disconnections without user intervention
* **Resource Cleanup**: Ensures LiveKit rooms and server resources are properly released
* **State Synchronization**: Restores conversation context after reconnection

***

## 5. Detailed Logging

MaiAgent implements comprehensive logging mechanisms for debugging and monitoring:

### 5.1 Log Types

| Log Type             | Recorded Content                                | Purpose                  |
| -------------------- | ----------------------------------------------- | ------------------------ |
| **Connection Logs**  | Socket.IO connect/disconnect events, source IP  | User session tracking    |
| **Audio Logs**       | Audio data transmission frequency, data size    | Call quality analysis    |
| **Error Logs**       | Exception stack traces, error messages, context | Problem investigation    |
| **Performance Logs** | Processing latency, queue length, CPU usage     | Performance optimization |

### 5.2 Log Examples

```log
[2025-12-23 14:30:15] INFO - Voice Agent Socket.IO Configuration:
  - Namespace: /voice-agent
  - Allowed Origins: ['https://maiagent.ai', ...]
  - Max Connections: 1000
  - Ping Timeout: 60s

[2025-12-23 14:30:45] INFO - Client connected:
  - Socket ID: a1b2c3d4e5f6
  - User ID: user_12345
  - Origin: https://admin.maiagent.ai

[2025-12-23 14:31:02] DEBUG - Audio data received:
  - Size: 3200 bytes
  - Format: PCM 16kHz mono
  - Latency: 45ms

[2025-12-23 14:32:15] WARNING - Connection lost:
  - Socket ID: a1b2c3d4e5f6
  - Reason: Network timeout
  - Duration: 90s
  - Attempting reconnection...
```

***

## 6. Technical Advantages of MaiAgent's LiveKit Integration

### 6.1 Performance Advantages

* **Low-latency Communication**: WebRTC media transport through the LiveKit SFU with latency under 100ms
* **High Audio Quality**: Supports Opus encoding with dynamic bitrate adjustment based on network conditions
* **Namespace Isolation**: Voice and general messages are processed separately, avoiding interference
* **Audio Queue Optimization**: The `ignore_queue` parameter lets audio data bypass the general message queue so that it is processed with priority

### 6.2 Reliability Advantages

* **Automatic Reconnection**: Automatic connection recovery during network fluctuations
* **Resource Cleanup**: Ensures LiveKit rooms are properly released on disconnection
* **Error Isolation**: Voice Agent issues do not affect other features
* **Detailed Logging**: Comprehensive log records for easy issue tracking

### 6.3 Scalability Advantages

* **Horizontal Scaling**: Easily add LiveKit SFU nodes to handle more concurrent calls
* **Distributed Architecture**: Socket.IO and LiveKit can be deployed on different servers
* **Load Balancing**: Supports multiple Socket.IO instances behind a load balancer

### 6.4 Security Advantages

* **Token Authentication**: Every LiveKit connection requires a time-limited JWT Token
* **Encrypted Transport**: WebRTC media is encrypted in transit between the browser and the SFU using DTLS/SRTP
* **CORS Protection**: Strict control over allowed connection origins
* **Connection Tracking**: Records the source and user information for all connections

***

## 7. Related Documentation

* [Voice Customer Service](https://docs.maiagent.ai/application/voicecs) - User guide
* [IVR Customer Service Intent Recognition](https://docs.maiagent.ai/application/voicecs/ivr-ke-fu-yi-tu-bian-shi)
* [Voice Call Summary](https://docs.maiagent.ai/application/voicecs/yu-yin-tong-hua-zhai-yao)

### Reference Links

* [LiveKit Documentation](https://docs.livekit.io/)
* [WebRTC Specification](https://www.w3.org/TR/webrtc/)
* [Socket.IO Documentation](https://socket.io/docs/v4/)
* [Opus Codec](https://opus-codec.org/)
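
As a worked illustration of the audio format in section 4.1 (16-bit mono PCM at 16 kHz or 48 kHz, Base64 encoded for Socket.IO), the sketch below computes how large one audio frame is before and after encoding. It is a minimal sketch under stated assumptions, not MaiAgent's code: the function names `pcm_frame_bytes`, `pcm_duration_ms`, `base64_length`, and `encode_frame` are hypothetical, and the frame durations are taken from the 20-60ms interval mentioned in section 3.1.

```python
import base64

BYTES_PER_SAMPLE = 2  # 16-bit PCM (section 4.1)
CHANNELS = 1          # mono (section 4.1)


def pcm_frame_bytes(sample_rate_hz: int, frame_ms: int) -> int:
    """Raw PCM bytes in one frame of the given duration."""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def pcm_duration_ms(num_bytes: int, sample_rate_hz: int) -> float:
    """Duration in milliseconds of num_bytes of raw PCM audio."""
    samples = num_bytes / (BYTES_PER_SAMPLE * CHANNELS)
    return samples * 1000 / sample_rate_hz


def base64_length(raw_bytes: int) -> int:
    """Length of the padded Base64 text that encodes raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def encode_frame(pcm: bytes) -> str:
    """Base64-encode one PCM frame so it can travel in a text payload."""
    return base64.b64encode(pcm).decode("ascii")


if __name__ == "__main__":
    for rate in (16_000, 48_000):
        for ms in (20, 60):
            raw = pcm_frame_bytes(rate, ms)
            print(f"{rate} Hz, {ms} ms: {raw} raw bytes, "
                  f"{base64_length(raw)} Base64 characters")
    # The 3200-byte payload in the section 5.2 log example, at 16 kHz:
    print(pcm_duration_ms(3200, 16_000), "ms")
```

Under these assumptions, a 20 ms frame at 16 kHz is 640 raw bytes and 856 Base64 characters, so Base64 adds roughly one third to the payload size. The 3200-byte payload shown in the section 5.2 log example corresponds to 100 ms of 16 kHz audio.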