
# Voice Agent

## What is this? <a href="#what-is-this" id="what-is-this"></a>

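Voice Agent lets an AI assistant interact by **voice**: the user speaks, and the AI answers by speaking. Under the hood it combines realtime speech models, STT (speech-to-text), TTS (text-to-speech), and interruption control. It can be used for phone customer service, IVR, voice assistants, and similar scenarios.

Once enabled, the AI assistant gains a "Voice Call" interface, and users start a conversation by clicking the microphone.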

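## Three interaction modes <a href="#interaction-modes" id="interaction-modes"></a>

Each mode suits different needs. The choice depends on latency, how freely you can customize the voice, and the level of tool support.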

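| Mode | How it works | Best for |
| --- | --- | --- |
| **Realtime conversation (Realtime)** | A speech model conducts the realtime voice conversation directly | Scenarios with the strictest latency requirements that can accept the default voice |
| **Realtime conversation + TTS** | A realtime speech model paired with custom TTS output | Scenarios that need a custom brand voice while keeping the realtime feel |
| **STT + LLM + TTS** | The traditional pipeline: speech-to-text → LLM → text-to-speech | Scenarios that need full tool support and maximum flexibility and can tolerate slightly higher latency |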

**Rule of thumb**: fastest and most natural → Realtime; brand voice → Realtime + TTS; maximum flexibility and tool support → STT + LLM + TTS.

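The rule of thumb above can be expressed as a small decision function. This is a minimal sketch for illustration only. `VoiceMode`, `choose_mode`, and its two boolean inputs are hypothetical names and not part of the product's API. "Needs API tools" is used here as the trigger for the STT + LLM + TTS pipeline because of the tool restriction listed under Limitations.

```python
from enum import Enum


class VoiceMode(Enum):
    """The three interaction modes (hypothetical enum, for illustration)."""
    REALTIME = "Realtime"
    REALTIME_TTS = "Realtime + TTS"
    STT_LLM_TTS = "STT + LLM + TTS"


def choose_mode(needs_api_tools: bool, needs_brand_voice: bool) -> VoiceMode:
    """Apply the rule of thumb: flexibility/tools first, then brand voice, then speed."""
    if needs_api_tools:
        # Only the STT + LLM + TTS pipeline offers full tool support.
        return VoiceMode.STT_LLM_TTS
    if needs_brand_voice:
        # Keep the realtime feel but route output through a custom TTS voice.
        return VoiceMode.REALTIME_TTS
    # Default: lowest latency with the speech model's default voice.
    return VoiceMode.REALTIME
```

For example, `choose_mode(needs_api_tools=False, needs_brand_voice=True)` returns `VoiceMode.REALTIME_TTS`.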
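## Interruption control (Turn Handling) <a href="#turn-handling" id="turn-handling"></a>

Turn handling is central to the voice experience. Can users interrupt the AI while it is speaking? And how does the system tell that a user is actually speaking, rather than producing background noise or filler sounds such as "um" and "uh"?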

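Adjustable parameters:

| Parameter | Description |
| --- | --- |
| **Minimum speech duration (seconds)** (`min_duration`) | The user's speech must last at least this many seconds before it counts as an interruption |
| **Minimum word count** (`min_words`) | The user must say at least this many words to trigger an interruption |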

> **Note**: `min_duration` and `min_words` only take effect in **STT + LLM + TTS** mode. In Realtime mode, the speech model handles interruption detection internally.

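The following sketch shows how the two thresholds might gate an interruption in STT + LLM + TTS mode. It assumes both must be satisfied, since the table states each one as a requirement. The function name, the default values, and the whitespace-based word count are assumptions for illustration. The platform's own tokenization may differ, especially for languages such as Chinese that do not separate words with spaces.

```python
def should_interrupt(speech_seconds: float, transcript: str,
                     min_duration: float = 0.5, min_words: int = 2) -> bool:
    """Decide whether user speech should interrupt the agent's spoken reply.

    Hypothetical helper mirroring the `min_duration` and `min_words` settings,
    assuming BOTH thresholds must be met. Defaults are illustrative only.
    """
    # Naive word count: split on whitespace (CJK text without spaces
    # would need a different tokenizer).
    word_count = len(transcript.split())
    long_enough = speech_seconds >= min_duration
    enough_words = word_count >= min_words
    return long_enough and enough_words


# A short filler sound is ignored; a clear request interrupts.
print(should_interrupt(0.3, "um"))          # False
print(should_interrupt(1.2, "wait, stop"))  # True
```

Requiring both conditions filters out long but wordless noise as well as quick one-word fillers. If either threshold alone should be enough to interrupt, swap `and` for `or`.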
## Conversation states <a href="#conversation-states" id="conversation-states"></a>

During a voice conversation, the interface shows the current state:

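* **Listening**: the AI is receiving the user's speech
* **Thinking**: the AI is processing (searching the knowledge base, using tools, generating a reply)
* **Responding**: the AI is replying by voice
* **Initializing**: the connection has just been established and is getting ready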

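The four states can be modeled as a small state machine. The transition table below is an assumption based on the order of a typical turn: initialize, listen, think, respond, then listen again. The product's actual transitions may differ, for example in how an interruption is handled. `CallState`, `ALLOWED`, and `advance` are hypothetical names.

```python
from enum import Enum


class CallState(Enum):
    INITIALIZING = "Initializing"
    LISTENING = "Listening"
    THINKING = "Thinking"
    RESPONDING = "Responding"


# Hypothetical transition table for one conversational turn.
ALLOWED = {
    CallState.INITIALIZING: {CallState.LISTENING},
    CallState.LISTENING: {CallState.THINKING},
    CallState.THINKING: {CallState.RESPONDING},
    # After replying, or when the user interrupts, go back to listening.
    CallState.RESPONDING: {CallState.LISTENING},
}


def advance(current: CallState, nxt: CallState) -> CallState:
    """Move to the next state, rejecting transitions not in ALLOWED."""
    if nxt not in ALLOWED[current]:
        raise ValueError(f"invalid transition: {current.value} -> {nxt.value}")
    return nxt


# Walk through one full turn.
state = CallState.INITIALIZING
for nxt in (CallState.LISTENING, CallState.THINKING,
            CallState.RESPONDING, CallState.LISTENING):
    state = advance(state, nxt)
    print(state.value)
```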
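## How does it differ from text chat? <a href="#voice-vs-text" id="voice-vs-text"></a>

|  | Text assistant | Voice assistant |
| --- | --- | --- |
| **Input** | Keyboard typing, pasted files | Microphone speech, possibly plus DTMF keypresses |
| **Output** | Text, Markdown, images, files | Speech, followed by a text transcript |
| **Latency requirement** | Seconds are acceptable | Must be in the milliseconds to feel natural |
| **Best for** | Detailed information, reviewing the conversation history | Immediate responses, hands-busy situations, phone channels |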

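## When to use it <a href="#when-to-use" id="when-to-use"></a>

* **Phone customer service**: replaces a traditional IVR; the AI answers the call, listens to the question, and gives the answer
* **Voice inquiry systems**: customers call in to check orders, account balances, or insurance policies
* **Voice FAQ**: users ask common questions by speaking
* **Driving / hands-busy situations**: users cannot type but still need the AI's help
* **Accessibility**: friendlier for users who find typing difficult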

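## Limitations <a href="#limitations" id="limitations"></a>

* **Tool support**: **Realtime** and **Realtime + TTS** modes currently support only **MCP tools**. To use API tools, switch to STT + LLM + TTS mode
* **Knowledge base search**: the voice agent searches all linked knowledge bases and **cannot be restricted to specific documents**
* **Microphone permission**: the user must allow the browser to use the microphone before a conversation can start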

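Before you pick a Realtime mode, it can help to check which of an agent's tools would become unavailable. This is a minimal sketch that assumes a hypothetical tool description of the form `{"name": ..., "type": "mcp" | "api"}`. The function and field names are illustrative and are not the platform's schema.

```python
REALTIME_MODES = {"Realtime", "Realtime + TTS"}


def unsupported_tools(mode: str, tools: list) -> list:
    """Return the names of tools the given mode cannot use (hypothetical helper)."""
    if mode not in REALTIME_MODES:
        # STT + LLM + TTS offers full tool support, so nothing is excluded.
        return []
    # Realtime modes currently accept MCP tools only.
    return [tool["name"] for tool in tools if tool["type"] != "mcp"]


tools = [
    {"name": "order_lookup", "type": "api"},  # hypothetical API tool
    {"name": "kb_search", "type": "mcp"},     # hypothetical MCP tool
]
print(unsupported_tools("Realtime", tools))         # ['order_lookup']
print(unsupported_tools("STT + LLM + TTS", tools))  # []
```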
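## What do I need to do? <a href="#what-do-i-need-to-do" id="what-do-i-need-to-do"></a>

1. **Open the AI assistant settings**: go to the settings page of the Agent you want to enable voice for
2. **Choose a voice mode**: pick one of Realtime / Realtime + TTS / STT + LLM + TTS
3. **Configure providers and settings**: select the STT / TTS / Realtime providers and JSON configuration that match the mode
4. **Adjust interruption control** (optional): in STT + LLM + TTS mode, set the minimum duration and minimum word count
5. **Test**: open a voice call in the interface and confirm that the connecting, listening, and responding states all work smoothly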

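Steps 2 to 4 amount to filling in a settings object and checking it for consistency. The sketch below is hypothetical throughout. The `VoiceAgentSettings` fields, the provider keys (`realtime`, `stt`, `tts`), and the providers each mode requires are assumptions inferred from the mode descriptions above. They are not the platform's actual JSON configuration schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed provider requirements per mode (inferred, not an official mapping).
REQUIRED_PROVIDERS = {
    "Realtime": {"realtime"},
    "Realtime + TTS": {"realtime", "tts"},
    "STT + LLM + TTS": {"stt", "tts"},
}


@dataclass
class VoiceAgentSettings:
    mode: str                                      # one of REQUIRED_PROVIDERS' keys
    providers: dict = field(default_factory=dict)  # e.g. {"stt": {...}, "tts": {...}}
    min_duration: Optional[float] = None           # turn handling, seconds
    min_words: Optional[int] = None                # turn handling, word count


def validate(s: VoiceAgentSettings) -> list:
    """Return a list of configuration problems (empty if none were found)."""
    if s.mode not in REQUIRED_PROVIDERS:
        return [f"unknown mode: {s.mode}"]
    problems = []
    missing = sorted(REQUIRED_PROVIDERS[s.mode] - set(s.providers))
    if missing:
        problems.append("missing providers: " + ", ".join(missing))
    # Turn-handling thresholds only take effect in the STT + LLM + TTS pipeline.
    if s.mode != "STT + LLM + TTS" and (s.min_duration is not None
                                        or s.min_words is not None):
        problems.append("min_duration/min_words only take effect in STT + LLM + TTS mode")
    return problems


ok = VoiceAgentSettings(mode="STT + LLM + TTS", providers={"stt": {}, "tts": {}},
                        min_duration=0.5, min_words=2)
print(validate(ok))  # []
```

Catching a misplaced `min_words` here mirrors the note under Interruption control: in a Realtime mode, such a value would be silently ignored rather than rejected.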
## Further reading <a href="#further-reading" id="further-reading"></a>

* [Voice customer service overview](/application/voicecs.md)
* [IVR customer service intent recognition](/application/voicecs/ivr-intent-recognition.md)
* [Voice call summary](/application/voicecs/call-summary.md)
* [Voice call quality inspection](/application/voicecs/call-qa.md)

