# Crawler

## What is it?

The crawler automatically fetches content from web pages and imports it into a knowledge base. Give it a URL and it crawls that page (or the entire website), extracts the text, and converts it into knowledge the Agent can look up.

## When do I need it?

* **Official website content**: automatically import product introductions and service descriptions from your company website
* **Public information**: capture public web pages such as regulations, announcements, and technical documents
* **Ongoing updates**: website content changes, so re-crawl periodically to keep the knowledge base current

## How the crawler relates to the knowledge base

The crawler is **one of the data sources** for the knowledge base. It fits in as follows:

```
Data sources                 Knowledge base                  Agent
────────────                 ──────────────                  ─────
Manual file upload    ──→
FAQ creation          ──→    Enterprise knowledge base  ──→  Looks up and replies
Crawler (web pages)   ──→
```

Content captured by the crawler ends up in the knowledge base; the Agent never interacts with the crawler directly.

## What do I need to do?

1. **Enter a URL**: tell the crawler which web page or website to fetch
2. **Set the scope**: fetch a single page only, or recursively fetch its subpages
3. **Run the crawl**: start the crawler and wait for it to finish
4. **Import into the knowledge base**: import the crawl results into the knowledge base you choose
5. **Verify the content**: check that the imported content is correct and complete

## Further reading

* [How to use the crawler (data scraping) feature](/km/scrape-website.md)
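
To make the difference between the two scope settings (single page versus recursive subpages) more concrete, here is a minimal Python sketch. It is not MaiAgent's implementation: the `crawl` function, its `recursive` and `max_pages` parameters, the injected `fetch` callable, and the in-memory `SITE` pages are all hypothetical names chosen for illustration, and the text extraction is deliberately naive.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class PageParser(HTMLParser):
    """Collects text fragments and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url, fetch, recursive=False, max_pages=50):
    """Return {url: text}. `fetch` maps a URL to an HTML string, or None on failure.

    Hypothetical sketch: with recursive=False only the start page is read;
    with recursive=True, same-host links are followed breadth-first.
    """
    start_url = urldefrag(start_url).url
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        pages[url] = " ".join(parser.text_parts)
        if not recursive:
            break
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url
            # Stay on the starting host and never queue a page twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Illustrative in-memory "website" so the sketch runs without network access.
SITE = {
    "https://example.com/": (
        '<h1>Home</h1><a href="/products">Products</a>'
        '<a href="https://other.example/x">Ext</a>'
    ),
    "https://example.com/products": '<p>Product intro</p><a href="/">Back</a>',
}

single = crawl("https://example.com/", SITE.get)
whole = crawl("https://example.com/", SITE.get, recursive=True)

# A simple stand-in for the "verify the content" step: flag pages with no text.
empty_pages = [url for url, text in whole.items() if not text]
```

In this sketch, `single` contains only the start page, while `whole` also contains `/products` and skips the link to the other host. Passing `fetch` in as a parameter keeps the scope logic separate from how pages are actually downloaded.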