How to Use Web Crawler (Data Collection) Feature

Enter a URL, and MaiAgent crawls the page's text and link data in a structured form, letting you quickly select entries and import them into your knowledge base to build an AI assistant faster.

Feature Purpose and Value

As a business professional, you may often be asked by supervisors to compile regulatory or reference information from public websites.

If you have a technical background or engineering support, you can automate the extraction by writing a crawler program. Non-technical staff, however, are usually left compiling pages by hand, which is not only time-consuming but also risks missing key information.
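For context, the sketch below shows what that hand-written approach can look like. It is purely illustrative and not part of MaiAgent: it assumes the third-party requests and beautifulsoup4 packages, and the URL is a placeholder.

```python
# Illustrative only: a minimal hand-written crawler of the kind this
# no-code feature replaces. Requires: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

url = "https://example.com/regulations"  # placeholder URL, not a real target

response = requests.get(url, timeout=10)
response.raise_for_status()  # stop on HTTP errors

soup = BeautifulSoup(response.text, "html.parser")

# Collect the page's visible text and its links, roughly the
# "text and link data" a structured crawl gathers.
text = soup.get_text(separator="\n", strip=True)
links = [a["href"] for a in soup.find_all("a", href=True)]

print(text[:500])   # first 500 characters of page text
print(links[:10])   # first 10 links found on the page
```

Even a simple script like this needs installation, maintenance, and error handling, which is exactly the overhead the no-code crawler removes.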

In such cases, MaiAgent's crawler feature lets you extract website content through a no-code workflow: it turns pages into structured data automatically, so you can organize information far more efficiently and spend your time on higher-value core business activities.

How to Use the Crawler?

To create a page crawling request, follow these steps:

  1. Create a Page Crawling Request

Go to "AI Features > AI Assistant > Crawler" in the left function bar, and click the "+Create Page Crawling Request" button in the upper right corner.

  2. Enter URL

Enter the URL of the page you want to crawl, then click the "Confirm" button.

  3. View Crawled Data

When the status shows complete, click "Import" on the right to view the crawled data entries.

  4. Select Data

Check the boxes on the left to select the data you want to import into the knowledge base. After selection, click the "Import" button, and the data will be automatically imported into that AI assistant's knowledge base.

To view more data entries on the same page, click "10 items/page" at the bottom right and choose a larger page size.

In the knowledge base, the imported data appears as .md files, which can be given tags and metadata like regular data.
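As a hypothetical illustration of the idea, the sketch below saves crawled text as a .md file with tag and metadata front matter. The field names and file name are illustrative assumptions, not MaiAgent's actual export schema.

```python
# Hypothetical sketch: storing crawled text as a .md file with tag and
# metadata front matter. Field names are assumptions, not MaiAgent's schema.
from pathlib import Path

title = "Example Regulation Page"               # placeholder crawled title
source_url = "https://example.com/regulations"  # placeholder source URL
body = "Crawled page text goes here."           # placeholder crawled text

front_matter = (
    "---\n"
    f"title: {title}\n"
    f"source: {source_url}\n"
    "tags: [regulations, crawler]\n"
    "---\n\n"
)

Path("example-regulation-page.md").write_text(front_matter + body, encoding="utf-8")
```

Keeping the source URL in the metadata makes it easy to trace an answer back to the original page and to refresh stale entries later.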

Crawler Usage Guidelines

  • Ensure you have permission to crawl the target website's content (see the robots.txt sketch after this list)

  • It's recommended to test with a small amount of data before performing large-scale crawling

  • After crawling, verify data quality using the search test function

  • Regularly update crawled data to maintain information timeliness
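One quick permission check, referenced in the first guideline above, is the site's robots.txt file. The sketch below uses only the Python standard library; the URL and page path are placeholders.

```python
# Check whether a site's robots.txt permits crawling a page before you
# submit it. Standard library only; URL and page path are placeholders.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("https://example.com/robots.txt")
robots.read()  # fetch and parse the robots.txt file

page = "https://example.com/regulations"
if robots.can_fetch("*", page):
    print(f"robots.txt allows crawling {page}")
else:
    print(f"robots.txt disallows crawling {page}; do not crawl it")
```

Note that robots.txt expresses the site operator's crawling preferences; it is not a substitute for reviewing the site's terms of use.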
