# ICAP Content Validation Integration

> This document explains how the MaiAgent platform integrates the ICAP (Internet Content Adaptation Protocol) to implement security scanning and validation of user-uploaded files and conversation content, protecting against malware and inappropriate content.

## 1. What is ICAP?

ICAP (Internet Content Adaptation Protocol) is a lightweight, HTTP-like protocol, specified in RFC 3507, for forwarding content to external services for processing such as virus scanning, content filtering, and data loss prevention.

### 1.1 Core Value of ICAP

| Consideration       | No Content Validation                                         | ICAP Content Validation                                                |
| ------------------- | ------------------------------------------------------------- | ---------------------------------------------------------------------- |
| **Security**        | Users may upload malware or inappropriate content             | Automatically scans and blocks malicious files and content             |
| **Compliance**      | Difficult to meet enterprise security policy requirements     | Integrates with enterprise-grade antivirus and DLP solutions           |
| **Flexibility**     | Must develop content detection logic in-house                 | Uses a standard protocol to integrate third-party specialized services |
| **Maintainability** | Must continuously update virus signatures and detection rules | Professional vendors maintain the detection engine                     |

### 1.2 Common Use Cases

* **File Upload Security**: Documents uploaded to knowledge bases must undergo virus scanning first
* **Conversation Content Filtering**: Detect whether conversations contain inappropriate language or sensitive information
* **Data Loss Prevention (DLP)**: Prevent users from uploading files containing confidential information
* **Compliance Checks**: Ensure content complies with regulations such as GDPR and HIPAA

***

## 2. MaiAgent ICAP Integration Architecture

{% @mermaid/diagram content="flowchart TB
subgraph Client["Client"]
    User["User"]
    WebChat["WebChat Interface"]
end

subgraph MaiAgent["MaiAgent Platform"]
    API["Upload API"]
    Queue["Processing Queue"]
    ICAP_Client["ICAP Client"]
    DB[("Database")]
    Storage[("File Storage")]
end

subgraph ICAP_Service["ICAP Service"]
    Scanner["Antivirus Engine"]
    DLP["DLP Engine"]
    Filter["Content Filter"]
end

User -- "Upload file" --> WebChat
WebChat -- "HTTP POST" --> API
API --> Queue
Queue --> ICAP_Client

ICAP_Client <-. "ICAP REQMOD" .-> Scanner
ICAP_Client <-. "ICAP RESPMOD" .-> DLP
ICAP_Client <-. "ICAP OPTIONS" .-> Filter

ICAP_Client -- "Validation passed" --> Storage
ICAP_Client -- "Log result" --> DB

ICAP_Client -. "Validation failed<br/>Send event" .-> WebChat
WebChat -. "Display error" .-> User" %}

### 2.1 Core Component Overview

* **ICAP Client**: MaiAgent's built-in ICAP protocol implementation responsible for communicating with the ICAP server
* **ICAP Server**: Third-party content scanning service, such as antivirus gateways from Symantec, McAfee, or Trend Micro
* **Processing Queue**: Asynchronously handles file uploads and scanning tasks to avoid blocking user operations
* **Scan Records**: The stored outcome of every scan, including the file hash, scan time, and result

### 2.2 ICAP Request Flow

{% @mermaid/diagram content="sequenceDiagram
participant User as User
participant WebChat as WebChat
participant MaiAgent as MaiAgent API
participant ICAP as ICAP Server
participant Storage as File Storage

User->>WebChat: Upload file
WebChat->>MaiAgent: POST /upload
MaiAgent->>MaiAgent: Generate file ID

Note over MaiAgent,ICAP: ICAP REQMOD Request
MaiAgent->>ICAP: REQMOD /scan ICAP/1.0<br/>Encapsulated: req-hdr=0, req-body=...
ICAP->>ICAP: Scan file content

alt File is safe
    ICAP-->>MaiAgent: 200 OK (File passed)
    MaiAgent->>Storage: Store file
    Storage-->>MaiAgent: Storage successful
    MaiAgent-->>WebChat: 200 OK {fileId: ...}
    WebChat-->>User: Upload successful
else File has issues
    ICAP-->>MaiAgent: 403 Forbidden (Virus/Prohibited content)
    MaiAgent->>WebChat: Socket.IO: ICAP_BLOCKED event
    WebChat-->>User: Display error message
    MaiAgent->>MaiAgent: Log scan failure
else ICAP server error
    ICAP-->>MaiAgent: 500 Internal Server Error
    MaiAgent->>WebChat: Socket.IO: ICAP_ERROR event
    WebChat-->>User: System error, please try again later
    MaiAgent->>MaiAgent: Log error
end" %}

***

## 3. ICAP Protocol Implementation Details

### 3.1 ICAP Request Format

MaiAgent uses REQMOD (Request Modification) mode to send files to the ICAP server:

```http
REQMOD icap://icap-server.example.com/scan ICAP/1.0
Host: icap-server.example.com
Encapsulated: req-hdr=0, req-body=106
Allow: 204

POST /upload HTTP/1.1
Host: maiagent.ai
Content-Type: application/octet-stream
Content-Length: 1024

400
[Binary file content in chunked encoding]
0
```

**Key field descriptions**:

* **REQMOD**: Request Modification mode, used to inspect content before storage
* **Encapsulated**: Lists the byte offset at which each encapsulated section begins: `req-hdr` for the HTTP request headers and `req-body` for the HTTP request body
* **Allow: 204**: Informs the ICAP server that a 204 response is acceptable if no content modification is needed
* **Chunked Encoding**: File content is sent using chunked transfer encoding
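
The way these fields fit together can be sketched in Python. This is a minimal illustration, not MaiAgent's actual client: `build_reqmod` and its parameters are hypothetical names, and the sketch assumes the request always carries a body. The point it shows is that the `req-body` offset equals the byte length of the encapsulated HTTP headers, including the blank line that ends them.

```python
def build_reqmod(icap_host: str, service: str,
                 http_headers: bytes, body: bytes) -> bytes:
    """Wrap an HTTP request in an ICAP REQMOD message (hypothetical helper)."""
    if not http_headers.endswith(b"\r\n\r\n"):
        raise ValueError("HTTP headers must end with a blank line")
    if not body:
        raise ValueError("this sketch only handles requests that carry a body")

    # req-hdr starts at offset 0, so req-body starts right after the
    # encapsulated HTTP headers: its offset is their length in bytes.
    body_offset = len(http_headers)

    icap_headers = (
        f"REQMOD icap://{icap_host}/{service} ICAP/1.0\r\n"
        f"Host: {icap_host}\r\n"
        "Allow: 204\r\n"
        f"Encapsulated: req-hdr=0, req-body={body_offset}\r\n"
        "\r\n"
    ).encode("ascii")

    # The encapsulated body is sent chunked: one chunk holding the whole
    # body (size in hexadecimal), then the zero-length terminating chunk.
    chunked_body = (b"%x\r\n" % len(body)) + body + b"\r\n0\r\n\r\n"
    return icap_headers + http_headers + chunked_body


# Example: the HTTP request from the snippet above, with a 1024-byte body.
http_headers = (
    b"POST /upload HTTP/1.1\r\n"
    b"Host: maiagent.ai\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Length: 1024\r\n"
    b"\r\n"
)
message = build_reqmod("icap-server.example.com", "scan",
                       http_headers, b"\x00" * 1024)
```

Computing the offset from the header bytes, rather than hard-coding it, keeps the `Encapsulated` header correct when header values change.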

### 3.2 Chunked Transfer Encoding

MaiAgent sends file content using HTTP chunked transfer encoding, which has the following format:

```
Chunked transfer format:
<data size (hexadecimal)>\r\n
<data content>\r\n
...
0\r\n\r\n  (terminator)
```

**Why is correct chunked encoding important?**

* ICAP requires encapsulated request and response bodies to be sent with HTTP chunked transfer encoding
* A malformed chunk (for example, a size written in decimal instead of hexadecimal) prevents the ICAP server from parsing the file content correctly
* MaiAgent writes every chunk size in hexadecimal to stay compatible with the ICAP standard
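
The format above can be sketched as a small Python generator. The function name `encode_chunked` and the default chunk size are assumptions made for illustration; they are not taken from MaiAgent's code.

```python
from typing import Iterator


def encode_chunked(data: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield `data` as HTTP chunks, ending with the zero-length chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # The size line is hexadecimal without a "0x" prefix: 1024 -> "400".
        size_line = f"{len(chunk):x}".encode("ascii")
        yield size_line + b"\r\n" + chunk + b"\r\n"
    # Terminator: a zero-length chunk followed by an empty line.
    yield b"0\r\n\r\n"


encoded = b"".join(encode_chunked(b"hello world", chunk_size=4))
```

Writing the size with `:x` is the detail that matters here: a 1024-byte chunk must be announced as `400`, not `1024`.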

### 3.3 ICAP Response Handling

The ICAP server may return the following responses:

| Status Code            | Meaning                                        | MaiAgent Handling                                  |
| ---------------------- | ---------------------------------------------- | -------------------------------------------------- |
| **200 OK**             | Content has been inspected or modified         | Accept modified content (if any), store file       |
| **204 No Content**     | No content modification needed                 | Store the original file directly                   |
| **403 Forbidden**      | Content is prohibited (e.g., contains a virus) | Reject storage, send ICAP\_BLOCKED event to client |
| **500 Internal Error** | ICAP server error                              | Log error, send ICAP\_ERROR event to client        |
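
The table can be read as a mapping from ICAP status code to a scan result and an optional client event. The sketch below is one way to express that mapping in Python. The function names are hypothetical, and treating every status other than 200, 204, and 403 as an error is an assumption, since the table only lists 500 explicitly.

```python
from enum import Enum
from typing import Optional, Tuple


class ScanResult(Enum):
    PASS = "PASS"
    BLOCKED = "BLOCKED"
    ERROR = "ERROR"


def parse_status_code(response: bytes) -> int:
    """Extract the status code from an ICAP status line such as 'ICAP/1.0 204 No Content'."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("ICAP/") or not parts[1].isdigit():
        raise ValueError(f"not an ICAP status line: {status_line!r}")
    return int(parts[1])


def classify(status: int) -> Tuple[ScanResult, Optional[str]]:
    """Map a status code to (scan result, event to send to the client)."""
    if status in (200, 204):
        return ScanResult.PASS, None          # store the file
    if status == 403:
        return ScanResult.BLOCKED, "ICAP_BLOCKED"
    # Assumption: anything else is handled like a server error.
    return ScanResult.ERROR, "ICAP_ERROR"


result, event = classify(parse_status_code(b"ICAP/1.0 403 Forbidden\r\n\r\n"))
```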

***

## 4. WebChat Integration

When ICAP scanning is complete, the system notifies the user of the scan results in real time:

* **File Blocked**: Displays a user-friendly error message explaining why the file was rejected (e.g., virus detected)
* **System Error**: Prompts the user to try again later and logs the error

### 4.1 User Experience Optimization

* **Instant Feedback**: Displays a "Scanning" status immediately after file upload
* **Progress Indicator**: Shows a progress bar during large file uploads
* **Friendly Error Messages**: Uses clear language to explain why a file was rejected
* **Retry Mechanism**: Provides a re-upload option for temporary errors

***

## 5. Scan Record Management

### 5.1 Record Retention

MaiAgent stores a detailed record of every ICAP scan. Each record contains the following information:

| Field             | Description                                        |
| ----------------- | -------------------------------------------------- |
| File ID           | Unique identifier of the scanned file              |
| Filename          | Original filename                                  |
| File Hash         | SHA-256 hash for verifying file integrity          |
| Scan Result       | PASS, BLOCKED, or ERROR                            |
| Scan Reason       | Specific reason when a file is blocked             |
| ICAP Server       | Address of the ICAP server that performed the scan |
| Scan Time         | Timestamp of when the scan was executed            |
| User/Organization | User and organization that initiated the scan      |

### 5.2 Expired Record Cleanup

MaiAgent runs a scheduled background task that automatically deletes scan records once they exceed the retention period.

**Cleanup strategy**:

* **Retention Period**: Scan records are retained for 90 days by default
* **Execution Frequency**: Cleanup task runs once per week
* **Audit Retention**: Critical security events (such as virus detections) can be configured with longer retention periods
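
A retention check along these lines might look like the following Python sketch. The 90-day default comes from the strategy above. The 365-day period for blocked files is an invented placeholder for the configurable longer retention, and the dictionary-based record shape is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

DEFAULT_RETENTION = timedelta(days=90)
# Hypothetical value: the longer period for security events is configurable.
SECURITY_EVENT_RETENTION = timedelta(days=365)


def retention_for(record: Dict) -> timedelta:
    """Blocked files count as security events and are kept longer."""
    if record["scan_result"] == "BLOCKED":
        return SECURITY_EVENT_RETENTION
    return DEFAULT_RETENTION


def purge_expired(records: List[Dict], now: datetime) -> List[Dict]:
    """Return only the records that are still within their retention period."""
    return [r for r in records if now - r["scan_time"] <= retention_for(r)]


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "scan_result": "PASS", "scan_time": now - timedelta(days=10)},
    {"id": 2, "scan_result": "PASS", "scan_time": now - timedelta(days=100)},
    {"id": 3, "scan_result": "BLOCKED", "scan_time": now - timedelta(days=100)},
    {"id": 4, "scan_result": "BLOCKED", "scan_time": now - timedelta(days=400)},
]
kept_ids = [r["id"] for r in purge_expired(records, now)]
```

In a weekly job, a function like `purge_expired` would run against the record store rather than an in-memory list.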

***

## 6. Configuration and Deployment

### 6.1 ICAP Server Configuration

Administrators configure the following ICAP parameters in the MaiAgent admin panel:

| Setting                  | Description                                             |
| ------------------------ | ------------------------------------------------------- |
| ICAP Server Address      | Connection URL for the ICAP service                     |
| Timeout                  | Scan timeout in seconds                                 |
| Retry Count and Interval | Retry strategy for failed scans                         |
| Scan Scope               | Choose to scan file uploads and/or conversation content |
| File Size Limit          | Files exceeding this size will not be scanned           |
| File Extension Whitelist | Specify file types that do not require scanning         |
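
To make the interaction between these settings concrete, here is a hedged Python sketch. `IcapConfig`, its field names, and all default values are hypothetical; the admin panel's real names and defaults are not documented here. The sketch shows how the scan scope, file size limit, and extension whitelist could decide whether an upload is sent to the ICAP server.

```python
import os
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class IcapConfig:
    server_url: str
    timeout_seconds: float = 30.0            # hypothetical default
    retry_count: int = 3                     # hypothetical default
    retry_interval_seconds: float = 2.0      # hypothetical default
    scan_file_uploads: bool = True
    scan_conversations: bool = False
    max_file_size_bytes: int = 100 * 1024 * 1024  # hypothetical default
    skip_extensions: FrozenSet[str] = frozenset()


def should_scan_file(config: IcapConfig, filename: str, size_bytes: int) -> bool:
    """Apply scan scope, size limit, and extension whitelist, in that order."""
    if not config.scan_file_uploads:
        return False
    if size_bytes > config.max_file_size_bytes:
        return False  # files over the limit are not scanned
    extension = os.path.splitext(filename)[1].lower()
    return extension not in config.skip_extensions


config = IcapConfig(
    server_url="icap://icap-server.example.com/scan",
    max_file_size_bytes=10 * 1024 * 1024,
    skip_extensions=frozenset({".txt"}),
)
```

Because oversized and whitelisted files bypass scanning entirely, both settings are worth keeping as narrow as the deployment allows.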

### 6.2 Supported ICAP Services

MaiAgent is compatible with the following mainstream ICAP services:

* **Symantec Protection Engine**: Enterprise-grade antivirus and content filtering
* **McAfee Web Gateway**: Gateway-level content scanning
* **Trend Micro IWSVA**: Integrated web security and content scanning
* **Kaspersky Scan Engine**: Kaspersky's ICAP scanning engine
* **ClamAV (via c-icap)**: Open-source antivirus solution

### 6.3 Performance Considerations

To ensure system performance, the following recommendations apply:

* **Asynchronous Scanning**: Use background tasks for scanning large files to avoid blocking upload responses
* **Caching**: Cache scan results for identical files (same hash value)
* **ICAP Connection Pooling**: Maintain persistent connections with the ICAP server to reduce connection overhead
* **Load Balancing**: Use multiple ICAP server instances to distribute scanning load
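
The caching recommendation can be sketched as a small wrapper keyed by the SHA-256 hash of the content. `ScanCache` and the `scan_fn` callback are hypothetical names, and the choice not to cache `ERROR` results is an assumption made so that transient failures get retried.

```python
import hashlib
from typing import Callable, Dict


class ScanCache:
    """Reuse scan results for byte-identical files (same SHA-256 hash)."""

    def __init__(self, scan_fn: Callable[[bytes], str]) -> None:
        self._scan_fn = scan_fn
        self._results: Dict[str, str] = {}

    def scan(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        cached = self._results.get(digest)
        if cached is not None:
            return cached
        result = self._scan_fn(content)
        # Assumption: errors are transient, so only final verdicts are cached.
        if result != "ERROR":
            self._results[digest] = result
        return result


calls = []

def fake_scan(content: bytes) -> str:
    """Stand-in for a real ICAP round trip; records each invocation."""
    calls.append(content)
    return "BLOCKED" if b"EICAR" in content else "PASS"

cache = ScanCache(fake_scan)
first = cache.scan(b"quarterly report")
second = cache.scan(b"quarterly report")  # served from the cache
```

One trade-off to keep in mind: a cached `PASS` predates any later signature update on the ICAP server, so a real deployment would likely expire cache entries after some time.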

***

## 7. Technical Advantages of MaiAgent's ICAP Integration

### 7.1 Security Advantages

* **Multi-layered Protection**: Files are scanned before storage, so malicious content never reaches the knowledge base or file storage
* **Real-time Blocking**: Threats are blocked immediately and never stored in the system
* **Professional Engines**: Integrates industry-leading antivirus and DLP engines
* **Complete Records**: All scanning activities are recorded in detail for security auditing

### 7.2 Compliance Advantages

* **Standard Protocol**: Uses the industry-standard ICAP protocol for easy integration with existing enterprise security infrastructure
* **Audit Trail**: Complete scan records meet compliance requirements
* **Flexible Configuration**: Scanning strategies can be adjusted based on different industry compliance needs
* **Data Protection**: Supports DLP engines to prevent sensitive data leakage

### 7.3 Operations Advantages

* **Decoupled Architecture**: ICAP scanning logic is decoupled from the application, making maintenance easier
* **Horizontal Scaling**: Easily add ICAP servers to handle more scan requests
* **Unified Management**: Enterprises can use their existing ICAP infrastructure to centrally manage content scanning across all applications
* **Detailed Logging**: Comprehensive scan logs for troubleshooting and performance optimization

***

## 8. Related Documentation

* [AI Safety Guardrails](/tech/maiagent-tech-en/advanced-genai-tech/guardrails.md) - Learn about MaiAgent's multi-layered security protection strategies
* [File Upload and Knowledge Base Management](https://docs.maiagent.ai/km/km-basic-settings)

### Reference Links

* [ICAP RFC 3507](https://datatracker.ietf.org/doc/html/rfc3507)
* [Chunked Transfer Encoding (RFC 7230)](https://datatracker.ietf.org/doc/html/rfc7230#section-4.1)
* [Open ICAP Forum](https://www.icap-forum.org/)

