# ICAP Content Validation Integration

> This document explains how the MaiAgent platform integrates the ICAP (Internet Content Adaptation Protocol) to implement security scanning and validation of user-uploaded files and conversation content, protecting against malware and inappropriate content.

## 1. What is ICAP?

ICAP (Internet Content Adaptation Protocol, defined in RFC 3507) is a lightweight, HTTP-like protocol for forwarding content to external services for processing, such as virus scanning, content filtering, and data loss prevention.

### 1.1 Core Value of ICAP

| Consideration       | No Content Validation                                         | ICAP Content Validation                                                |
| ------------------- | ------------------------------------------------------------- | ---------------------------------------------------------------------- |
| **Security**        | Users may upload malware or inappropriate content             | Automatically scans and blocks malicious files and content             |
| **Compliance**      | Difficult to meet enterprise security policy requirements     | Integrates with enterprise-grade antivirus and DLP solutions           |
| **Flexibility**     | Must develop content detection logic in-house                 | Uses a standard protocol to integrate third-party specialized services |
| **Maintainability** | Must continuously update virus signatures and detection rules | Professional vendors maintain the detection engine                     |

### 1.2 Common Use Cases

* **File Upload Security**: Documents uploaded to knowledge bases must undergo virus scanning first
* **Conversation Content Filtering**: Detect whether conversations contain inappropriate language or sensitive information
* **Data Loss Prevention (DLP)**: Prevent users from uploading files containing confidential information
* **Compliance Checks**: Ensure content complies with regulations such as GDPR and HIPAA

***

## 2. MaiAgent ICAP Integration Architecture

{% @mermaid/diagram content="flowchart TB
subgraph Client["Client"]
    User["User"]
    WebChat["WebChat Interface"]
end

subgraph MaiAgent["MaiAgent Platform"]
    API["Upload API"]
    Queue["Processing Queue"]
    ICAP_Client["ICAP Client"]
    DB[("Database")]
    Storage[("File Storage")]
end

subgraph ICAP_Service["ICAP Service"]
    Scanner["Antivirus Engine"]
    DLP["DLP Engine"]
    Filter["Content Filter"]
end

User -- "Upload file" --> WebChat
WebChat -- "HTTP POST" --> API
API --> Queue
Queue --> ICAP_Client

ICAP_Client <-. "ICAP REQMOD" .-> Scanner
ICAP_Client <-. "ICAP RESPMOD" .-> DLP
ICAP_Client <-. "ICAP OPTIONS" .-> Filter

ICAP_Client -- "Validation passed" --> Storage
ICAP_Client -- "Log result" --> DB

ICAP_Client -. "Validation failed<br/>Send event" .-> WebChat
WebChat -. "Display error" .-> User" %}

### 2.1 Core Component Overview

* **ICAP Client**: MaiAgent's built-in ICAP protocol implementation responsible for communicating with the ICAP server
* **ICAP Server**: Third-party content scanning service, such as antivirus gateways from Symantec, McAfee, or Trend Micro
* **Processing Queue**: Asynchronously handles file uploads and scanning tasks to avoid blocking user operations
* **Scan Records**: Stores all scanning results including file hash, scan time, results, and other information

### 2.2 ICAP Request Flow

{% @mermaid/diagram content="sequenceDiagram
participant User as User
participant WebChat as WebChat
participant MaiAgent as MaiAgent API
participant ICAP as ICAP Server
participant Storage as File Storage

User->>WebChat: Upload file
WebChat->>MaiAgent: POST /upload
MaiAgent->>MaiAgent: Generate file ID

Note over MaiAgent,ICAP: ICAP REQMOD Request
MaiAgent->>ICAP: REQMOD /scan ICAP/1.0<br/>Encapsulated: req-hdr=0, req-body=...
ICAP->>ICAP: Scan file content

alt File is safe
    ICAP-->>MaiAgent: 200 OK (File passed)
    MaiAgent->>Storage: Store file
    Storage-->>MaiAgent: Storage successful
    MaiAgent-->>WebChat: 200 OK {fileId: ...}
    WebChat-->>User: Upload successful
else File has issues
    ICAP-->>MaiAgent: 403 Forbidden (Virus/Prohibited content)
    MaiAgent->>WebChat: Socket.IO: ICAP_BLOCKED event
    WebChat-->>User: Display error message
    MaiAgent->>MaiAgent: Log scan failure
else ICAP server error
    ICAP-->>MaiAgent: 500 Internal Server Error
    MaiAgent->>WebChat: Socket.IO: ICAP_ERROR event
    WebChat-->>User: System error, please try again later
    MaiAgent->>MaiAgent: Log error
end" %}

***

## 3. ICAP Protocol Implementation Details

### 3.1 ICAP Request Format

MaiAgent uses REQMOD (Request Modification) mode to send files to the ICAP server:

```http
REQMOD icap://icap-server.example.com/scan ICAP/1.0
Host: icap-server.example.com
Encapsulated: req-hdr=0, req-body=106
Allow: 204

POST /upload HTTP/1.1
Host: maiagent.ai
Content-Type: application/octet-stream
Content-Length: 1024

400
[Binary file content in chunked encoding]
0
```

**Key field descriptions**:

* **REQMOD**: Request Modification mode, used to inspect content before storage
* **Encapsulated**: Lists the byte offset at which each encapsulated section begins (here the HTTP request headers at offset 0, followed by the request body)
* **Allow: 204**: Informs the ICAP server that a 204 response is acceptable if no content modification is needed
* **Chunked Encoding**: File content is sent using chunked transfer encoding

### 3.2 Chunked Transfer Encoding

MaiAgent correctly implements HTTP chunked transfer encoding:

```
Chunked transfer format:
<data size (hexadecimal)>\r\n
<data content>\r\n
...
0\r\n\r\n  (terminator)
```

**Why is correct chunked encoding important?**

* The ICAP protocol (RFC 3507) requires every encapsulated request or response body to be sent with HTTP chunked transfer encoding
* A malformed chunk, such as a size written in decimal instead of hexadecimal, prevents the ICAP server from parsing the file content
* MaiAgent's chunk-size formatting has been corrected to use hexadecimal, ensuring compatibility with the ICAP standard
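
A minimal Python sketch of the chunked format above, assuming the whole payload is already in memory. The helper name `encode_chunked` and its default chunk size are illustrative choices, not MaiAgent's actual implementation.

```python
def encode_chunked(data: bytes, chunk_size: int = 4096) -> bytes:
    """Encode data with HTTP chunked transfer encoding (illustrative helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    out = bytearray()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # The chunk size is written in hexadecimal, without a "0x" prefix.
        out += f"{len(chunk):x}\r\n".encode("ascii")
        out += chunk + b"\r\n"
    # A zero-length chunk followed by an empty line terminates the body.
    out += b"0\r\n\r\n"
    return bytes(out)


print(encode_chunked(b"hello world", chunk_size=8))
```

Note that a 26-byte chunk is announced as `1a`, not `26`; writing the decimal value is the kind of mistake that makes the server misread the body.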

### 3.3 ICAP Response Handling

The ICAP server may return the following responses:

| Status Code            | Meaning                                        | MaiAgent Handling                                  |
| ---------------------- | ---------------------------------------------- | -------------------------------------------------- |
| **200 OK**             | Content has been inspected or modified         | Accept modified content (if any), store file       |
| **204 No Content**     | No content modification needed                 | Store the original file directly                   |
| **403 Forbidden**      | Content is prohibited (e.g., contains a virus) | Reject storage, send ICAP\_BLOCKED event to client |
| **500 Internal Error** | ICAP server error                              | Log error, send ICAP\_ERROR event to client        |
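
The table above can be sketched as a small dispatch step in Python. The names `ScanOutcome`, `parse_icap_status`, `classify`, and `EVENT_FOR_OUTCOME` are hypothetical; only the status codes and the `ICAP_BLOCKED` / `ICAP_ERROR` event names come from this document. The sketch treats any status not listed in the table as an error, which is an assumption.

```python
from enum import Enum


class ScanOutcome(Enum):
    PASS = "PASS"
    BLOCKED = "BLOCKED"
    ERROR = "ERROR"


def parse_icap_status(response: bytes) -> int:
    """Extract the numeric status from a line such as 'ICAP/1.0 204 No Content'."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("ICAP/") or not parts[1].isdigit():
        raise ValueError(f"malformed ICAP status line: {status_line!r}")
    return int(parts[1])


def classify(status: int) -> ScanOutcome:
    """Map an ICAP status code to the handling described in the table."""
    if status in (200, 204):
        return ScanOutcome.PASS
    if status == 403:
        return ScanOutcome.BLOCKED
    # Assumption: anything else (500, timeouts mapped to codes, ...) is an error.
    return ScanOutcome.ERROR


# Event sent to the client for each non-passing outcome.
EVENT_FOR_OUTCOME = {
    ScanOutcome.BLOCKED: "ICAP_BLOCKED",
    ScanOutcome.ERROR: "ICAP_ERROR",
}

outcome = classify(parse_icap_status(b"ICAP/1.0 403 Forbidden\r\n\r\n"))
print(outcome.value, EVENT_FOR_OUTCOME.get(outcome))
```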

***

## 4. WebChat Integration

When ICAP scanning is complete, the system notifies the user of the scan results in real time:

* **File Blocked**: Displays a user-friendly error message explaining why the file was rejected (e.g., virus detected)
* **System Error**: Prompts the user to try again later and logs the error

### User Experience Optimization

* **Instant Feedback**: Displays a "Scanning" status immediately after file upload
* **Progress Indicator**: Shows a progress bar during large file uploads
* **Friendly Error Messages**: Uses clear language to explain why a file was rejected
* **Retry Mechanism**: Provides a re-upload option for temporary errors

***

## 5. Scan Record Management

### 5.1 Record Retention

MaiAgent stores a detailed record of every ICAP scan. Each record contains the following information:

| Field             | Description                                        |
| ----------------- | -------------------------------------------------- |
| File ID           | Unique identifier of the scanned file              |
| Filename          | Original filename                                  |
| File Hash         | SHA-256 hash for verifying file integrity          |
| Scan Result       | PASS, BLOCKED, or ERROR                            |
| Scan Reason       | Specific reason when a file is blocked             |
| ICAP Server       | Address of the ICAP server that performed the scan |
| Scan Time         | Timestamp of when the scan was executed            |
| User/Organization | User and organization that initiated the scan      |
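
One way to picture such a record is the Python sketch below. The class `ScanRecord`, the helper `make_record`, and all field names are hypothetical and mirror the table rather than MaiAgent's actual schema; the hash is computed with SHA-256 from the standard library.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

VALID_RESULTS = {"PASS", "BLOCKED", "ERROR"}


@dataclass(frozen=True)
class ScanRecord:
    file_id: str
    filename: str
    file_hash: str              # hex-encoded SHA-256 of the file content
    scan_result: str            # PASS, BLOCKED, or ERROR
    scan_reason: Optional[str]  # set when the file is blocked
    icap_server: str
    scan_time: datetime
    user_id: str
    organization_id: str


def make_record(*, file_id: str, filename: str, content: bytes,
                scan_result: str, icap_server: str, user_id: str,
                organization_id: str,
                scan_reason: Optional[str] = None) -> ScanRecord:
    """Build a record for one scan; the timestamp is taken in UTC."""
    if scan_result not in VALID_RESULTS:
        raise ValueError(f"unknown scan result: {scan_result!r}")
    return ScanRecord(
        file_id=file_id,
        filename=filename,
        file_hash=hashlib.sha256(content).hexdigest(),
        scan_result=scan_result,
        scan_reason=scan_reason,
        icap_server=icap_server,
        scan_time=datetime.now(timezone.utc),
        user_id=user_id,
        organization_id=organization_id,
    )
```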

### 5.2 Expired Record Cleanup

MaiAgent uses background scheduled tasks to periodically delete scan records that exceed the retention period.

**Cleanup strategy**:

* **Retention Period**: Scan records are retained for 90 days by default
* **Execution Frequency**: Cleanup task runs once per week
* **Audit Retention**: Critical security events (such as virus detections) can be configured with longer retention periods
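
The cleanup strategy can be sketched in Python as a simple expiry check. The 90-day default comes from this document; the 365-day retention for blocked files, the function names, and the dictionary-based record shape are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION = timedelta(days=90)
# Assumption: critical events (blocked files) are kept for one year.
BLOCKED_RETENTION = timedelta(days=365)


def is_expired(scan_time: datetime, scan_result: str, now: datetime) -> bool:
    """Return True when a record is older than its retention period."""
    retention = BLOCKED_RETENTION if scan_result == "BLOCKED" else DEFAULT_RETENTION
    return now - scan_time > retention


def select_expired(records: list, now: datetime) -> list:
    """Pick the records a weekly cleanup task would delete."""
    return [r for r in records
            if is_expired(r["scan_time"], r["scan_result"], now)]


now = datetime.now(timezone.utc)
records = [
    {"file_id": "a", "scan_result": "PASS", "scan_time": now - timedelta(days=120)},
    {"file_id": "b", "scan_result": "BLOCKED", "scan_time": now - timedelta(days=120)},
]
print([r["file_id"] for r in select_expired(records, now)])
```

Passing `now` in explicitly, instead of reading the clock inside `is_expired`, makes the rule easy to test with fixed dates.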

***

## 6. Configuration and Deployment

### 6.1 ICAP Server Configuration

Administrators configure the following ICAP parameters in the MaiAgent admin panel:

| Setting                  | Description                                             |
| ------------------------ | ------------------------------------------------------- |
| ICAP Server Address      | Connection URL for the ICAP service                     |
| Timeout                  | Scan timeout in seconds                                 |
| Retry Count and Interval | Retry strategy for failed scans                         |
| Scan Scope               | Choose to scan file uploads and/or conversation content |
| File Size Limit          | Files exceeding this size will not be scanned           |
| File Extension Whitelist | Specify file types that do not require scanning         |
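
To make the effect of these settings concrete, here is a hedged Python sketch of a configuration object and the decision it drives. The class `IcapConfig`, its field names, and every default value are hypothetical; the admin panel's real keys and defaults are not documented here.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class IcapConfig:
    # All values below are placeholders chosen for illustration.
    server_url: str = "icap://icap-server.example.com/scan"
    timeout_seconds: float = 30.0
    retry_count: int = 3
    retry_interval_seconds: float = 2.0
    scan_file_uploads: bool = True
    scan_conversations: bool = False
    max_scan_bytes: int = 50 * 1024 * 1024
    skip_extensions: frozenset = frozenset({".txt"})


def should_scan_file(filename: str, size_bytes: int, config: IcapConfig) -> bool:
    """Apply scan scope, size limit, and extension whitelist to one upload."""
    if not config.scan_file_uploads:
        return False
    if size_bytes > config.max_scan_bytes:
        return False
    # Compare extensions case-insensitively so "NOTES.TXT" matches ".txt".
    extension = PurePosixPath(filename).suffix.lower()
    return extension not in config.skip_extensions


config = IcapConfig()
print(should_scan_file("report.pdf", 2048, config))
```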

### 6.2 Supported ICAP Services

MaiAgent is compatible with the following mainstream ICAP services:

* **Symantec Protection Engine**: Enterprise-grade antivirus and content filtering
* **McAfee Web Gateway**: Gateway-level content scanning
* **Trend Micro IWSVA**: Integrated web security and content scanning
* **Kaspersky Scan Engine**: Kaspersky's ICAP scanning engine
* **ClamAV (via c-icap)**: Open-source antivirus solution

### 6.3 Performance Considerations

To ensure system performance, the following recommendations apply:

* **Asynchronous Scanning**: Use background tasks for scanning large files to avoid blocking upload responses
* **Caching**: Cache scan results for identical files (same hash value)
* **ICAP Connection Pooling**: Maintain persistent connections with the ICAP server to reduce connection overhead
* **Load Balancing**: Use multiple ICAP server instances to distribute scanning load
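
The caching recommendation can be illustrated with a short Python sketch that keys results by SHA-256 hash. The class `CachingScanner` is hypothetical, and the `scan` callable stands in for whatever function actually talks to the ICAP server. Not caching `ERROR` results is a design assumption so that transient failures are retried.

```python
import hashlib
from typing import Callable, Dict


class CachingScanner:
    """Wrap a scan function and reuse results for identical content."""

    def __init__(self, scan: Callable[[bytes], str]) -> None:
        self._scan = scan
        self._results: Dict[str, str] = {}

    def scan(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        cached = self._results.get(digest)
        if cached is not None:
            return cached
        result = self._scan(content)
        # Assumption: ERROR is transient, so it is never stored in the cache.
        if result != "ERROR":
            self._results[digest] = result
        return result


# Example with a stand-in scan function instead of a real ICAP call.
scanner = CachingScanner(lambda content: "PASS")
print(scanner.scan(b"same bytes"), scanner.scan(b"same bytes"))
```

In a real deployment the cache would likely need an expiry, since a file that passed yesterday may be flagged after the scanning engine's signatures are updated.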

***

## 7. Technical Advantages of MaiAgent's ICAP Integration

### 7.1 Security Advantages

* **Multi-layered Protection**: Files are scanned before storage to keep malicious content out of the system
* **Real-time Blocking**: Threats are blocked immediately and never stored in the system
* **Professional Engines**: Integrates industry-leading antivirus and DLP engines
* **Complete Records**: All scanning activities are recorded in detail for security auditing

### 7.2 Compliance Advantages

* **Standard Protocol**: Uses the industry-standard ICAP protocol for easy integration with existing enterprise security infrastructure
* **Audit Trail**: Complete scan records meet compliance requirements
* **Flexible Configuration**: Scanning strategies can be adjusted based on different industry compliance needs
* **Data Protection**: Supports DLP engines to prevent sensitive data leakage

### 7.3 Operations Advantages

* **Decoupled Architecture**: ICAP scanning logic is decoupled from the application, making maintenance easier
* **Horizontal Scaling**: Easily add ICAP servers to handle more scan requests
* **Unified Management**: Enterprises can use their existing ICAP infrastructure to centrally manage content scanning across all applications
* **Detailed Logging**: Comprehensive scan logs for troubleshooting and performance optimization

***

## 8. Related Documentation

* [AI Safety Guardrails](https://docs.maiagent.ai/tech/maiagent-tech-en/advanced-genai-tech/guardrails) - Learn about MaiAgent's multi-layered security protection strategies
* [File Upload and Knowledge Base Management](https://docs.maiagent.ai/km/km-basic-settings)

### Reference Links

* [ICAP RFC 3507](https://datatracker.ietf.org/doc/html/rfc3507)
* [Chunked Transfer Encoding (RFC 7230)](https://datatracker.ietf.org/doc/html/rfc7230#section-4.1)
* [Open ICAP Forum](https://www.icap-forum.org/)
