# Overview

In addition to its SaaS offering, the MaiAgent platform is also available as a self-hosted solution (private cloud or on-premises). The MaiAgent platform itself requires only general computing resources and no GPUs. The model services MaiAgent relies on (LLM, Embedding Model, Reranker Model) do require GPU computing power, which can come either from cloud API inference services or from on-premises GPU infrastructure.

A hybrid cloud architecture is also possible: the MaiAgent platform runs on-premises (private cloud or local data center) while the model services (LLM, Embedding Model, Reranker Model) run in the cloud. MaiAgent supports the LLM, Embedding, and Reranker models offered by all major cloud service providers (CSPs). If data-security concerns arise, or on-premises costs drop in the future, the platform can switch to on-premises computing power at any time.
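Because the model services sit behind configurable endpoints, switching between cloud and on-premises computing power amounts to a configuration change. The sketch below illustrates the idea; the configuration keys, endpoint URLs, and the `MODEL_BACKEND` environment variable are hypothetical, not MaiAgent's actual schema.

```python
import os

# Illustrative endpoint settings -- all names and URLs here are
# placeholders, not the platform's real configuration.
MODEL_BACKENDS = {
    "cloud": {
        "llm_endpoint": "https://bedrock.us-east-1.amazonaws.com",        # e.g. AWS Bedrock
        "embedding_endpoint": "https://bedrock.us-east-1.amazonaws.com",
        "reranker_endpoint": "https://bedrock.us-east-1.amazonaws.com",
    },
    "on_prem": {
        "llm_endpoint": "http://gpu-cluster.internal:8000",               # local inference server
        "embedding_endpoint": "http://gpu-cluster.internal:8001",
        "reranker_endpoint": "http://gpu-cluster.internal:8002",
    },
}

def resolve_model_backend(mode: str = "") -> dict:
    """Pick model-service endpooints from a flag or environment variable,
    so switching cloud <-> on-prem is a configuration change only."""
    mode = mode or os.environ.get("MODEL_BACKEND", "cloud")
    if mode not in MODEL_BACKENDS:
        raise ValueError(f"unknown model backend: {mode}")
    return MODEL_BACKENDS[mode]
```

With this pattern, moving model inference on-premises requires no application changes, only a different value for the backend flag.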

## MaiAgent Platform Overview

**MaiAgent** is a comprehensive generative AI platform providing end-to-end services from system backend to user frontend. The platform adopts a scalable microservices architecture and supports mainstream cloud environments (AWS, GCP, Azure, Oracle) as well as on-premises environments (Docker, K8s), allowing flexible deployment based on enterprise needs.

The platform core is built on **Docker**, combining diverse service modules covering API, task scheduling, data storage, cache management, and frontend/backend applications. Its overall design ensures **high availability, scalability, and cross-cloud integration capabilities** while maintaining security and operability.

| Service Item                                | Purpose                                                                                       | AWS             | GCP                  | Azure                          | VMs                            |
| ------------------------------------------- | --------------------------------------------------------------------------------------------- | --------------- | -------------------- | ------------------------------ | ------------------------------ |
| MaiAgent Server                             | MaiAgent core, API, system management backend                                                 | EKS(Fargate)    | GKE                  | AKS                            | Django on Docker               |
| MaiAgent Worker Server                      | MaiAgent queue processing, asynchronous service worker, especially for message stream output  | EKS(Fargate)    | GKE                  | AKS                            | Django on Docker               |
| MaiAgent Worker Server (Low-Priority)       | MaiAgent queue processing, asynchronous service worker, especially for document vectorization | EKS(Fargate)    | GKE                  | AKS                            | Django on Docker               |
| MaiAgent Admin Frontend                     | MaiAgent management platform                                                                  | S3 + CloudFront | GCS + Google Cloud CDN | Azure Blob Storage + Azure CDN | Nginx + Static Files on Docker |
| MaiAgent Web Chat Frontend                  | MaiAgent web chat frontend                                                                    | S3 + CloudFront | GCS + Google Cloud CDN | Azure Blob Storage + Azure CDN | Nginx + Static Files on Docker |
| Relational Database (RDB) - PostgreSQL      | Store MaiAgent data                                                                           | RDS             | Cloud SQL            | Azure Database                 | PostgreSQL on Docker           |
| Vector Database (Vector DB) - Elasticsearch | Store vectors needed for RAG and memory functions                                             | Elasticsearch   | Elasticsearch        | Elasticsearch                  | Elasticsearch on Docker        |
| Static Storage                              | Store static files and web pages                                                              | S3              | GCS                  | Azure Blob Storage             | MinIO on Docker                |
| Memory Cache - Redis                        | API cache and queue backend for the task-scheduling service                                   | ElastiCache     | Memorystore          | Azure Cache                    | Redis on Docker                |

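For an on-premises VM deployment, the services in the table above can all run as Docker containers. The compose file below is an illustrative sketch only: the MaiAgent image names, worker entrypoints, and version tags are placeholders, not published artifacts.

```yaml
# Illustrative sketch -- image names, commands, and tags are placeholders.
services:
  maiagent-server:                    # core API + system management backend (Django)
    image: maiagent/server:latest     # placeholder image name
  maiagent-worker:                    # async worker, e.g. message stream output
    image: maiagent/server:latest
    command: worker                   # placeholder worker entrypoint
  maiagent-worker-low-priority:       # async worker, e.g. document vectorization
    image: maiagent/server:latest
    command: worker --queue low       # placeholder
  frontend:                           # serves the static admin and web chat frontends
    image: nginx:stable
  postgres:                           # relational database
    image: postgres:16
  elasticsearch:                      # vector database for RAG and memory
    image: docker.elastic.co/elasticsearch/elasticsearch:8.14.0
  redis:                              # API cache and queue backend
    image: redis:7
  minio:                              # S3-compatible static storage
    image: minio/minio
```

In cloud deployments, the last five services map to the managed equivalents listed in the table (RDS/Cloud SQL, ElastiCache/Memorystore, S3/GCS, and so on).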
## Model Services

The **MaiAgent** platform is designed to be compatible with both **cloud API inference services** and **self-hosted GPU environments**.

* With **cloud API inference services**, MaiAgent can directly integrate with various LLM, Embedding, and Reranker APIs, enabling rapid scaling and support for dynamic traffic demands, facilitating experimentation and quick deployment.
* In **self-hosted GPU mode**, MaiAgent can integrate with models deployed locally or in private data centers, fully utilizing GPU resources and optimizing inference while ensuring data privacy and compliance.

Users can weigh flexibility against cost and switch modes as needed, or even combine both in a hybrid setup, with **MaiAgent** acting as a unified inference and service-management layer.
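A hybrid approach can be expressed as a routing rule in that unified layer. The sketch below is one possible policy under assumed names (`InferenceRequest`, `route_backend`, and the sensitivity flag are all hypothetical), not MaiAgent's actual implementation: sensitive traffic stays on self-hosted GPUs, while everything else goes to the elastic cloud API.

```python
from dataclasses import dataclass

# Hypothetical routing layer -- names and the routing rule are
# assumptions for illustration, not the platform's real API.
@dataclass
class InferenceRequest:
    prompt: str
    contains_sensitive_data: bool = False

def route_backend(req: InferenceRequest) -> str:
    """Hybrid policy: keep sensitive traffic on self-hosted GPUs,
    send everything else to a cloud API for elastic scaling."""
    return "self_hosted_gpu" if req.contains_sensitive_data else "cloud_api"
```

Richer policies (per-tenant rules, cost caps, failover from one backend to the other) would slot into the same routing point.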

|                                 | Cloud API Inference Services                                                                      | Self-hosted GPU                                                                                                       |
| ------------------------------- | ------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
| MaiAgent Platform Compatibility | <p>AWS Bedrock<br>Google Vertex AI<br>Azure AI<br>Oracle OCI</p>                                  | <p>HPE<br>Advantech<br>Cisco<br>Dell</p>                                                                              |
| Model Capabilities              | <p>Closed-source models: High<br>Open-source models: Same as self-hosted GPU</p>                  | Depends on open-source model releases                                                                                 |
| Speed                           | <p>Faster<br>Claude 4 Sonnet: 80 token/s<br>Gemini 2.5 Pro: 156 token/s</p>                       | <p>Medium<br>Llama 3.3 70B: 25.01 token/s (H100 example)</p>                                                          |
| Investment Cost                 | <p>Token API fees<br>(pay-as-you-go)</p>                                                          | <p>Hardware costs<br>Data center costs<br>Hardware and model maintenance personnel costs<br>Hardware depreciation</p> |
| Concurrent Users                | Based on cloud service provider support                                                           | <p>Based on GPU quantity<br>(Currently about 25 users per H100 GPU)</p>                                               |
| Data Security                   | Using cloud service providers (AWS, GCP, Azure, Oracle) that promise not to use data for training | Highest confidentiality, most secure                                                                                  |
| Personal Data Handling          | Personal data can be removed with a DLP server or service before sending requests to the cloud    | Not required; data never leaves the premises                                                                          |
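The concurrency figure in the table (about 25 concurrent users per H100 GPU) allows a back-of-envelope sizing calculation for self-hosted deployments:

```python
import math

def h100_gpus_needed(concurrent_users: int, users_per_gpu: int = 25) -> int:
    """Rough sizing from the table's ~25 concurrent users per H100.
    Actual capacity varies with model size, context length, and workload."""
    return math.ceil(concurrent_users / users_per_gpu)

# e.g. 120 concurrent users -> ceil(120 / 25) = 5 H100 GPUs
```

This is an estimate only; real capacity planning should be validated with load testing against the chosen model.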

## Platform Deployment Environments

To meet different customer needs across development, testing, and deployment, our software platform provides multiple layered environments. These environments ensure that every stage from development to production is properly validated and controlled.

### Environment Architecture

Our platform setup follows industry best practices and flexibly provides the following environments based on customer needs:

| Environment Name | Notes                                               | Main Purpose                    | Characteristics                                                                                                                 |
| ---------------- | --------------------------------------------------- | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| PROD             | Internal                                            | Production Environment          | Official external service, uses real data, requires high stability and security                                                 |
| UAT              | Internal                                            | User Acceptance Testing         | For customer and business unit acceptance, verifying system functionality meets requirements, environment similar to production |
| SIT              | Internal, may need to connect to internal systems   | System Integration Testing      | Validates integration and compatibility between different modules and services, uses near-real test data                        |
| DEV              | External; used for developing additional features to speed up delivery | Development Testing Environment | For developers' feature development and unit testing; uses simulated data, is updated frequently, and tolerates errors          |

### Environment Combinations

Different customers can choose suitable environment combinations based on project requirements, such as:

* **PROD Only**: Suitable for small projects or simple deployment needs, directly deployed to production environment.
* **PROD + UAT**: Suitable for projects requiring acceptance testing, ensuring functionality meets requirements before going live.
* **Complete Environment Combination (DEV + SIT + UAT + PROD)**: Suitable for large or complex projects requiring complete development, integration testing, and acceptance processes.

We flexibly configure based on customer needs, ensuring optimal balance between cost and quality.
