# Deployment Architecture

MaiAgent is a scalable generative AI platform that supports diverse application scenarios. To accommodate different business requirements and resource allocation methods, the platform offers two main deployment modes: **Monolithic Deployment** and **Distributed Deployment**.

This section explains the differences between the two architectures, their applicable scenarios, and their respective advantages and disadvantages, and provides practical deployment references.

## Monolithic Deployment

### Architecture Overview

In the monolithic deployment mode, all core components of MaiAgent (the main service, task scheduling service, data storage, database, frontend service, and so on) are installed and run on the same server. This mode offers **centralized management** and simple deployment, which makes it suitable for rapid launches and testing environments.

The MaiAgent platform does not require a GPU and runs smoothly in standard CPU environments. On machines with GPU resources, the platform can also be deployed alongside the models in the same environment to take full advantage of hardware acceleration. The two common architectures are shown below for reference.

### Architecture Diagrams

1. Deployment on a GPU-enabled server:

The MaiAgent platform and the model services are installed on the same machine. The platform handles request coordination and traffic control through internal APIs, while the models use the GPU for efficient inference. Because the platform and the model services share one machine, there is no need to purchase additional servers solely to run the platform, which reduces overall hardware costs.

<figure><img src="https://3415477754-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FNBTi475lqozGpB7xObpE%2Fuploads%2Fgit-blob-c44f0012dfa26a57f94b3502599ef95e0449b82b%2F%E6%88%AA%E5%9C%96%202025-09-07%20%E4%B8%8A%E5%8D%8810.45.02.png?alt=media" alt=""><figcaption></figcaption></figure>

2. Deployment on a non-GPU server:

When MaiAgent is deployed on a server without a GPU, model services are still required, so the platform connects through APIs to GPU servers or cloud inference services. Because the platform and the model services are separated, they can be scaled independently: computing power can be increased or decreased as needed, which makes the architecture more flexible and easier to maintain.

<figure><img src="https://3415477754-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FNBTi475lqozGpB7xObpE%2Fuploads%2Fgit-blob-ae088ec75c4ed0ad2b74d027af6f5afdae86dff2%2F%E6%88%AA%E5%9C%96%202025-09-07%20%E4%B8%8A%E5%8D%8811.35.32.png?alt=media" alt=""><figcaption></figcaption></figure>
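
The two layouts above differ mainly in where the platform sends inference requests. The following is a minimal sketch of that choice, not MaiAgent's actual configuration: `ModelEndpoint`, `resolve_model_endpoint`, the local port `8000`, and the example remote URL are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEndpoint:
    """Where the platform sends inference requests (illustrative only)."""
    base_url: str
    colocated: bool  # True when the model runs on the same machine as the platform


def resolve_model_endpoint(has_local_gpu: bool,
                           remote_url: Optional[str] = None,
                           local_port: int = 8000) -> ModelEndpoint:
    """Pick a model endpoint for a monolithic deployment.

    - GPU-enabled server: call the co-located model service over localhost.
    - Non-GPU server: a remote GPU server or cloud inference API is required.
    """
    if has_local_gpu:
        return ModelEndpoint(f"http://127.0.0.1:{local_port}", colocated=True)
    if not remote_url:
        raise ValueError("a non-GPU server needs a remote inference endpoint")
    # Normalize the URL so callers can append paths without doubling slashes.
    return ModelEndpoint(remote_url.rstrip("/"), colocated=False)


# Hypothetical usage: a CPU-only host pointing at an external inference service.
endpoint = resolve_model_endpoint(False, "https://inference.example.com/")
```

Keeping the endpoint behind a single lookup like this means the same platform build could, in principle, move between the two layouts by changing configuration only.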

## Distributed Deployment

### Architecture Overview

In the distributed deployment mode, MaiAgent's core modules are split into independent services and distributed across multiple servers. Each module can be scaled horizontally as needed, which provides high availability and large-scale processing capacity.

* **Cloud Platform (Cloud PaaS) Environment**: In public or private cloud environments, you can directly use Platform as a Service (PaaS) capabilities such as Kubernetes, AWS ECS/EKS, GCP Cloud Run, and Azure App Service. These services provide container orchestration, load balancing, auto-scaling, and monitoring, which enables quick deployment and dynamic resource adjustment of the distributed modules while reducing the infrastructure maintenance burden.
* **On-Premise VM Environment**: In on-premise scenarios, you can set up container platforms or application service frameworks on virtual machines or bare-metal servers to achieve distributed management and scalability similar to cloud environments. Cluster resources, monitoring, and redundancy mechanisms must be planned independently, but high availability and elastic scaling can still be achieved.
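
Scaling an individual module horizontally usually comes down to a replica-count calculation. The sketch below uses the proportional rule described in the Kubernetes Horizontal Pod Autoscaler documentation, `desired = ceil(current × currentMetric / targetMetric)`, clamped to a minimum and maximum. The function name, module names, metric values, and bounds are hypothetical and are not MaiAgent settings.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule, clamped to [min_replicas, max_replicas]."""
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical per-module load: (current replicas, observed CPU %, target CPU %).
modules = {
    "api": (2, 90.0, 60.0),     # above target, so scale out
    "worker": (4, 30.0, 60.0),  # below target, so scale in
}
plan = {name: desired_replicas(*load) for name, load in modules.items()}
print(plan)  # {'api': 3, 'worker': 2}
```

Because each module gets its own calculation, a bottleneck in one service can be addressed without adding capacity to the others.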

### Architecture Diagram

<figure><img src="https://3415477754-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FNBTi475lqozGpB7xObpE%2Fuploads%2Fgit-blob-b75d99c8b7d76847ec9ff2bd2c967640c156f6b3%2F%E6%88%AA%E5%9C%96%202025-09-07%20%E4%B8%8A%E5%8D%8811.28.15.png?alt=media" alt=""><figcaption></figcaption></figure>

## Deployment Mode Comparison Table

<table><thead><tr><th width="133.58203125">Characteristics</th><th>Monolithic Deployment</th><th>Distributed Deployment</th></tr></thead><tbody><tr><td><strong>Architecture Design</strong></td><td>All components centralized on a single server or container</td><td>Components split into independent services distributed across multiple nodes</td></tr><tr><td><strong>Infrastructure Cost</strong></td><td>Low; a single server is sufficient</td><td>High; requires multiple servers or cloud resources</td></tr><tr><td><strong>Deployment Cost</strong></td><td>Low</td><td>High; deployment is complex and requires a DevOps team</td></tr><tr><td><strong>Maintenance Cost</strong></td><td>Low; centralized management</td><td>High; requires cross-server and cross-service maintenance and monitoring</td></tr><tr><td><strong>Scalability</strong></td><td>None; limited by single-machine resources</td><td>Yes; bottleneck modules can be scaled independently</td></tr><tr><td><strong>High Availability</strong></td><td>None; a single point of failure leads to a system-wide outage</td><td>Yes; a single service failure does not take down the overall system</td></tr><tr><td><strong>Suitable Scenarios</strong></td><td>PoC, development and testing, small-scale applications</td><td>Production deployments, large-scale and multi-department use</td></tr></tbody></table>
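
The High Availability row can be illustrated with a small placement model. This is a simplified sketch under stated assumptions: the component names, node names, and replica placement are hypothetical, and a component counts as available if at least one of its replicas sits on a healthy node.

```python
from typing import Dict, Set


def surviving_components(placement: Dict[str, Set[str]],
                         failed_node: str) -> Set[str]:
    """Return the components that still have a replica on a healthy node."""
    return {name for name, nodes in placement.items() if nodes - {failed_node}}


# Monolithic: every component lives on the same single server.
monolithic = {"web": {"node-1"}, "worker": {"node-1"}, "database": {"node-1"}}

# Distributed: each component is replicated across two different nodes.
distributed = {
    "web": {"node-1", "node-2"},
    "worker": {"node-2", "node-3"},
    "database": {"node-1", "node-3"},
}

print(sorted(surviving_components(monolithic, "node-1")))   # []
print(sorted(surviving_components(distributed, "node-1")))  # ['database', 'web', 'worker']
```

In the monolithic layout the loss of the one server removes every component, while in the replicated layout each component keeps a healthy replica after any single node failure.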
