Deployment Architecture
MaiAgent is a scalable generative AI platform that supports diverse application scenarios. To accommodate different business requirements and resource allocation methods, the platform offers two main deployment modes: Monolithic Deployment and Distributed Deployment.
This section explains how the two architectures differ, which scenarios each suits, and their respective advantages and disadvantages, and it provides practical deployment references.
Monolithic Deployment
Architecture Overview
In the monolithic deployment mode, all core components of MaiAgent (main service, task scheduling service, data storage, database, frontend service, and so on) are installed and run on the same server. This mode offers centralized management and simple deployment, making it suitable for rapid launches and testing environments.
The MaiAgent platform does not require a GPU and runs smoothly in standard CPU environments. When deployed on a machine with GPU resources, the platform can share that machine with the model services to take full advantage of hardware acceleration. Below are two common architecture diagrams for reference.
Architecture Diagrams
Deployment on GPU-enabled server:
The MaiAgent platform and the model services are installed on the same machine. The platform handles request coordination and traffic control through internal APIs, while the model uses the GPU for efficient inference. Because the platform and model services share one machine, there is no need to purchase additional servers solely to run the platform, which reduces overall hardware costs.

Deployment on non-GPU server:
When MaiAgent is deployed on a server without a GPU, model services are still required, so the platform connects through APIs to GPU servers or cloud inference services. Because the platform and model services are separated, each can be scaled independently: computing power can be added or removed as needed, which makes the architecture more flexible and easier to maintain.
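The two monolithic layouts differ mainly in where the platform finds its model service. The sketch below illustrates that choice in Python. It is not MaiAgent's actual configuration mechanism: the `ModelEndpoint` type, the `resolve_model_endpoint` function, the default port, and the example URL are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEndpoint:
    """Where the platform sends inference requests (hypothetical type)."""
    base_url: str
    colocated: bool  # True when the model runs on the same machine as the platform


def resolve_model_endpoint(
    has_local_gpu: bool,
    remote_url: Optional[str] = None,
    local_port: int = 8000,  # assumed port for a co-located model service
) -> ModelEndpoint:
    """Pick the model endpoint for a monolithic deployment.

    GPU-enabled server: the model service runs next to the platform and is
    reached through an internal (loopback) API.
    Non-GPU server: the platform must be pointed at an external GPU server
    or a cloud inference API.
    """
    if has_local_gpu:
        return ModelEndpoint(f"http://127.0.0.1:{local_port}", colocated=True)
    if not remote_url:
        raise ValueError(
            "No local GPU: a remote GPU server or cloud inference URL is required"
        )
    return ModelEndpoint(remote_url.rstrip("/"), colocated=False)


# Usage: a CPU-only server that delegates inference to an external service.
endpoint = resolve_model_endpoint(
    has_local_gpu=False,
    remote_url="https://inference.example.com/",  # placeholder URL
)
print(endpoint.base_url, endpoint.colocated)
```

Keeping the endpoint as the only thing that changes between the two layouts means the rest of the platform configuration can stay identical when the model service is later moved to separate hardware.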

Distributed Deployment
Architecture Overview
In the distributed deployment mode, MaiAgent's core modules are split into independent services and distributed across multiple servers. Different modules can be horizontally scaled according to needs, achieving high availability and large-scale processing capabilities.
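To illustrate what scaling modules independently means in practice, the following Python sketch sizes each module from its own load rather than scaling the whole platform as one unit. The module names follow the components listed earlier in this section. The load and capacity figures, the `plan_replicas` function, and the minimum of two replicas per module are illustrative assumptions, not MaiAgent defaults.

```python
import math
from typing import Dict


def plan_replicas(
    load_rps: Dict[str, float],
    capacity_rps: Dict[str, float],
    min_replicas: int = 2,  # assumed floor so one failed instance is not an outage
) -> Dict[str, int]:
    """Return how many instances each module needs for its own load.

    load_rps:     expected requests per second for each module
    capacity_rps: requests per second a single instance of that module can serve
    """
    plan: Dict[str, int] = {}
    for module, load in load_rps.items():
        capacity = capacity_rps[module]
        if capacity <= 0:
            raise ValueError(f"capacity for {module!r} must be positive")
        # Round up to cover the load, but never go below the availability floor.
        plan[module] = max(min_replicas, math.ceil(load / capacity))
    return plan


# Usage with made-up numbers: only the bottleneck module gets extra instances.
plan = plan_replicas(
    load_rps={"main_service": 120, "task_scheduler": 15, "frontend": 300},
    capacity_rps={"main_service": 50, "task_scheduler": 20, "frontend": 200},
)
print(plan)  # {'main_service': 3, 'task_scheduler': 2, 'frontend': 2}
```

In a real cluster this calculation is normally delegated to the orchestrator's auto-scaling mechanism. The sketch only shows the reasoning behind per-module sizing.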
Cloud Platform (Cloud PaaS) Environment
In public or private cloud environments, you can directly utilize Platform as a Service (PaaS) capabilities, such as Kubernetes, AWS ECS/EKS, GCP Cloud Run, Azure App Service, etc. These services provide container orchestration, load balancing, auto-scaling, and monitoring mechanisms, enabling quick deployment and dynamic resource adjustment of distributed modules while reducing infrastructure maintenance burden.
On-Premise VM Environment
Even in on-premise VM scenarios, you can set up container platforms or application service frameworks through virtual machines or bare metal servers to achieve distributed management and scalability similar to cloud environments. Although cluster resources, monitoring, and redundancy mechanisms need to be planned independently, high availability and elastic scaling can still be achieved.
Architecture Diagram

Deployment Mode Comparison Table
| Aspect | Monolithic Deployment | Distributed Deployment |
| --- | --- | --- |
| Architecture Design | All components centralized on a single server/container | Components split into independent services, distributed across multiple nodes |
| Infrastructure Cost | Low, single server sufficient | High, requires multiple servers or cloud resources |
| Deployment Cost | Low | High, complex deployment, requires DevOps team |
| Maintenance Cost | Low, centralized management | High, requires cross-server and cross-service maintenance and monitoring |
| Scalability | None, limited by single machine resources | Yes, can independently scale bottleneck modules |
| High Availability | None, single point of failure leads to system-wide outage | Yes, single service failure doesn't affect overall system |
| Suitable Scenarios | PoC, development testing, small-scale applications | Production deployment, large-scale, multi-department |
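The comparison above can be read as a simple decision rule. The Python sketch below encodes one possible reading of it. The `Requirements` fields and the `recommend_mode` function are hypothetical, and the rule is deliberately coarse: it is a starting point for discussion, not an official sizing guide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Requirements:
    """Hypothetical checklist derived from the comparison rows."""
    production: bool                 # production rollout rather than PoC or testing
    needs_high_availability: bool    # a single failure must not stop the system
    needs_independent_scaling: bool  # bottleneck modules must scale on their own
    has_devops_team: bool            # staff available for cross-service operations


def recommend_mode(req: Requirements) -> Tuple[str, List[str]]:
    """Return a suggested deployment mode plus caveats taken from the comparison."""
    caveats: List[str] = []
    if req.production or req.needs_high_availability or req.needs_independent_scaling:
        if not req.has_devops_team:
            caveats.append("distributed deployment is complex and assumes a DevOps team")
        caveats.append("higher infrastructure and maintenance cost")
        return "distributed", caveats
    caveats.append("single server is a single point of failure")
    return "monolithic", caveats


# Usage: a proof of concept with no availability requirements.
mode, caveats = recommend_mode(
    Requirements(
        production=False,
        needs_high_availability=False,
        needs_independent_scaling=False,
        has_devops_team=False,
    )
)
print(mode, caveats)
```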
