Deployment Architecture
MaiAgent is a scalable generative AI platform that supports a wide range of application scenarios. To fit different business needs and resource allocation strategies, the platform offers two main deployment modes: Monolithic deployment and Distributed deployment.
This chapter explains how the two architectures differ, which scenarios each suits, and their respective advantages and disadvantages. It also provides practical deployment references.
Monolithic deployment
Architecture description
In the monolithic deployment mode, all core components of MaiAgent (such as the main service, task scheduling service, data storage, database, and frontend service) are installed and run on the same server. This mode offers centralized management and simple deployment, which makes it suitable for quick launches and testing environments.
The MaiAgent platform does not require a GPU and can be deployed and run on a general-purpose CPU server. If the machine does have GPU resources, the platform can also be colocated with the model service to take full advantage of hardware acceleration. Below are two common architecture diagrams for reference.
Architecture diagrams
Deployed on a GPU-equipped server:
The MaiAgent platform and the model service are installed on the same machine. The platform coordinates requests and controls traffic through internal APIs, while the model uses the GPU for efficient inference. Because the platform and model service share one machine, you do not need to buy a separate general-purpose server just to run the platform, which lowers overall hardware costs.

Deployed on a non-GPU server:
When MaiAgent is deployed on a non-GPU server, a model service is still required, so the platform must connect via API to a GPU server or a cloud inference API service. Separating the platform from the model service lets each be scaled independently, with compute capacity added or removed as needed, which makes the architecture more flexible and easier to maintain.
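From the platform's point of view, the main difference between the colocated and separated layouts is the address of the model endpoint. The sketch below resolves that address from configuration. The environment variables `MODEL_API_BASE` and `MODEL_API_KEY`, the local default `http://127.0.0.1:8000`, and the OpenAI-compatible `/v1/chat/completions` route are hypothetical assumptions, not documented MaiAgent settings.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEndpoint:
    base_url: str
    api_key: Optional[str]  # cloud inference APIs usually need a key; a private GPU server may not

    def chat_url(self) -> str:
        # Assumes an OpenAI-compatible route; adjust to the inference service actually used.
        return self.base_url.rstrip("/") + "/v1/chat/completions"


def resolve_endpoint(env: Mapping[str, str] = os.environ) -> ModelEndpoint:
    """Pick the model endpoint from hypothetical environment variables.

    MODEL_API_BASE unset -> assume the model service is colocated on this host.
    MODEL_API_BASE set   -> use the remote GPU server or cloud inference API.
    """
    base = env.get("MODEL_API_BASE", "http://127.0.0.1:8000")
    return ModelEndpoint(base_url=base, api_key=env.get("MODEL_API_KEY"))
```

Keeping the endpoint in configuration means moving from a colocated layout to a separate GPU server or cloud API is a configuration change rather than a code change.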

Distributed deployment
Architecture description
In the distributed deployment mode, MaiAgent's core modules are split into independent services and distributed across multiple servers. Different modules can be horizontally scaled as needed to achieve high availability and large-scale processing capability.
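Horizontal scaling means running several replicas of one module and spreading requests across them. The sketch below shows the simplest distribution policy, round-robin. The class name and replica addresses are hypothetical. In practice this job is usually handled by a load balancer or the orchestrator's service layer rather than by application code.

```python
import itertools
from typing import Iterator, List


class RoundRobinPool:
    """Spread requests for one module across its horizontally scaled replicas."""

    def __init__(self, replicas: List[str]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        # cycle() repeats the replica list forever in order.
        self._cycle: Iterator[str] = itertools.cycle(replicas)

    def next_replica(self) -> str:
        """Return the replica that should receive the next request."""
        return next(self._cycle)


# Usage: three hypothetical replicas of a task scheduling service.
pool = RoundRobinPool(["sched-1:8080", "sched-2:8080", "sched-3:8080"])
targets = [pool.next_replica() for _ in range(4)]
```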
Cloud PaaS environment: In public or private cloud environments, you can directly use Platform-as-a-Service (PaaS) and managed container services such as Kubernetes, AWS ECS/EKS, GCP Cloud Run, and Azure App Service. These services provide container orchestration, load balancing, auto-scaling, and monitoring. They let you deploy distributed modules quickly and adjust resources dynamically, which reduces the burden of operating and maintaining infrastructure.
On-premises VM environment: Even in on-premises scenarios, you can set up a container platform or application service framework on virtual machines or bare-metal servers to get cloud-like distributed management and scaling. You must plan cluster resources, monitoring, and failover mechanisms yourself, but you can still achieve high availability and elastic scaling.
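As an example of the auto-scaling these platforms provide, the Kubernetes Horizontal Pod Autoscaler computes `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. By default it skips scaling when that ratio is within a tolerance of 0.1 of 1.0. The sketch below reproduces that core rule and clamps the result to a replica range. It omits HPA features such as stabilization windows and multiple metrics. The default bounds of 1 and 10 replicas are arbitrary illustrative values.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Replica count following the core Kubernetes HPA scaling formula."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling action
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below or above the configured replica range.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 replicas averaging 90% CPU against a 60% target give a ratio of 1.5, so the result is ceil(4.5) = 5 replicas.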
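Without a managed platform, failover logic has to be planned explicitly. One simple policy sends traffic to the first node, in priority order, that passes a health check. The sketch below is a minimal version of that policy. The node names are hypothetical, and `is_healthy` stands in for whatever probe you use, such as an HTTP request to a health endpoint.

```python
from typing import Callable, Sequence


def pick_healthy(nodes: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first node, in priority order, that passes its health check."""
    for node in nodes:
        if is_healthy(node):
            return node
    # Every node failed its check: surface the outage instead of routing blindly.
    raise RuntimeError("no healthy node available")


# Usage: the primary VM is down, so traffic fails over to the standby.
up = {"vm-standby-1", "vm-standby-2"}
active = pick_healthy(["vm-primary", "vm-standby-1", "vm-standby-2"], lambda n: n in up)
```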
Architecture diagrams

Deployment mode comparison table
| Aspect | Monolithic deployment | Distributed deployment |
| --- | --- | --- |
| Architecture design | All components concentrated on a single server/container | Components split into independent services distributed across multiple nodes |
| Infrastructure cost | Low; a single server is sufficient | High; requires multiple servers or cloud resources |
| Deployment cost | Low | High; deployment is complex and typically requires a DevOps team |
| Maintenance cost | Low; centralized management | High; requires cross-server and cross-service maintenance and monitoring |
| Scalability | Limited; bounded by a single machine's resources | Yes; bottleneck modules can be scaled independently |
| High availability | No; the server is a single point of failure, so its outage takes down the entire system | Yes; the failure of a single service does not bring down the overall system |
| Applicable scenarios | PoC, development and testing, small-scale applications | Production launch, large-scale, multi-department use |
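The high-availability row can be put in rough numbers. If a service stays up as long as at least one of its replicas is up, and replica failures are independent, its availability is 1 - (1 - a)^N for N replicas that are each available a fraction a of the time. The independence assumption is an idealization, since shared networks, storage, or power can fail together. The sketch below simply evaluates that formula.

```python
def parallel_availability(a: float, replicas: int) -> float:
    """Availability of a service that stays up while at least one replica is up.

    Assumes replica failures are independent, which real deployments only approximate.
    """
    return 1.0 - (1.0 - a) ** replicas


single = parallel_availability(0.99, 1)     # monolith on one 99%-available server
redundant = parallel_availability(0.99, 2)  # two independent 99%-available replicas
```

With a = 0.99, one server gives 99% (about 3.65 days of downtime per year), while two independent replicas give 99.99% (about 53 minutes per year). A distributed system is still limited by any module that runs without redundancy.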