Deployment Architecture

MaiAgent is a scalable generative AI platform that supports a variety of application scenarios. To accommodate different business needs and resource allocation strategies, the platform offers two main deployment modes: monolithic deployment and distributed deployment.

This chapter explains the differences between the two architectures, the scenarios each one suits, and their respective advantages and disadvantages, and provides practical deployment references.

Monolithic deployment

Architecture description

In the monolithic deployment mode, all core components of MaiAgent (such as the main service, task scheduling service, data storage, database, and frontend service) are installed and run on the same server. This mode is characterized by centralized management and simple deployment, which makes it suitable for quick launches and testing environments.

The MaiAgent platform can run without a GPU and can be smoothly deployed and executed in a general CPU environment. However, if running on a machine with GPU resources, the platform can also be colocated with the model to fully utilize hardware acceleration. Below are two common architecture diagrams for reference.

Architecture diagrams

  1. Deployed on a GPU-equipped server:

The MaiAgent platform and the model service are installed on the same machine. The platform handles request coordination and traffic control via internal APIs, while the model uses the GPU to provide efficient inference. Because the platform and the model service are colocated, there is no need to purchase a separate general-purpose server that only runs the platform, which reduces overall hardware expenditure.

  2. Deployed on a non-GPU server:

When MaiAgent is deployed on a non-GPU server, it still requires a model service, so it must connect via API to a GPU server or a cloud inference API service. Because the platform and the model service are separated, each can be scaled independently, with compute capacity increased or decreased as needed, which makes the architecture more flexible and easier to maintain.
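The two monolithic topologies above differ mainly in where the platform sends inference requests. The following is a minimal Python sketch of that choice, not MaiAgent's actual configuration mechanism: the `ModelBackend` class, the `resolve_model_backend` function, the default port `8000`, and the example URL are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelBackend:
    """Where the platform sends inference requests (hypothetical structure)."""
    base_url: str
    colocated: bool


def resolve_model_backend(has_local_gpu: bool,
                          remote_url: Optional[str] = None,
                          local_port: int = 8000) -> ModelBackend:
    """Pick the inference endpoint for a monolithic deployment.

    Assumption: a colocated model service listens on localhost at
    `local_port`; the real port and protocol depend on the model server.
    """
    if has_local_gpu:
        # Topology 1: platform and model service share the same machine.
        return ModelBackend(f"http://127.0.0.1:{local_port}", colocated=True)
    if not remote_url:
        # Topology 2 needs an external GPU server or cloud inference API.
        raise ValueError("a remote model endpoint is required without a local GPU")
    return ModelBackend(remote_url.rstrip("/"), colocated=False)


# Example: a CPU-only server pointing at a placeholder remote endpoint.
backend = resolve_model_backend(False, "https://inference.example.com/")
print(backend.base_url, backend.colocated)
```

Keeping the endpoint in a single resolved value means that moving from a colocated model to a remote one only changes configuration, not the calling code.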

Distributed deployment

Architecture description

In the distributed deployment mode, MaiAgent's core modules are split into independent services and distributed across multiple servers. Different modules can be horizontally scaled as needed to achieve high availability and large-scale processing capability.

  • Cloud PaaS environment: In public or private cloud environments, you can directly use container orchestration and Platform-as-a-Service (PaaS) offerings such as Kubernetes, AWS ECS/EKS, GCP Cloud Run, and Azure App Service. These services provide container orchestration, load balancing, auto-scaling, and monitoring, so distributed modules can be deployed quickly and resources adjusted dynamically, which reduces the infrastructure operations and maintenance burden.

  • On-Premise VM environment: In on-premises scenarios, you can set up container platforms or application service frameworks on virtual machines or bare-metal servers to obtain cloud-like distributed management and scaling capabilities. Although you need to plan cluster resources, monitoring, and failover mechanisms yourself, you can still achieve high availability and elastic scaling.
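To make "scaling modules independently" concrete, the sketch below computes a replica count per module from its load. It is an illustration under stated assumptions, not MaiAgent's scheduler: the function `plan_replicas`, the module names, the load and capacity figures, and the floor of two replicas per module (chosen so that one failed instance does not take a module offline) are all hypothetical.

```python
import math
from typing import Dict


def plan_replicas(load_per_module: Dict[str, float],
                  capacity_per_replica: Dict[str, float],
                  min_replicas: int = 2) -> Dict[str, int]:
    """Return how many replicas each module needs for its current load.

    Loads and capacities use the same unit (for example, requests per
    second). Every module keeps at least `min_replicas` instances.
    """
    plan = {}
    for module, load in load_per_module.items():
        capacity = capacity_per_replica[module]
        if capacity <= 0:
            raise ValueError(f"capacity for {module!r} must be positive")
        # Round up so the planned replicas can absorb the full load.
        needed = math.ceil(load / capacity)
        plan[module] = max(min_replicas, needed)
    return plan


# Hypothetical figures: only the main service is a bottleneck here.
plan = plan_replicas(
    {"main-service": 450, "task-scheduler": 90, "frontend": 120},
    {"main-service": 100, "task-scheduler": 100, "frontend": 200},
)
print(plan)  # {'main-service': 5, 'task-scheduler': 2, 'frontend': 2}
```

Only the bottleneck module grows beyond the minimum, which is the practical difference from a monolithic deployment, where the whole stack would have to move to a larger machine.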

Architecture diagrams

Deployment mode comparison table

| Characteristics | Monolithic deployment | Distributed deployment |
| --- | --- | --- |
| Architecture design | All components concentrated on a single server/container | Components split into independent services distributed across multiple nodes |
| Infrastructure cost | Low, a single server is sufficient | High, requires multiple servers or cloud resources |
| Deployment cost | Low | High, difficult deployment, requires a DevOps team |
| Maintenance cost | Low, centralized management | High, requires cross-server and cross-service maintenance and monitoring |
| Scalability | None, limited by single machine resources | Yes, can independently scale bottleneck modules |
| High availability | No, single point of failure causes entire system outage | Yes, failure of a single service does not affect the overall system |
| Applicable scenarios | PoC, development testing, small-scale applications | Production launch, large-scale, multi-department |
