Overview
In addition to providing SaaS services, the MaiAgent platform also offers self-hosted deployment options (private cloud, on-premises). The MaiAgent platform itself only requires general computing resources and does not need GPU services. The model services used by MaiAgent (LLM, Embedding Model, Reranker Model) require compute power and can use cloud API inference services or on-premises GPU deployments.
A hybrid cloud architecture can also be adopted: MaiAgent runs on-premises (private cloud or on-premises data center) while the model services (LLM, Embedding Model, Reranker Model) run in the cloud. MaiAgent supports the LLM, Embedding model, and Reranker model offerings of all major cloud service providers (CSPs). If data security requirements or lower on-premises costs later motivate a move to local compute, you can switch at any time.
MaiAgent platform overview
MaiAgent is a comprehensive generative AI platform that provides end-to-end services from backend systems to user-facing frontends. The platform adopts a scalable microservices architecture and supports mainstream cloud environments (AWS, GCP, Azure, Oracle) as well as on-premises deployments (Docker, K8s), allowing flexible deployment according to enterprise needs.
The platform core is based on Docker combined with a variety of service modules covering API, task scheduling, data storage, cache management, and frontend/backoffice applications. Its overall design ensures high availability, flexible scalability, and cross-cloud integration capabilities, while also taking security and maintainability into account.
| Services | Purpose | AWS | GCP | Azure | VMs |
| --- | --- | --- | --- | --- | --- |
| MaiAgent Server | MaiAgent core, API, system administration backend | EKS (Fargate) | GKE | AKS | Django on Docker |
| MaiAgent Worker Server | MaiAgent workers that handle queuing and asynchronous services | EKS (Fargate) | GKE | AKS | Django on Docker |
| MaiAgent Admin frontend | MaiAgent management platform | S3 + CloudFront | GCS + Google Cloud CDN | Azure Blob Storage + Azure CDN | Nginx + static files on Docker |
| MaiAgent Web Chat frontend | MaiAgent web chat frontend | S3 + CloudFront | GCS + Google Cloud CDN | Azure Blob Storage + Azure CDN | Nginx + static files on Docker |
| Relational Database (RDB) - PostgreSQL | Stores various MaiAgent data | RDS | Cloud SQL | Azure Database | PostgreSQL on Docker |
| Vector Database (Vector DB) - Elasticsearch | Stores vectors required for RAG and memory features | Elasticsearch | Elasticsearch | Elasticsearch | Elasticsearch on Docker |
| Static Storage | Stores static documents and static web pages | S3 | GCS | Azure Blob Storage | MinIO on Docker |
| Memory Cache - Redis | API cache and queues for scheduling services | ElastiCache | Memorystore | Azure Cache | Redis on Docker |
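For self-hosted (VMs) deployments, the services in the table map naturally onto a single Docker Compose file. The sketch below is illustrative only: the service names and image tags are assumptions, not MaiAgent's shipped configuration.

```yaml
# Hypothetical compose layout for the "on Docker" column above.
services:
  server:              # MaiAgent core API (Django)
    image: maiagent/server
    depends_on: [postgres, redis, elasticsearch]
  worker:              # asynchronous task workers (Django)
    image: maiagent/server
    command: worker
    depends_on: [postgres, redis]
  admin-frontend:      # static admin UI served by Nginx
    image: nginx
  webchat-frontend:    # static web chat UI served by Nginx
    image: nginx
  postgres:            # relational data (RDB)
    image: postgres:16
  elasticsearch:       # vector search for RAG and memory features
    image: elasticsearch:8.14.0
  redis:               # API cache and task queues
    image: redis:7
  minio:               # S3-compatible static object storage
    image: minio/minio
```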
Model Services
The MaiAgent platform is designed to be compatible with both cloud API inference services and self-hosted GPU environments.
With cloud API inference services, MaiAgent connects directly to various LLM, Embedding, and Reranker APIs, scales quickly to meet dynamic traffic demands, and facilitates experimentation and rapid deployment.
Under self-hosted GPU mode, MaiAgent connects to model services deployed locally or in a private data center, making full use of GPU resources and performing optimized inference while ensuring data privacy and compliance.
Users can trade off flexibility against cost as their needs change, or even mix the two approaches, with MaiAgent serving as the unified inference and service management layer.
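The "unified layer" idea means application code addresses one interface while the backend is swapped by configuration. A minimal sketch of this pattern is shown below; the names (`ModelEndpoint`, `ENDPOINTS`, `resolve_endpoint`) and URLs are illustrative assumptions, not MaiAgent's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelEndpoint:
    base_url: str   # OpenAI-compatible inference endpoint
    model: str      # model identifier served at that endpoint

# Hypothetical registry: the same logical capability served two ways.
ENDPOINTS = {
    "cloud": ModelEndpoint("https://bedrock.example.com/v1", "claude-4-sonnet"),
    "self_hosted": ModelEndpoint("http://10.0.0.5:8000/v1", "llama-3.3-70b"),
}

def resolve_endpoint(mode: str) -> ModelEndpoint:
    """Pick the inference backend; calling code stays unchanged."""
    try:
        return ENDPOINTS[mode]
    except KeyError:
        raise ValueError(f"unknown deployment mode: {mode!r}")

# Switching from cloud API to on-prem GPU is a one-line config change:
print(resolve_endpoint("cloud").base_url)        # cloud API inference
print(resolve_endpoint("self_hosted").base_url)  # local GPU serving
```

Because both backends expose the same shape of endpoint, moving a workload from a cloud API to local compute (or back) requires no application changes, only a different mode value.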
| | Cloud API inference services | Self-hosted GPU |
| --- | --- | --- |
| MaiAgent platform is compatible with | AWS Bedrock, Google Vertex AI, Azure AI, Oracle OCI | HPE, Advantech, Cisco, Dell |
| Model capabilities | Closed-source models: high; open-source models: same as self-hosted GPU | Depends on the release of open-source models |
| Speed | Faster (Claude 4 Sonnet: 80 tokens/s; Gemini 2.5 Pro: 156 tokens/s) | Medium (Llama 3.3 70B: 25.01 tokens/s on an H100, for example) |
| Upfront costs | Token API fees (pay-as-you-go) | Machine costs, data center costs, machine and model maintenance personnel costs, machine depreciation |
| Concurrent users | Depends on cloud service provider limits | Depends on the number of GPUs (currently one H100 GPU supports about 25 concurrent users) |
| Data security | Cloud service providers that commit not to use data for training (AWS, GCP, Azure, Oracle) | Highly confidential; most secure |
| Personal data issues | Use a DLP server or service to remove personal data | None |
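The sizing figure quoted above (one H100 GPU for roughly 25 concurrent users) makes self-hosted capacity planning simple arithmetic. A hedged sketch, assuming that figure holds for your model and workload:

```python
import math

# Rough GPU sizing based on the estimate above: one H100 serves about
# 25 concurrent users. Real throughput varies with model size, context
# length, and batching, so treat this as a planning heuristic only.
USERS_PER_H100 = 25

def h100s_needed(concurrent_users: int) -> int:
    """Minimum number of H100 GPUs for the given concurrency."""
    if concurrent_users <= 0:
        return 0
    return math.ceil(concurrent_users / USERS_PER_H100)

print(h100s_needed(25))   # -> 1
print(h100s_needed(60))   # -> 3
```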
Platform deployment environments
To meet different customers' needs during development, testing, and production rollout, our software platform provides multiple layered environments. The setup of these environments ensures that every process from development to production deployment can be properly verified and controlled.
Environment architecture
Our platform is built following industry best practices and flexibly offers the following environments according to customer needs:
| Environment name | Notes | Primary purpose | Characteristics |
| --- | --- | --- | --- |
| PROD | Internal | Production environment | Officially serves external users, uses real data, and requires high stability and security. |
| UAT | Internal | User acceptance testing | Provided for clients and business units to perform acceptance testing and confirm system functionality meets requirements; the environment closely mirrors production. |
| SIT | Internal, because it may need to integrate with internal systems | System integration testing | Verifies integration and compatibility between different modules and services, using test data close to real-world scenarios. |
| DEV | External, to accelerate development of value-added features | Development and testing environment | Provided for developers for coding and unit testing; uses simulated data, updates frequently, and tolerates errors. |
Environment combinations
Different customers can choose suitable environment combinations according to project needs, for example:
PROD only: Suitable for small projects or simple production-only requirements, directly deployed in the production environment.
PROD + UAT: Suitable for projects that require acceptance testing to ensure functionality meets requirements before going live.
Full environment combination (DEV + SIT + UAT + PROD): Suitable for large or complex projects that require complete development, integration testing, and acceptance workflows.
We will flexibly configure according to customer needs to ensure the best balance between cost and quality.