Overview
In addition to its SaaS offering, the MaiAgent platform is also available as a self-hosted solution (private cloud or on-premises). The platform itself requires only general-purpose computing resources and no GPUs. The model services MaiAgent relies on (LLM, Embedding Model, Reranker Model) do require inference compute, which can come either from cloud API inference services or from on-premises GPU infrastructure.
A hybrid cloud architecture is also possible: MaiAgent runs on-premises (private cloud or on-premises) while the model services (LLM, Embedding Model, Reranker Model) run in the cloud. MaiAgent supports the LLM, Embedding, and Reranker models offered by all major cloud service providers (CSPs). If data security concerns arise or on-premises costs decrease in the future, workloads can be switched to on-premises compute immediately.
MaiAgent Platform Overview
MaiAgent is a comprehensive generative AI platform providing end-to-end services from system backend to user frontend. The platform adopts a scalable microservices architecture and supports mainstream cloud environments (AWS, GCP, Azure, Oracle) as well as on-premises environments (Docker, K8s), allowing flexible deployment based on enterprise needs.
The platform core is built on Docker and combines diverse service modules that cover the API, task scheduling, data storage, cache management, and frontend/backend applications. The overall design ensures high availability, scalability, and cross-cloud integration capabilities while maintaining security and operability.
| Component | Purpose | AWS | GCP | Azure | On-Premises |
|---|---|---|---|---|---|
| MaiAgent Server | MaiAgent core, API, system management backend | EKS (Fargate) | GKE | AKS | Django on Docker |
| MaiAgent Worker Server | Queue processing and asynchronous workers, especially for message stream output | EKS (Fargate) | GKE | AKS | Django on Docker |
| MaiAgent Worker Server (Low-Priority) | Queue processing and asynchronous workers, especially for document vectorization | EKS (Fargate) | GKE | AKS | Django on Docker |
| MaiAgent Admin Frontend | MaiAgent management platform | S3 + CloudFront | GCS + Google Cloud CDN | Azure Blob Storage + Azure CDN | Nginx + static files on Docker |
| MaiAgent Web Chat Frontend | MaiAgent web chat frontend | S3 + CloudFront | GCS + Google Cloud CDN | Azure Blob Storage + Azure CDN | Nginx + static files on Docker |
| Relational Database (RDB) - PostgreSQL | Stores MaiAgent data | RDS | Cloud SQL | Azure Database | PostgreSQL on Docker |
| Vector Database (Vector DB) - Elasticsearch | Stores vectors needed for RAG and memory functions | Elasticsearch | Elasticsearch | Elasticsearch | Elasticsearch on Docker |
| Static Storage | Stores static files and web pages | S3 | GCS | Azure Blob Storage | MinIO on Docker |
| Memory Cache - Redis | API cache and task queue for the scheduling service | ElastiCache | Memorystore | Azure Cache | Redis on Docker |
Model Services
The MaiAgent platform is designed to be compatible with both cloud API inference services and self-hosted GPU environments.
With cloud API inference services, MaiAgent can directly integrate with various LLM, Embedding, and Reranker APIs, enabling rapid scaling and support for dynamic traffic demands, facilitating experimentation and quick deployment.
In self-hosted GPU mode, MaiAgent can integrate with models deployed locally or in private data centers, fully utilizing GPU resources and optimizing inference while ensuring data privacy and compliance.
Users can freely trade off flexibility against cost based on their needs, or even adopt a hybrid approach, with MaiAgent acting as a unified inference and service management layer.
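The switch between cloud API and self-hosted inference can be pictured as a one-line configuration change in that unified layer. A minimal sketch, assuming both backends expose an OpenAI-compatible chat completions route; all URLs and model names below are hypothetical placeholders, not MaiAgent's actual configuration:

```python
# Minimal sketch of a unified inference layer that can point either at a
# cloud API or at a self-hosted GPU endpoint. All URLs and model names
# are hypothetical placeholders.

BACKENDS = {
    "cloud": {
        "base_url": "https://api.example-cloud.com/v1",  # hypothetical cloud API
        "model": "claude-sonnet",                        # hypothetical model id
    },
    "self_hosted": {
        "base_url": "http://gpu-server.internal:8000/v1",  # hypothetical on-prem endpoint
        "model": "llama-3.3-70b",                          # hypothetical model id
    },
}

def chat_endpoint(backend: str) -> tuple[str, str]:
    """Return (url, model) for the selected backend, assuming an
    OpenAI-compatible /chat/completions route on both."""
    cfg = BACKENDS[backend]
    return f"{cfg['base_url']}/chat/completions", cfg["model"]
```

Because only the backend key changes, moving from cloud APIs to on-premises GPUs (or back) requires no application rewrite.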
| Item | Cloud API Inference Services | Self-Hosted GPU |
|---|---|---|
| MaiAgent platform compatibility | AWS Bedrock, Google Vertex AI, Azure AI, Oracle OCI | HPE, Advantech, Cisco, Dell |
| Model capabilities | Closed-source models: high; open-source models: same as self-hosted GPU | Depends on open-source model releases |
| Speed | Faster. Claude 4 Sonnet: 80 token/s; Gemini 2.5 Pro: 156 token/s | Medium. Llama 3.3 70B: 25.01 token/s (H100 example) |
| Investment cost | Token API fees (pay-as-you-go) | Hardware costs, data center costs, hardware and model maintenance personnel costs, hardware depreciation |
| Concurrent users | Based on cloud service provider support | Based on GPU quantity (currently about 25 users per H100 GPU) |
| Data security | Cloud service providers (AWS, GCP, Azure, Oracle) promise not to use data for training | Highest confidentiality, most secure |
| Personal data issues | Use a DLP server or service to remove personal data | None |
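The concurrent-user figure above (about 25 users per H100 GPU) yields a simple back-of-the-envelope sizing rule for self-hosted deployments. A sketch, assuming that rough estimate holds for the target workload:

```python
import math

# Rough estimate from the comparison table above; actual capacity depends
# on the model, context length, and serving stack.
USERS_PER_H100 = 25

def h100_gpus_needed(concurrent_users: int) -> int:
    """Estimate how many H100 GPUs are needed for a given number of
    concurrent users, rounding up to whole GPUs."""
    return math.ceil(concurrent_users / USERS_PER_H100)

print(h100_gpus_needed(100))  # 100 concurrent users -> 4 GPUs
```

For example, supporting 100 concurrent users would call for roughly four H100 GPUs under this estimate.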
Platform Deployment Environments
To meet different customer needs across development, testing, and deployment, our software platform provides multiple layered environments. These environments ensure that every stage from development to production is properly validated and controlled.
Environment Architecture
Our platform setup follows industry best practices and flexibly provides the following environments based on customer needs:
| Environment | Network | Full Name | Description |
|---|---|---|---|
| PROD | Internal | Production Environment | Official external service; uses real data; requires high stability and security |
| UAT | Internal | User Acceptance Testing | For customer and business unit acceptance, verifying that system functionality meets requirements; environment similar to production |
| SIT | Internal, may need to connect to internal systems | System Integration Testing | Validates integration and compatibility between different modules and services; uses near-real test data |
| DEV | External, to speed up development of new features | Development Testing Environment | For developer program development and unit testing; uses simulated data; frequent updates; tolerates errors |
Environment Combinations
Different customers can choose suitable environment combinations based on project requirements, such as:
PROD Only: Suitable for small projects or simple deployment needs, directly deployed to production environment.
PROD + UAT: Suitable for projects requiring acceptance testing, ensuring functionality meets requirements before going live.
Complete Environment Combination (DEV + SIT + UAT + PROD): Suitable for large or complex projects requiring complete development, integration testing, and acceptance processes.
We flexibly configure based on customer needs, ensuring optimal balance between cost and quality.
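One way to picture these combinations is as a selection over the environment tiers. The tier names below match the table above, while the per-environment settings are hypothetical placeholders for illustration only:

```python
# Hypothetical per-environment settings illustrating the combinations above;
# the keys and values are placeholders, not MaiAgent's actual configuration.
ENVIRONMENTS = {
    "DEV":  {"data": "simulated", "network": "external"},
    "SIT":  {"data": "near-real", "network": "internal"},
    "UAT":  {"data": "near-real", "network": "internal"},
    "PROD": {"data": "real",      "network": "internal"},
}

def deployment_plan(combination: list[str]) -> dict:
    """Return the settings for each environment in the chosen combination,
    e.g. ["PROD"] for small projects, ["UAT", "PROD"] for projects with
    acceptance testing, or all four tiers for large projects."""
    return {env: ENVIRONMENTS[env] for env in combination}
```

A small project would pass `["PROD"]`, while a large project would pass all four tiers; the cost/quality trade-off is then explicit in the chosen combination.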