Overview

In addition to its SaaS offering, the MaiAgent platform is also available as a self-hosted solution (private cloud or on-premises). The MaiAgent platform itself requires only general-purpose computing resources and no GPUs. The model services MaiAgent relies on (LLM, Embedding Model, Reranker Model) do require computing power, which can come from either cloud API inference services or on-premises GPU infrastructure.

A hybrid cloud architecture is also possible, with MaiAgent self-hosted (private cloud or on-premises) and the model services (LLM, Embedding Model, Reranker Model) in the cloud. MaiAgent supports the LLM, Embedding, and Reranker models offered by all major cloud service providers (CSPs). If data security concerns arise, or if on-premises costs decrease in the future, workloads can be switched to on-premises computing power immediately.
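The immediate switch between cloud and on-premises computing power described above can be illustrated with a minimal configuration sketch. The endpoint URLs and model names below are hypothetical placeholders, assuming each backend exposes a compatible inference API; MaiAgent's actual configuration mechanism may differ:

```python
# Minimal sketch of swappable model backends (hypothetical endpoints and models).
# Only the connection settings change when moving between cloud and on-prem.

MODEL_BACKENDS = {
    # Cloud API inference service (e.g. a managed LLM endpoint)
    "cloud": {
        "base_url": "https://api.example-cloud.com/v1",  # hypothetical URL
        "model": "claude-4-sonnet",
    },
    # Self-hosted GPU serving (e.g. an on-prem inference server)
    "on_prem": {
        "base_url": "http://10.0.0.5:8000/v1",  # hypothetical internal host
        "model": "llama-3.3-70b",
    },
}

def resolve_backend(deployment: str) -> dict:
    """Return the connection settings for the selected deployment mode."""
    if deployment not in MODEL_BACKENDS:
        raise ValueError(f"unknown deployment mode: {deployment}")
    return MODEL_BACKENDS[deployment]

# Switching computing power becomes a configuration change, not a code change:
config = resolve_backend("on_prem")
```

The point of the sketch is that the application layer stays unchanged; only the deployment key selects where inference runs.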

MaiAgent Platform Overview

MaiAgent is a comprehensive generative AI platform providing end-to-end services from system backend to user frontend. The platform adopts a scalable microservices architecture and supports mainstream cloud environments (AWS, GCP, Azure, Oracle) as well as on-premises environments (Docker, K8s), allowing flexible deployment based on enterprise needs.

The platform core is built on Docker and combines a diverse set of service modules covering the API, task scheduling, data storage, cache management, and frontend/backend applications. The overall design ensures high availability, scalability, and cross-cloud integration while maintaining security and ease of operation.

| Service Item | Purpose | AWS | GCP | Azure | VMs |
| --- | --- | --- | --- | --- | --- |
| MaiAgent Server | MaiAgent core, API, system management backend | EKS (Fargate) | GKE | AKS | Django on Docker |
| MaiAgent Worker Server | MaiAgent queue processing and asynchronous service workers, especially for message stream output | EKS (Fargate) | GKE | AKS | Django on Docker |
| MaiAgent Worker Server (Low-Priority) | MaiAgent queue processing and asynchronous service workers, especially for document vectorization | EKS (Fargate) | GKE | AKS | Django on Docker |
| MaiAgent Admin Frontend | MaiAgent management platform | S3 + CloudFront | GCS + Google Cloud CDN | Azure Blob Storage + Azure CDN | Nginx + static files on Docker |
| MaiAgent Web Chat Frontend | MaiAgent web chat frontend | S3 + CloudFront | GCS + Google Cloud CDN | Azure Blob Storage + Azure CDN | Nginx + static files on Docker |
| Relational Database (RDB) - PostgreSQL | Stores MaiAgent data | RDS | Cloud SQL | Azure Database | PostgreSQL on Docker |
| Vector Database (Vector DB) - Elasticsearch | Stores vectors needed for RAG and memory functions | Elasticsearch | Elasticsearch | Elasticsearch | Elasticsearch on Docker |
| Static Storage | Stores static files and web pages | S3 | GCS | Azure Blob Storage | MinIO on Docker |
| Memory Cache - Redis | API cache, queues for the queue-scheduling service | ElastiCache | Memorystore | Azure Cache | Redis on Docker |
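The split above between the regular worker (message stream output) and the low-priority worker (document vectorization) can be sketched with Python's standard library. This is an illustration of priority-based queue scheduling only, not MaiAgent's actual worker code; real deployments would use a broker such as Redis, and the task names are invented for the example:

```python
import queue

# Sketch of priority-based task scheduling: lower number = higher priority.
tasks = queue.PriorityQueue()

HIGH, LOW = 0, 1  # message streaming vs. document vectorization

tasks.put((LOW, "vectorize_document:report.pdf"))
tasks.put((HIGH, "stream_message:chat-42"))
tasks.put((LOW, "vectorize_document:manual.docx"))

order = []
while not tasks.empty():
    _, task = tasks.get()
    order.append(task)

# Interactive streaming work is served before bulk vectorization:
print(order[0])  # stream_message:chat-42
```

Running vectorization on a separate low-priority worker keeps heavy batch jobs from delaying interactive chat responses.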

Model Services

The MaiAgent platform is designed to be compatible with both cloud API inference services and self-hosted GPU environments.

  • With cloud API inference services, MaiAgent can directly integrate with various LLM, Embedding, and Reranker APIs, enabling rapid scaling and support for dynamic traffic demands, facilitating experimentation and quick deployment.

  • In self-hosted GPU mode, MaiAgent can integrate with models deployed locally or in private data centers, fully utilizing GPU resources and optimizing inference while ensuring data privacy and compliance.

Users can freely trade off flexibility against cost based on their needs, or even adopt a hybrid approach, with MaiAgent acting as a unified inference and service-management layer.
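A hybrid approach can be sketched as a simple routing policy: sensitive requests go to the self-hosted GPU backend, everything else to a cloud API. The classification rule and backend names below are hypothetical assumptions for illustration; MaiAgent's actual routing logic is not described in this document:

```python
# Sketch of a hybrid routing policy (hypothetical rule and backend names).

def route_request(contains_sensitive_data: bool, on_prem_available: bool = True) -> str:
    """Pick an inference backend for a single request."""
    if contains_sensitive_data and on_prem_available:
        return "self_hosted_gpu"  # data never leaves the private environment
    return "cloud_api"            # elastic capacity, pay-as-you-go

# Example: confidential content stays on-prem, routine traffic uses the cloud.
backend = route_request(contains_sensitive_data=True)
```

A unified layer like this lets the data-security and cost considerations from the comparison below be applied per request rather than per deployment.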

| | Cloud API Inference Services | Self-hosted GPU |
| --- | --- | --- |
| MaiAgent Platform Compatibility | AWS Bedrock, Google Vertex AI, Azure AI, Oracle OCI | HPE, Advantech, Cisco, Dell |
| Model Capabilities | Closed-source models: high; open-source models: same as self-hosted GPU | Depends on open-source model releases |
| Speed | Faster (Claude 4 Sonnet: 80 tokens/s; Gemini 2.5 Pro: 156 tokens/s) | Medium (Llama 3.3 70B: 25.01 tokens/s on an H100, for example) |
| Investment Cost | Token API fees (pay-as-you-go) | Hardware costs, data center costs, hardware and model maintenance personnel costs, hardware depreciation |
| Concurrent Users | Depends on cloud service provider capacity | Depends on GPU quantity (currently about 25 users per H100 GPU) |
| Data Security | Cloud service providers (AWS, GCP, Azure, Oracle) promise not to use data for training | Highest confidentiality, most secure |
| Personal Data Issues | Use a DLP server or service to remove personal data | None |
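The concurrency figure above (currently about 25 users per H100 GPU) translates into a simple capacity estimate. A minimal sketch, assuming concurrent users scale linearly with GPU count:

```python
import math

USERS_PER_H100 = 25  # rough figure quoted above; actual capacity varies by model and workload

def h100s_needed(concurrent_users: int) -> int:
    """Estimate how many H100 GPUs a self-hosted deployment needs."""
    if concurrent_users <= 0:
        return 0
    return math.ceil(concurrent_users / USERS_PER_H100)

print(h100s_needed(100))  # 4
print(h100s_needed(110))  # 5
```

Linear scaling is an approximation: batching efficiency, context length, and model size all affect real per-GPU concurrency.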

Platform Deployment Environments

To meet different customer needs across development, testing, and deployment, our software platform provides multiple layered environments. These environments ensure that every step from development to production is properly validated and controlled.

Environment Architecture

Our platform setup follows industry best practices and flexibly provides the following environments based on customer needs:

| Environment Name | Notes | Main Purpose | Characteristics |
| --- | --- | --- | --- |
| PROD | Internal | Production environment | Serves real external users with real data; requires high stability and security |
| UAT | Internal | User acceptance testing | For customer and business-unit acceptance; verifies that system functionality meets requirements; environment closely mirrors production |
| SIT | Internal; may need to connect to internal systems | System integration testing | Validates integration and compatibility between modules and services; uses near-production test data |
| DEV | External; accelerates development of new features | Development and testing environment | For feature development and unit testing by developers; uses simulated data; updated frequently; tolerates errors |

Environment Combinations

Different customers can choose suitable environment combinations based on project requirements, such as:

  • PROD Only: Suitable for small projects or simple deployment needs, directly deployed to production environment.

  • PROD + UAT: Suitable for projects requiring acceptance testing, ensuring functionality meets requirements before going live.

  • Complete Environment Combination (DEV + SIT + UAT + PROD): Suitable for large or complex projects requiring complete development, integration testing, and acceptance processes.

We flexibly configure based on customer needs, ensuring optimal balance between cost and quality.
