Overview

In addition to its SaaS offering, the MaiAgent platform also supports self-hosted deployment (private cloud or on-premises). The MaiAgent platform itself requires only general computing resources and does not need GPUs. The model services used by MaiAgent (LLM, Embedding Model, Reranker Model) do require significant compute and can run either on cloud API inference services or on on-premises GPU deployments.

A hybrid cloud architecture is also possible: MaiAgent deployed on-premises (private cloud or on-premises data center) with the model services (LLM, Embedding Model, Reranker Model) in the cloud. MaiAgent supports the LLM, Embedding model, and Reranker model offerings of all cloud service providers (CSPs). If data security requirements or lower on-premises costs later motivate a move to local compute, you can switch at any time.

MaiAgent platform overview

MaiAgent is a comprehensive generative AI platform that provides end-to-end services from backend systems to user-facing frontends. The platform adopts a scalable microservices architecture and supports mainstream cloud environments (AWS, GCP, Azure, Oracle) as well as on-premises deployments (Docker, K8s), allowing flexible deployment according to enterprise needs.

The platform core is based on Docker combined with a variety of service modules covering API, task scheduling, data storage, cache management, and frontend/backoffice applications. Its overall design ensures high availability, flexible scalability, and cross-cloud integration capabilities, while also taking security and maintainability into account.

| Services | Purpose | AWS | GCP | Azure | VMs |
| --- | --- | --- | --- | --- | --- |
| MaiAgent Server | MaiAgent core, API, system administration backend | EKS (Fargate) | GKE | AKS | Django on Docker |
| MaiAgent Worker Server | MaiAgent workers that handle queuing and asynchronous services | EKS (Fargate) | GKE | AKS | Django on Docker |
| MaiAgent Admin frontend | MaiAgent management platform | S3 + CloudFront | GCS + Google Cloud CDN | Azure Blob Storage + Azure CDN | Nginx + static files on Docker |
| MaiAgent Web Chat frontend | MaiAgent web chat frontend | S3 + CloudFront | GCS + Google Cloud CDN | Azure Blob Storage + Azure CDN | Nginx + static files on Docker |
| Relational Database (RDB) - PostgreSQL | Stores various MaiAgent data | RDS | Cloud SQL | Azure Database | PostgreSQL on Docker |
| Vector Database (Vector DB) - Elasticsearch | Stores vectors required for RAG and memory features | Elasticsearch | Elasticsearch | Elasticsearch | Elasticsearch on Docker |
| Static Storage | Stores static documents and static web pages | S3 | GCS | Azure Blob Storage | MinIO on Docker |
| Memory Cache - Redis | API cache and queues for scheduling services | ElastiCache | Memorystore | Azure Cache | Redis on Docker |
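As an illustration of the VMs column above, the sketch below is a minimal connectivity smoke test for the self-hosted backing services (PostgreSQL, Elasticsearch, Redis, and MinIO via its S3-compatible API). The hostnames, ports, and credentials are placeholders for illustration, not MaiAgent defaults.

```python
"""Minimal smoke test for self-hosted backing services.

Hostnames, ports, and credentials are illustrative placeholders; substitute
the values used in your own Docker / K8s deployment.
"""
import boto3                              # pip install boto3
import psycopg2                           # pip install psycopg2-binary
import redis                              # pip install redis
from elasticsearch import Elasticsearch   # pip install elasticsearch


def check_postgres():
    # Relational DB: stores core MaiAgent data
    conn = psycopg2.connect(host="localhost", port=5432, dbname="maiagent",
                            user="postgres", password="postgres")
    conn.close()


def check_elasticsearch():
    # Vector DB: stores vectors for RAG and memory features
    assert Elasticsearch("http://localhost:9200").ping()


def check_redis():
    # Memory cache: API cache and task queues
    assert redis.Redis(host="localhost", port=6379).ping()


def check_minio():
    # Static storage: S3-compatible object store
    s3 = boto3.client("s3", endpoint_url="http://localhost:9000",
                      aws_access_key_id="minioadmin",
                      aws_secret_access_key="minioadmin")
    s3.list_buckets()


if __name__ == "__main__":
    for check in (check_postgres, check_elasticsearch, check_redis, check_minio):
        check()
        print(f"{check.__name__}: OK")
```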

Model Services

The MaiAgent platform is designed to be compatible with both cloud API inference services and self-hosted GPU environments.

  • Cloud API inference services: MaiAgent connects directly to a variety of LLM, Embedding, and Reranker APIs, scales quickly to meet dynamic traffic demands, and makes experimentation and rapid deployment easy.

  • Self-hosted GPU: MaiAgent connects to model services deployed locally or in a private data center, making full use of GPU resources and optimized inference while ensuring data privacy and compliance.

Users can weigh flexibility against cost and switch between the two modes as needed, or even combine them, with MaiAgent acting as the unified inference and service management layer.
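The sketch below illustrates the idea of a unified inference layer using an OpenAI-compatible client: the same calling code can target either a cloud inference endpoint or a self-hosted GPU endpoint (for example, a local vLLM server), differing only in the base URL and credentials. The endpoint URLs, model names, and environment variables are assumptions for illustration, not part of the MaiAgent API.

```python
"""Illustrative only: switch between a cloud inference API and a self-hosted
GPU endpoint by changing the base URL. URLs, model names, and environment
variables are assumed placeholders, not MaiAgent settings."""
import os

from openai import OpenAI  # pip install openai

USE_SELF_HOSTED = os.getenv("USE_SELF_HOSTED", "false") == "true"

if USE_SELF_HOSTED:
    # Self-hosted GPU mode: e.g. a vLLM server exposing an OpenAI-compatible API
    client = OpenAI(base_url="http://gpu-server.internal:8000/v1", api_key="not-needed")
    model = "meta-llama/Llama-3.3-70B-Instruct"
else:
    # Cloud API inference mode: any OpenAI-compatible cloud endpoint
    client = OpenAI(base_url=os.environ["CLOUD_API_BASE_URL"],
                    api_key=os.environ["CLOUD_API_KEY"])
    model = os.getenv("CLOUD_MODEL", "gpt-4o-mini")

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Summarize MaiAgent's deployment options."}],
)
print(response.choices[0].message.content)
```

Because the calling code is identical in both branches, a deployment can start on cloud APIs and later move to local GPUs (or mix the two) without touching application logic.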

|  | Cloud API inference services | Self-hosted GPU |
| --- | --- | --- |
| MaiAgent platform is compatible with | AWS Bedrock, Google Vertex AI, Azure AI, Oracle OCI | HPE, Advantech, Cisco, Dell |
| Model capabilities | Closed-source models: high; open-source models: same as self-hosted GPU | Depends on the release of open-source models |
| Speed | Faster (Claude 4 Sonnet: 80 tokens/s; Gemini 2.5 Pro: 156 tokens/s) | Medium (Llama 3.3 70B: 25.01 tokens/s on an H100, for example) |
| Upfront costs | Token API fees (pay-as-you-go) | Machine costs, data center costs, machine and model maintenance personnel costs, machine depreciation |
| Concurrent users | Depends on cloud service provider support | Depends on number of GPUs (currently one H100 GPU supports about 25 users) |
| Data security | Cloud service providers commit not to use data for training (AWS, GCP, Azure, Oracle) | Highly confidential, most secure |
| Personal data issues | Use a DLP server or service to remove personal data | None |
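As a back-of-the-envelope illustration of the concurrency row above, the sketch below estimates how many H100 GPUs a target number of concurrent users would require, assuming the roughly 25 users per H100 cited in the table; these are rough planning figures, not a sizing guarantee.

```python
import math

# Rough planning figure from the comparison table above:
# one H100 GPU serves about 25 concurrent users.
USERS_PER_H100 = 25


def h100_estimate(concurrent_users: int) -> int:
    """Estimate the number of H100 GPUs needed for a target concurrency (ceiling division)."""
    return math.ceil(concurrent_users / USERS_PER_H100)


for users in (25, 100, 500):
    print(f"{users} concurrent users -> ~{h100_estimate(users)} x H100")
```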

Platform deployment environments

To meet different customers' needs during development, testing, and production rollout, our software platform provides multiple layered environments. The setup of these environments ensures that every process from development to production deployment can be properly verified and controlled.

Environment architecture

Our platform is built following industry best practices and flexibly offers the following environments according to customer needs:

| Environment name | Notes | Primary purpose | Characteristics |
| --- | --- | --- | --- |
| PROD | Internal | Production environment | Officially serves external users, uses real data, and requires high stability and security. |
| UAT | Internal | User acceptance testing | Provided for clients and business units to perform acceptance testing and confirm system functionality meets requirements; the environment is similar to production. |
| SIT | Internal, because it may need to integrate with internal systems | System integration testing | Verifies integration and compatibility between different modules and services, using test data close to real-world scenarios. |
| DEV | External, to accelerate development of value-added features | Development and testing environment | Provided for developers to perform coding and unit testing; uses simulated data, updates frequently, and tolerates errors. |

Environment combinations

Different customers can choose suitable environment combinations according to project needs, for example:

  • PROD only: Suitable for small projects or simple production-only requirements, directly deployed in the production environment.

  • PROD + UAT: Suitable for projects that require acceptance testing to ensure functionality meets requirements before going live.

  • Full environment combination (DEV + SIT + UAT + PROD): Suitable for large or complex projects that require complete development, integration testing, and acceptance workflows.

We will flexibly configure according to customer needs to ensure the best balance between cost and quality.
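Because the MaiAgent server runs as Django on Docker, one common way to keep DEV/SIT/UAT/PROD behaviour consistent is to select settings from a single environment variable per deployment. The snippet below is a hypothetical sketch; the variable names and defaults are illustrative, not the platform's actual configuration.

```python
"""Hypothetical Django settings snippet: select per-environment behaviour
(DEV / SIT / UAT / PROD) from an environment variable. Names and defaults
are illustrative, not MaiAgent's actual configuration."""
import os

ENVIRONMENT = os.getenv("ENVIRONMENT", "DEV").upper()  # DEV, SIT, UAT, or PROD

# Tolerate errors and verbose output only outside production-like environments.
DEBUG = ENVIRONMENT in ("DEV", "SIT")

# Each environment points at its own database, keeping data isolated.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.getenv("POSTGRES_DB", f"maiagent_{ENVIRONMENT.lower()}"),
        "HOST": os.getenv("POSTGRES_HOST", "localhost"),
        "PORT": int(os.getenv("POSTGRES_PORT", "5432")),
        "USER": os.getenv("POSTGRES_USER", "postgres"),
        "PASSWORD": os.getenv("POSTGRES_PASSWORD", ""),
    }
}

# UAT and PROD require stricter host checking than DEV and SIT.
ALLOWED_HOSTS = ["*"] if DEBUG else os.getenv("ALLOWED_HOSTS", "").split(",")
```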
