One endpoint. Every model.
The AI Gateway built for builders.
Ship an LLM gateway — or a full AI API — with a built-in RAG knowledge base, content moderation, an LLM router with multi-provider fallback, and per-consumer rate limiting. Your users get an OpenAI-compatible LLM proxy endpoint. Configure it all from the admin — no code required.
Your AI API gateway pipeline. Toggle each stage.
Every stage is optional except LLM. Enable what you need — skip what you don't. This is what sets an AI gateway apart from a plain LLM proxy.
Moderation
AI-powered content safety check on every request before it reaches the LLM.
RAG
Retrieve relevant context from your knowledge base and inject it into the prompt.
Pre-Hook
Run custom logic before the LLM call — re-rank chunks, add user context, enforce rules.
LLM (required)
Call any LLM provider with automatic fallback chains and streaming.
Post-Hook
Run custom logic after streaming begins — log, transform metadata, trigger side effects.
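To make the toggle model above concrete, here is a minimal sketch of a pipeline where every stage except the LLM call can be switched off. The `Stages` class, the `enabled` set, and the stub callables are hypothetical names chosen for illustration; the actual gateway is configured from the admin, and its internals may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Stages:
    """Hypothetical stage callables, one per pipeline stage."""
    moderate: Callable[[str], bool]        # True means the prompt is allowed
    retrieve: Callable[[str], List[str]]   # returns context chunks
    pre_hook: Callable[[str], str]         # may rewrite the prompt
    call_llm: Callable[[str], str]         # the only required stage
    post_hook: Callable[[str, str], None]  # side effects only


def run_pipeline(prompt: str, stages: Stages, enabled: Set[str]) -> str:
    """Run the stages in order, skipping any optional stage not in `enabled`."""
    if "moderation" in enabled and not stages.moderate(prompt):
        raise PermissionError("prompt blocked by moderation")
    if "rag" in enabled:
        context = "\n".join(stages.retrieve(prompt))
        prompt = f"Context:\n{context}\n\nQuestion: {prompt}"
    if "pre_hook" in enabled:
        prompt = stages.pre_hook(prompt)
    answer = stages.call_llm(prompt)  # always runs
    if "post_hook" in enabled:
        stages.post_hook(prompt, answer)
    return answer


# Stub stages for demonstration only.
log: List[str] = []
demo = Stages(
    moderate=lambda p: "forbidden" not in p,
    retrieve=lambda p: ["(chunk retrieved from the knowledge base)"],
    pre_hook=lambda p: p + "\nAnswer briefly.",
    call_llm=lambda p: f"LLM saw {len(p.splitlines())} lines",
    post_hook=lambda p, a: log.append(a),
)

print(run_pipeline("Summarize Q4 revenue", demo, {"rag", "post_hook"}))
```

Passing an empty `enabled` set reduces the pipeline to a plain LLM call, which is the proxy-only case the text contrasts with a full gateway.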
An LLM gateway with a built-in RAG knowledge base.
Upload documents, pick an embedding model, and ground every response in your own content. No external vector database to wire up.
384-d · fastest
768-d · balanced
1024-d · most accurate
Upload your docs
PDF, txt, md, json, csv, html, xml, yml, toml, and more — up to 5 MB each. Auto-chunked and indexed with live per-document status.
Live RAG test chat
Query your live vector index and see the exact chunk matches with similarity scores — so you know precisely what context the LLM will receive.
Tune retrieval
Top-k and score-threshold are configurable per pipeline stage, so you control how much context gets injected into each prompt.
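As a rough illustration of how top-k and a score threshold interact, the sketch below ranks chunks by cosine similarity, drops anything under the threshold, and keeps at most `top_k` results. Cosine similarity, the toy 3-dimensional vectors, and the function names are assumptions made for this example; the page does not say which similarity metric the gateway uses.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query_vec: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
    score_threshold: float = 0.5,
) -> List[Tuple[float, str]]:
    """Return up to top_k (score, text) pairs scoring at or above the threshold."""
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    kept = [pair for pair in scored if pair[0] >= score_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


# Toy index: real embeddings would have 384, 768, or 1024 dimensions.
index = [
    ("refund policy", [1.0, 0.0, 0.0]),
    ("shipping times", [0.0, 1.0, 0.0]),
    ("refund form", [0.9, 0.1, 0.0]),
]

for score, text in retrieve([1.0, 0.0, 0.0], index, top_k=2):
    print(f"{score:.3f}  {text}")
```

Raising the threshold trims weak matches even when fewer than `top_k` remain, while lowering `top_k` caps how much context reaches the prompt regardless of score.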
LLM router + OpenAI-compatible LLM proxy.
Consumers call your AI gateway's OpenAI-compatible LLM proxy endpoint, and you control what runs behind it. Configure a chain of providers: if your primary fails, the next one picks up automatically, with fallback rules keyed to the status code. Or run Cloudflare Workers AI: 50+ edge-native models, no API key required.
Primary: Claude Sonnet
Fallback 1: GPT-4o
on 429, 503
Fallback 2: Gemini Flash
on any error
Fallback 3: Cloudflare Workers AI
zero-cost last resort — no API key
Per-status-code routing
Route 429 (rate limit) to one provider, 503 (outage) to another. Fine-grained control.
Workers AI or BYOK
Use Cloudflare Workers AI with zero keys, or bring your own keys (BYOK) for OpenAI, Anthropic, Gemini, and Groq — stored as encrypted gateway secrets.
OpenAI-compatible proxy
Consumers send OpenAI format to your LLM proxy endpoint. You run Claude, Gemini, Groq, or Workers AI behind it.
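The fallback chain described above can be sketched as a loop over providers, where each fallback declares which upstream status codes it handles. This is a minimal sketch under stated assumptions: the `Provider` class, the `on_status` field, the rule that a non-matching fallback is skipped, and the stubbed provider calls are all hypothetical, not the gateway's actual routing code.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Provider:
    name: str
    call: Callable[[str], Tuple[int, str]]  # returns (HTTP status, body)
    # Failure statuses this fallback handles; None means "on any error".
    on_status: Optional[FrozenSet[int]] = None


def route(prompt: str, chain: List[Provider]) -> Tuple[str, str]:
    """Try providers in order; skip fallbacks not configured for the last failure."""
    last_status: Optional[int] = None
    for provider in chain:
        is_fallback = last_status is not None
        if (
            is_fallback
            and provider.on_status is not None
            and last_status not in provider.on_status
        ):
            continue
        status, body = provider.call(prompt)
        if 200 <= status < 300:
            return provider.name, body
        last_status = status
    raise RuntimeError(f"all providers failed, last status {last_status}")


def stub(status: int) -> Callable[[str], Tuple[int, str]]:
    """Fake provider call that always answers with the given status."""
    return lambda prompt: (status, f"status {status}")


chain = [
    Provider("claude-sonnet", stub(500)),                  # primary, failing here
    Provider("gpt-4o", stub(200), frozenset({429, 503})),  # fallback on 429, 503
    Provider("gemini-flash", stub(200)),                   # fallback on any error
    Provider("workers-ai", stub(200)),                     # last resort
]

print(route("hello", chain)[0])
```

With the primary stubbed to return 500, the 429/503 fallback is skipped and the any-error fallback answers. A primary that returned 429 would instead hand off to the first fallback.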
Define plans and rate limits per consumer.
Four billing models with full parameter control — plus one-click templates to deploy a configured gateway in seconds.
Free
Token-limited, no charge. The frictionless on-ramp for new consumers.
Flat rate
A fixed monthly price for a defined token allowance.
Metered
A base price plus a configurable overage per 1k tokens.
Tiered
A token allowance, then per-token overage once it is used up.
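One way to read the four billing models is as a single charge function with different parameters. The sketch below is an interpretation, not the gateway's billing logic: the `Plan` fields, the choice to round partial 1k-token blocks up for metered plans, and the assumption that free and flat plans never bill overage are all assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Plan:
    kind: str                              # "free", "flat", "metered", or "tiered"
    base_price: Decimal = Decimal("0")     # fixed monthly price
    included_tokens: int = 0               # token allowance
    overage_rate: Decimal = Decimal("0")   # per 1k tokens (metered) or per token (tiered)


def monthly_charge(plan: Plan, tokens_used: int) -> Decimal:
    """Compute a month's charge for the given usage."""
    over = max(0, tokens_used - plan.included_tokens)
    if plan.kind in ("free", "flat"):
        # Assumption: usage past the allowance is cut off rather than billed.
        return plan.base_price
    if plan.kind == "metered":
        blocks = -(-over // 1000)  # ceiling division: partial 1k blocks round up
        return plan.base_price + blocks * plan.overage_rate
    if plan.kind == "tiered":
        return plan.base_price + over * plan.overage_rate
    raise ValueError(f"unknown plan kind: {plan.kind}")


metered = Plan("metered", base_price=Decimal("10.00"), overage_rate=Decimal("0.50"))
tiered = Plan("tiered", included_tokens=100_000, overage_rate=Decimal("0.00002"))

print(monthly_charge(metered, 2_500))
print(monthly_charge(tiered, 150_000))
```

`Decimal` is used instead of floats so that small per-token rates accumulate without binary rounding error.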
Trial days
Offer a free trial window before a plan starts charging.
RPM limit
Cap requests per minute, per plan — not just globally.
TPM limit
Cap tokens per minute, per plan, to protect upstream cost.
Hard limits
Set a token allowance and a hard cutoff to stop runaway usage.
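A per-plan RPM cap like the one above is commonly enforced with a sliding window kept per consumer. The following is a minimal in-memory sketch; the `RpmLimiter` class and its explicit `now` argument are hypothetical, and a production gateway would need shared state across edge locations rather than a local dictionary.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RpmLimiter:
    """Sliding-window request limiter keyed by consumer ID."""

    def __init__(self, rpm: int, window_seconds: float = 60.0) -> None:
        self.rpm = rpm
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, consumer_id: str, now: float) -> bool:
        """Record and allow the request if the consumer is under the cap."""
        hits = self._hits[consumer_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.rpm:
            return False
        hits.append(now)
        return True


limiter = RpmLimiter(rpm=2)
for t in (0.0, 1.0, 2.0, 60.0):
    print(t, limiter.allow("consumer-a", now=t))
```

Taking the timestamp as a parameter keeps the limiter deterministic and easy to test. The same structure could track token counts instead of request counts to approximate a TPM limit.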
Deploy from a template in seconds
Start from a library of one-click templates — including RAG-powered and function-backed presets with suggested plans and pipeline configs built in. Ship a free public endpoint first, then add a key-gated plan with usage limits when you are ready.
Beyond routing — function-backed gateway APIs.
Not every AI API fits a standard LLM pipeline. Wire a gateway API directly to a deployed Cloudflare Worker for fully custom business logic — the AI API gateway becomes a configurable proxy to your own code.
Cloudflare Worker backend
Dispatch every gateway request to your deployed edge function. Receive the full request body, headers, and consumer metadata. Return any response shape you need.
Embeddable chat widget
Drop the hosted chat widget onto any site — it connects to your gateway API and inherits all your pipeline stages: RAG, moderation, rate limits, and billing.
One-click templates
Start from a function-backed gateway template with billing plans and pipeline config pre-wired. Go from zero to a live, custom AI API in minutes.
RAG works at three layers — not just the gateway
The gateway's built-in RAG knowledge base is one entry point. Aerostack also supports RAG in bot freestyle mode via the enable_rag flag, so every conversation is grounded in your docs automatically, and as a knowledge_retrieval workflow node you can wire anywhere in a multi-step agent graph. One knowledge base, three integration points.
Your API. Their keys.
Each consumer gets a unique API key. You control access, track usage, and bill per token, all automatically.
# Your consumer calls your API — not OpenAI's
curl -X POST https://gateway.aerostack.dev/my-api/v1/chat/completions \
  -H "Authorization: Bearer ask_live_7f3a9c2e4b1d..." \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Summarize Q4 revenue"}],"stream":true}'
# OpenAI-compatible response — any SDK works
# RAG context, moderation, and billing all happen transparently
One key per consumer
Issue API keys with the ask_live_ prefix. Keys are SHA-256 hashed, so the raw key is shown only once. Revoke or regenerate anytime.
Token wallet billing
Each consumer has a token balance. Every request deducts tokens used. Set hard limits to prevent overspend.
OpenAI-compatible endpoint
Your consumers use the same /v1/chat/completions format they already know. Any OpenAI SDK works out of the box.
BYO-JWT — bring your own auth
Already have an auth system? Validate your own JWTs against your JWKS endpoint. No migration needed.
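The key handling described above (an ask_live_ prefix, SHA-256 hashing, raw key shown once) follows a common pattern: store only the digest and compare digests on each request. Here is a minimal sketch using Python's standard library. The function names and the 16-byte random suffix are assumptions for illustration; the gateway's real key format and storage are not documented on this page.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

PREFIX = "ask_live_"


def issue_key() -> Tuple[str, str]:
    """Return (raw_key, sha256_hex). Only the digest should be stored."""
    raw = PREFIX + secrets.token_hex(16)  # suffix length is an assumption
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return raw, digest


def verify_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare it to the stored digest."""
    if not presented.startswith(PREFIX):
        return False
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, stored_digest)


raw_key, stored = issue_key()
print("show once:", raw_key)
print("valid:", verify_key(raw_key, stored))
```

Because only the digest is kept, a lost key cannot be recovered, which is why regeneration rather than retrieval is the recovery path.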
What you can build.
AI Customer Support API
RAG pipeline answers from your docs. Moderation catches toxic inputs. Fallback switches providers if one goes down.
Knowledge Base Query API
Embed your docs, connect vector search, and expose an OpenAI-compatible endpoint. Consumers search with natural language.
Moderated Content Generation
Pre-flight moderation blocks unsafe prompts. Post-flight moderation filters unsafe responses. All before your user sees them.
OpenAI-compatible LLM proxy
Drop-in LLM proxy: consumers send the OpenAI format, you route to the cheapest provider that meets your latency SLA. Auto-fallback: OpenAI → Anthropic → Gemini → Groq.
Function-backed custom API
Back a gateway with a deployed Cloudflare Worker function for fully custom business logic, or drop the embeddable chat widget onto any site.
Not another API gateway.
Traditional gateways route traffic. This one adds intelligence.
| Feature | Kong / AWS API GW | OpenRouter | Aerostack |
|---|---|---|---|
| Built-in RAG pipeline | — | — | ✓ |
| Content moderation stage | — | — | ✓ |
| Pre/post processing hooks | ~ | — | ✓ |
| Multi-provider fallback chains | — | ✓ | ✓ |
| Per-consumer plans & token limits | — | — | ✓ |
| Consumer API key provisioning | ✓ | — | ✓ |
| BYO-JWT (your own auth) | ✓ | — | ✓ |
| Edge-deployed (300+ locations) | ~ | — | ✓ |
| OpenAI-compatible endpoint | — | ✓ | ✓ |
| One-click deploy templates | — | — | ✓ |
Frequently asked questions
What is an LLM gateway, and why do I need one?
What is the difference between an LLM proxy and an LLM gateway?
How does the built-in LLM router and automatic fallback work?
Which LLM providers does the Aerostack LLM gateway support?
Does the LLM gateway include a built-in RAG knowledge base?
What billing plans and rate limits can I configure per gateway?
Can I deploy a gateway from a template, and what is the Free vs. Gateway deploy path?
How do I authenticate consumers, and what observability does the LLM gateway provide?
What is an AI gateway, and how does it differ from an AI API gateway?
What is a function-backed gateway API, and when would I use one?
Launch your AI gateway.
Configured, not coded.
LLM proxy endpoint. RAG knowledge base. Moderation. LLM router. Plans & limits. All configured from the admin.