Question 1

What is an LLM gateway, and why do I need one?

Accepted Answer

An LLM gateway is a managed layer that sits between your application and one or more LLM providers, giving you a single endpoint to handle routing, rate limiting, RAG retrieval, content moderation, and per-consumer usage tracking — without scattering that logic across every client. Without a gateway, provider error handling lives in application code, rate-limiting is rebuilt per project, and there is no central place to swap models or add billing. The Aerostack LLM gateway is part of the platform, so the same workspace that hosts your MCP servers, bots, and workflows also holds the gateway. Routing rules, BYOK secrets, and RAG context travel with the workspace rather than a separate config plane. It runs on Cloudflare Workers, is available now, and requires no code to configure.

Question 2

What is the difference between an LLM proxy and an LLM gateway?

Accepted Answer

An LLM proxy forwards requests to a model provider and normalises the API shape so different providers look the same to your code, but it adds no logic of its own. An LLM gateway wraps that proxy in a full pipeline: before the model call you can run RAG retrieval to inject knowledge-base context, apply a content moderation check to flag or block unsafe input, and enforce per-consumer rate limits; after the response you log token count, latency, and stream type. Aerostack exposes an OpenAI-compatible LLM proxy endpoint so existing SDK clients work without code changes, then layers routing, RAG, billing plans, and consumer keys on top of that same endpoint.

Question 3

How does the built-in LLM router and automatic fallback work?

Accepted Answer

The LLM router lets you configure an ordered provider chain for each gateway API. Requests go to the first provider; if it returns a retriable error such as a 429 or 503, the router automatically retries the next provider in the chain without the consumer seeing a failure. The fallback is transparent to callers and is recorded in the request log alongside the original attempt. You can chain as many providers as needed — for example, GPT-4o primary, Claude secondary, and Cloudflare Workers AI as a zero-cost last resort. Because the routing config lives in the gateway rather than in application code, you reorder or swap providers without a deployment. This makes the Aerostack LLM router both a reliability layer and a cost-control tool.

Question 4

Which LLM providers does the Aerostack LLM gateway support?

Accepted Answer

The LLM gateway supports OpenAI, Anthropic Claude, Google Gemini, and Groq via bring-your-own-key (BYOK). Keys are stored as encrypted gateway-scoped secrets, decrypted at the edge only at request time and never exposed to consumers. Cloudflare Workers AI is also available with no API key required: the model browser shows the full catalogue — text, code, image, embedding, and speech models — with cost tier and context window for each. You can build an entirely key-free LLM proxy using Workers AI, or place it as a fallback behind commercial providers in the router chain. Gateway APIs can also be function-backed, wired to a deployed Cloudflare Worker for custom handling beyond standard LLM routing.

Question 5

Does the LLM gateway include a built-in RAG knowledge base?

Accepted Answer

Yes. Each gateway API can have its own RAG knowledge base. You upload documents (PDF, txt, md, json, csv, html, xml — up to 5 MB each) and the gateway auto-chunks and indexes them with live per-document status: indexing, ready, or error. Before going live, a built-in vector test chat lets you query the index and see the exact chunks that would be injected along with similarity scores, so you can verify retrieval quality before any consumer calls the endpoint. At setup you choose one of three embedding models: BGE Small (384-d, fastest), BGE Base (768-d, balanced), or BGE Large (1024-d, most accurate). The model locks after the first upload to keep the vector space consistent. Top-k and score-threshold are configurable per pipeline stage. No external vector database is required.

Question 6

What billing plans and rate limits can I configure per gateway?

Accepted Answer

Each gateway supports four billing models. Free is token-limited at no charge, useful for public access or developer trials. Flat rate charges a fixed monthly price. Metered charges a base price plus a configurable overage per 1k tokens once the allowance is consumed. Tiered provides a fixed token allowance then switches to per-token overage. Every plan — including free — carries its own RPM (requests per minute) and TPM (tokens per minute) limits, so a free tier can be stricter than a paid one on the same gateway. Each plan also supports trial days and a hard token cap. This is what makes the LLM gateway pay for itself: metered and tiered plans capture revenue proportional to model usage with no custom billing code.

Question 7

Can I deploy a gateway from a template, and what is the Free vs. Gateway deploy path?

Accepted Answer

Yes. A template library provides one-click gateway presets including RAG-powered knowledge APIs, function-backed gateways, and hosted AI APIs — each with suggested billing plans and a pre-built pipeline so you skip the blank-config step. When deploying from a template you choose one of two paths. The free public URL path creates an open endpoint with no consumer key required, suitable for public tools or internal use. The gateway path creates a key-gated endpoint with full consumer management, billing plans, and analytics. You can start on the free path to get a working LLM proxy endpoint immediately and add the key-gated billing layer later without changing the URL. This is the fastest way to go from zero to a live gateway.

Question 8

How do I authenticate consumers, and what observability does the LLM gateway provide?

Accepted Answer

Authentication supports three modes: gateway-issued keys (ask_live_ prefix, SHA-256 hashed, shown once), BYO-JWT (JWKS URL plus a custom user-id claim, validated at the edge against your own auth system), and IP allowlists or blocklists enforced before any auth check. Content moderation runs in block mode (request rejected) or flag mode (logged and passed through). For observability, the request log stores the last 200 requests with live-tail polling: each row shows consumer ID, model and provider, tokens consumed, stream type (SSE or WebSocket), and latency. Filters cover consumer, model, provider, and a slow-only view for requests over two seconds. Derived stats include total tokens, average and P95 latency. Analytics breaks usage by consumer and by model-provider over 24h, 7d, and 30d.

Question 9

What is an AI gateway, and how does it differ from an AI API gateway?

Accepted Answer

An AI gateway is the AI-specific evolution of a traditional API gateway. Where a conventional AI API gateway handles routing and authentication, an AI gateway adds LLM-native intelligence: RAG retrieval to inject knowledge-base context into every prompt, content moderation to block or flag unsafe inputs, provider-aware fallback chains that understand LLM-specific errors like rate limits (429) and model overloads (503), and per-consumer billing tied to token consumption rather than raw request count. The Aerostack AI gateway is deployed at the edge on Cloudflare Workers (300+ locations) so every request is handled close to the consumer. It exposes an OpenAI-compatible endpoint, so existing SDK clients require no code changes to point at it. You configure the entire pipeline — RAG, moderation, routing, billing — from the admin without writing or deploying any gateway code.

Question 10

What is a function-backed gateway API, and when would I use one?

Accepted Answer

A function-backed gateway API replaces the standard LLM pipeline stage with a deployed Cloudflare Worker — your own edge function — as the backend handler. Instead of routing the request to OpenAI, Claude, or another model provider, the gateway dispatches it to your Worker, which receives the full request body, consumer metadata, and any pipeline context, then returns any response shape it wants. This is the right choice when your AI API needs custom business logic that does not fit neatly into the standard moderation → RAG → LLM chain: for example, aggregating multiple model calls, applying proprietary scoring, calling internal services, or returning structured data rather than a chat stream. Function-backed APIs still benefit from all gateway features — consumer key auth, BYO-JWT, rate limiting, billing plans, and request logging — so you get the full AI API gateway infrastructure without giving up control of the core logic. The template library includes function-backed presets to get started in minutes.

Feature	Kong / AWS API GW	OpenRouter	Aerostack
Built-in RAG pipeline	—	—	✓
Content moderation stage	—	—	✓
Pre/post processing hooks	~	—	✓
Multi-provider fallback chains	—	✓	✓
Per-consumer plans & token limits	—	—	✓
Consumer API key provisioning	✓	—	✓
BYO-JWT (your own auth)	✓	—	✓
Edge-deployed (300+ locations)	~	—	✓
OpenAI-compatible endpoint	—	✓	✓
One-click deploy templates	—	—	✓

The AI Gateway built for builders.

Your AI API gateway pipeline. Toggle each stage.

Moderation

RAG

Pre-Hook

LLM (required)

Post-Hook

An LLM gateway with a built-in RAG knowledge base.

Upload your docs

Live RAG test chat

Tune retrieval

LLM router + OpenAI-compatible LLM proxy.

Per-status-code routing

Workers AI or BYOK

OpenAI-compatible proxy

Define plans and rate limits per consumer.

Free

Flat rate

Metered

Tiered

Trial days

RPM limit

TPM limit

Hard limits

Deploy from a template in seconds

Beyond routing — function-backed gateway APIs.

Cloudflare Worker backend

Embeddable chat widget

One-click templates

RAG works at three layers — not just the gateway

Your API. Their keys.

One key per consumer

Token wallet billing

OpenAI-compatible endpoint

BYO-JWT — bring your own auth

What you can build.

AI Customer Support API

Knowledge Base Query API

Moderated Content Generation

OpenAI-compatible LLM proxy

Function-backed custom API

Not another API gateway.

Frequently asked questions

Launch your AI gateway. Configured, not coded.

The AI Gateway
built for builders.

Launch your AI gateway.
Configured, not coded.