Mantis

A self-hosted LLM gateway for routing, caching, guardrails, and observability across model providers.

One API Stable chat completions endpoint in front of multiple model targets.

AWS-native Deployable with Terraform, ECS, ElastiCache, Bedrock, and CloudWatch.

Policy driven Routing, retry, fallback, timeout, cooldown, and cache behavior live in config.

What Mantis Provides

Route requests by metadata, model aliases, weighted targets, and fallback chains.

Coordinate validation, cache checks, cooldowns, provider calls, retries, and terminal responses.

Send chat completion requests through a single gateway endpoint with optional routing metadata.

Call Mantis from application code without manually constructing every HTTP request.

Start with the quick start to run or deploy the gateway, then read the case study for the project background and design decisions.