Routing Config
Routing behavior is driven by gateway/src/gateway/config.json locally and by AWS
Systems Manager Parameter Store in deployed environments.
Aliases
Section titled “Aliases”Aliases give provider/model pairs stable names that routing rules can reference:
{ "aliases": { "Claude Haiku 4-5 Bedrock": { "provider": "bedrock", "model": "us.anthropic.claude-haiku-4-5-20251001-v1:0" } }}Default Model
Section titled “Default Model”The gateway uses default_model when no routing rule matches the request metadata:
{ "default_model": "Claude Haiku 4-5 Bedrock"}Routing Rules
Section titled “Routing Rules”Rules match on the JSON object passed in the metadata header. Targets are weighted:
{ "routing_rules": [ { "id": "1", "name": "Code generation", "match": { "name": "task-type", "value": "code_generation" }, "targets": [ { "alias": "Claude Sonnet 4-5 Bedrock", "weight": 6 }, { "alias": "Nova Pro Bedrock", "weight": 4 } ] } ]}If a client sends:
metadata: {"task-type":"code_generation"}the gateway chooses between the configured targets according to their weights.
Retries and Fallbacks
Section titled “Retries and Fallbacks”target_retries controls retries for a selected target. fallbacks define aliases to try
after the rule/default target chain is exhausted:
{ "target_retries": 2, "fallbacks": ["Claude Haiku 4-5 Bedrock Fallback"]}Timeouts and Cooldowns
Section titled “Timeouts and Cooldowns”initial_response_timeout caps time to first response. stream_idle_timeout caps idle
time between streamed chunks. cooldown_ttl temporarily removes a provider/model pair
from routing after rate-limit failures:
{ "initial_response_timeout": 30, "stream_idle_timeout": 5, "cooldown_ttl": 60}Prompt Cache
Section titled “Prompt Cache”Prompt cache settings control exact and semantic response caching:
{ "prompt_cache": { "ttl_seconds": 3600, "temperature_threshold": 0.3, "semantic": { "similarity_threshold": 0.8, "top_k": 3, "conversation_size_threshold": 3 } }}Streaming requests bypass the cache. Non-streaming requests can use exact matching and, when configured, semantic matching.