Routing policies

Automatically route AI requests across providers to optimize cost, performance, and reliability.

Routing policies control which AI provider and model handle each request. Instead of hardcoding a single provider, a policy automatically selects the best option based on your optimization goal: minimizing cost, maximizing uptime, or balancing both with ML-powered routing.

Strategy comparison

Strategy	API Config	Description	Detail Page
Single	`type: "fallback"` (1 provider)	Always route to one provider	Single Provider
Priority	`type: "fallback"`	Try providers in priority order with automatic failover	Priority
Least Latency	Dashboard only	Route to the fastest provider	Performance
Lowest Cost	Dashboard only	Route to the cheapest provider	Performance
Cost Optimized	`type: "intelligent", axis: "cost"`	ML-based routing; ~70% of traffic goes to cheaper models	Intelligent
Balanced	`type: "intelligent", axis: "performance"`	ML-based routing; even cost/quality split	Intelligent
Quality First	`type: "intelligent", axis: "intelligence"`	ML-based routing; ~70% of traffic goes to capable models	Intelligent
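
As a rough illustration of how the API Config column might translate into a policy payload, the sketch below builds a config dictionary per strategy. The `type` and `axis` values come from the table above. The `providers` field and the `build_policy_config` helper are hypothetical names used only for this example, not a documented schema.

```python
# Mapping taken from the strategy comparison table.
# None marks strategies the table lists as dashboard-only.
STRATEGY_CONFIG = {
    "single": {"type": "fallback"},
    "priority": {"type": "fallback"},
    "least_latency": None,
    "lowest_cost": None,
    "cost_optimized": {"type": "intelligent", "axis": "cost"},
    "balanced": {"type": "intelligent", "axis": "performance"},
    "quality_first": {"type": "intelligent", "axis": "intelligence"},
}


def build_policy_config(strategy: str, providers: list[str]) -> dict:
    """Return a policy payload for `strategy` (hypothetical helper).

    The "providers" key is an assumed field name, not a documented one.
    """
    if strategy not in STRATEGY_CONFIG:
        raise ValueError(f"unknown strategy: {strategy}")
    base = STRATEGY_CONFIG[strategy]
    if base is None:
        raise ValueError(f"{strategy} is configured in the dashboard only")
    if strategy == "single" and len(providers) != 1:
        raise ValueError("single expects exactly one provider")
    return {**base, "providers": list(providers)}
```

Calling `build_policy_config("cost_optimized", ["provider-a", "provider-b"])` would yield a dictionary with `type` set to `"intelligent"` and `axis` set to `"cost"`, while a dashboard-only strategy such as `least_latency` raises an error instead of producing a payload.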

Choosing the right strategy

Your Priority	Recommended Strategy
Simplicity / dev environment	`single`
High availability / failover	`priority`
Fastest response time	`least_latency`
Lowest cost (same model)	`lowest_cost`
Lowest cost (mixed models)	`cost_optimized`
General production optimization	`balanced`
Maximum output quality	`quality_first`

Tag-based routing

You can attach tags to requests, such as user tier, region, or environment, and use them to route to different policies. Rules are evaluated in priority order: the first matching rule applies, and unmatched requests fall through to the default strategy.

Conditions support AND/OR logic and operators like `eq`, `gt`, `in`, `contains`, `starts_with`, and `exists`. Configure tag-based routing through the dashboard or the API.
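
The evaluation order described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the gateway's implementation: the operator names come from the text, but their exact semantics, the `Condition` and `Rule` classes, the `select_policy` function, and the convention that a lower `priority` number is evaluated first are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any

# Operator names come from the text; the semantics below are assumed.
OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "gt": lambda actual, expected: actual is not None and actual > expected,
    "in": lambda actual, expected: actual in expected,
    "contains": lambda actual, expected: actual is not None and expected in actual,
    "starts_with": lambda actual, expected: isinstance(actual, str)
    and actual.startswith(expected),
}


@dataclass
class Condition:
    tag: str
    op: str
    value: Any = None

    def matches(self, tags: dict) -> bool:
        if self.op == "exists":
            return self.tag in tags
        return OPERATORS[self.op](tags.get(self.tag), self.value)


@dataclass
class Rule:
    priority: int  # assumption: lower number is evaluated first
    policy: str
    conditions: list = field(default_factory=list)
    logic: str = "and"  # "and" or "or"

    def matches(self, tags: dict) -> bool:
        results = (c.matches(tags) for c in self.conditions)
        return all(results) if self.logic == "and" else any(results)


def select_policy(tags: dict, rules: list, default: str) -> str:
    """Return the policy of the first matching rule, else the default."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(tags):
            return rule.policy
    return default


rules = [
    Rule(1, "quality_first", [Condition("tier", "eq", "enterprise"),
                              Condition("region", "in", ["eu", "us"])]),
    Rule(2, "cost_optimized", [Condition("env", "starts_with", "dev"),
                               Condition("debug", "exists")], logic="or"),
]
```

With these example rules, a request tagged `{"tier": "enterprise", "region": "eu"}` matches the first rule, a request tagged `{"env": "dev-1"}` matches the second through its OR logic, and a request tagged only `{"tier": "free"}` falls through to whatever default is passed in.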

FAQs

Does intelligent routing add latency?

The complexity scoring step adds roughly 1–4 ms, which is negligible compared to LLM inference time.

How accurate is the complexity scoring?

The scorer separates requests cleanly into simple (scores below 0.4) and complex (scores above 0.6). Edge cases around 0.5 route conservatively to more capable models.
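
The thresholds above can be expressed as a small decision function. This is only a sketch: the 0.4 and 0.6 cut-offs come from the answer, while the assumptions that scores lie in [0, 1] and that the entire band between the thresholds counts as an edge case are illustrative, as are the function and tier names.

```python
SIMPLE_BELOW = 0.4   # threshold from the answer above
COMPLEX_ABOVE = 0.6  # threshold from the answer above


def classify(score: float) -> str:
    """Label a complexity score (assumed to lie in [0, 1])."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < SIMPLE_BELOW:
        return "simple"
    if score > COMPLEX_ABOVE:
        return "complex"
    return "ambiguous"  # assumption: the whole middle band is an edge case


def choose_tier(score: float) -> str:
    """Route simple requests to cheaper models, everything else to capable ones.

    Ambiguous scores go to the capable tier, mirroring the conservative
    behavior described for edge cases.
    """
    return "cheaper" if classify(score) == "simple" else "capable"
```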

What happens if the complexity scorer fails?

The gateway falls back to the most capable model in your policy, so a scorer failure never compromises quality.
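
A minimal sketch of that fallback, assuming a scorer that may raise an exception. The `route_with_fallback` function, its parameters, and the two-model policy are hypothetical and exist only to illustrate the behavior.

```python
from typing import Callable


def route_with_fallback(
    prompt: str,
    scorer: Callable[[str], float],
    cheaper_model: str,
    most_capable_model: str,
) -> str:
    """Pick a model for `prompt`; use the most capable one if scoring fails."""
    try:
        score = scorer(prompt)
    except Exception:
        # Scorer failure: fall back to the most capable model in the policy.
        return most_capable_model
    # Only clearly simple requests (score below 0.4) go to the cheaper model.
    return cheaper_model if score < 0.4 else most_capable_model
```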

Can I use any model with intelligent routing?

Yes. New models work immediately because their capabilities are inferred from pricing data.

What if I only want specific models, not auto-selection?

Add only those models to your policy. The router selects only from models in your policy, never from outside it.

What happens if all providers fail?

The gateway returns an error after all failover attempts are exhausted. Provider health is tracked automatically, so requests skip providers that are currently down.
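
The failover behavior can be pictured with the following sketch. It is an illustration under assumptions rather than the gateway's actual logic: `call_with_failover`, the `send` callback, the `unhealthy` set, and the `AllProvidersFailed` exception are all hypothetical names.

```python
class AllProvidersFailed(Exception):
    """Raised once every failover attempt has been exhausted."""


def call_with_failover(providers, send, unhealthy=frozenset()):
    """Try providers in priority order, skipping ones marked unhealthy.

    `providers` is an ordered list of names, `send(name)` performs the
    request and raises on failure, and `unhealthy` is the set of providers
    currently known to be down. All three are illustrative assumptions.
    """
    errors = {}
    for name in providers:
        if name in unhealthy:
            continue  # health tracking: skip providers that are down
        try:
            return send(name)
        except Exception as exc:
            errors[name] = exc  # remember the failure and try the next one
    raise AllProvidersFailed(f"no provider succeeded: {sorted(errors)}")
```

In this sketch the caller sees a single error only after every healthy provider has been tried, which matches the answer above.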

Can I have multiple default policies?

No. There is one org-level default policy. You can add one additional policy per project for project-scoped routing.