Gateway gives you a single place to manage AI costs across every provider. Set budgets at the project level, track spend in a unified dashboard, and use intelligent routing and context compression to reduce costs automatically, without changing application code.
Gateway aggregates spend from all providers into one dashboard. At a glance you can see:
Use Projects to segment spend by team, product, or environment. Use tags for finer-grained tracking within a project.
Budgets are configured per project. Each budget has three components:
Alert thresholds: configurable percentages of the budget amount (defaults: 50%, 80%, 90%). Each threshold triggers a notification when spend crosses it.
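As a rough illustration of the threshold behavior, the sketch below reports which percentages are crossed when spend moves from one value to another. The function name `crossed_thresholds` and the choice to treat "reaching a threshold exactly" as crossing it are assumptions for this example, not Gateway's documented implementation.

```python
# Default alert thresholds from the text, as percentages of the budget.
DEFAULT_THRESHOLDS = (50, 80, 90)


def crossed_thresholds(previous_spend, current_spend, budget,
                       thresholds=DEFAULT_THRESHOLDS):
    """Return the thresholds (in percent) crossed between two spend values.

    Assumption: a threshold counts as crossed when spend was below it
    before and is at or above it now.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    crossed = []
    for pct in sorted(thresholds):
        # Compare spend * 100 against pct * budget to avoid dividing.
        was_below = previous_spend * 100 < pct * budget
        is_at_or_above = current_spend * 100 >= pct * budget
        if was_below and is_at_or_above:
            crossed.append(pct)
    return crossed
```

For a budget of 100, a jump in spend from 40 to 85 crosses both the 50% and 80% thresholds in one step, so both notifications would fire under this reading.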
Gateway sums daily spend for the current budget period and compares it to the budget amount. When a hard limit is reached, all requests to that project return HTTP 402 until the next period starts or the budget is increased. Soft limits send alerts but never block requests.
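The enforcement rule described above can be sketched as follows. This is a minimal model under stated assumptions: the `Budget` dataclass, the `check_budget` function, and the use of 200 as the "allowed" status are hypothetical names and values chosen for illustration; only the sum-then-compare logic and the HTTP 402 response come from the text.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Budget:
    amount: float      # budget for the current period, in USD
    hard_limit: bool   # True: block at the limit; False: soft limit (alert only)


def check_budget(daily_spend: Sequence[float], budget: Budget) -> Tuple[int, bool]:
    """Return (http_status, send_alert) for a new request to the project.

    Assumption: the limit counts as reached when period spend is at or
    above the budget amount.
    """
    period_spend = sum(daily_spend)  # sum of daily spend in the current period
    reached = period_spend >= budget.amount
    if reached and budget.hard_limit:
        return 402, True             # hard limit: requests are rejected
    return 200, reached              # soft limit: alert but never block
```

With the same spend history, a hard limit yields `(402, True)` while a soft limit yields `(200, True)`: the alert fires in both cases, but only the hard limit blocks the request.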
The “By Project” tab shows color-coded progress bars for each project: blue (< 80%), yellow (80–100%), red (100%+). Spend is displayed as $X.XX / $Y.YY.
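The color bands and spend label can be expressed as a small helper. The function name `progress_bar` is hypothetical, and because the stated ranges overlap at exactly 100%, treating 100% as red is an assumption made here.

```python
def progress_bar(spend: float, budget: float):
    """Return (color, label) for a project's budget progress bar."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare spend * 100 against percent * budget to avoid dividing.
    if spend * 100 < 80 * budget:
        color = "blue"      # under 80% of budget
    elif spend * 100 < 100 * budget:
        color = "yellow"    # 80% up to (but not including) 100%
    else:
        color = "red"       # 100% and above (assumed boundary)
    label = f"${spend:.2f} / ${budget:.2f}"
    return color, label
```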
Routing directs traffic to lower-cost providers and models without compromising quality. The savings depend on the strategy:
Routing policies can be set at the org level or overridden per project.
Compression reduces tokens sent to providers, directly lowering per-request costs. Gateway applies two techniques:
Two modes are available for cost savings:
Combine Cost Optimization compression with Cost Optimized routing for maximum savings. The two features are independent and stack. See Context Compression for configuration.
Guardrails above project budgets protect your organization:
Org-level controls are managed from organization billing settings, separate from project-level budgets.
Spend is tracked per request based on provider per-token pricing. Managed and BYOK spend are tracked separately. Totals are updated atomically after each request completes.
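To make the accounting concrete, here is a minimal sketch of per-request cost tracking. The `SpendTracker` class, the per-million-token price unit, and the use of a lock to stand in for atomic updates are all assumptions for illustration; the prices in the usage note are made up and do not reflect any provider's rates.

```python
import threading
from decimal import Decimal


class SpendTracker:
    """Keeps separate running totals for managed and BYOK spend."""

    def __init__(self):
        self._lock = threading.Lock()
        self.totals = {"managed": Decimal("0"), "byok": Decimal("0")}

    def record(self, kind, input_tokens, output_tokens,
               input_price, output_price):
        """Add one completed request to the totals and return its cost.

        Prices are USD per 1,000,000 tokens (an assumed unit).
        """
        if kind not in self.totals:
            raise ValueError("kind must be 'managed' or 'byok'")
        cost = (Decimal(input_tokens) * input_price
                + Decimal(output_tokens) * output_price) / Decimal(1_000_000)
        # The lock makes the read-modify-write on the total a single step.
        with self._lock:
            self.totals[kind] += cost
        return cost
```

For example, a managed request with 1,000 input tokens at a hypothetical $3 per million and 500 output tokens at a hypothetical $15 per million costs $0.0105, and the BYOK total is left unchanged.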
No. Budgets are per-project. Create a project first, then configure its budget.
All requests to that project return HTTP 402 until the next budget period starts or the budget amount is increased.
They are independent. Compression reduces the number of tokens in a request; routing picks which provider handles it. Using both compounds the savings.
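A quick worked example shows why the savings compound rather than add. It assumes request cost is proportional to token count times price per token, so compression scales the first factor and routing scales the second. The function name and the percentages are hypothetical.

```python
def combined_savings(compression_savings: float, routing_savings: float) -> float:
    """Combine two independent fractional savings (each between 0 and 1)."""
    for s in (compression_savings, routing_savings):
        if not 0.0 <= s <= 1.0:
            raise ValueError("savings must be a fraction between 0 and 1")
    # Each feature scales the remaining cost, so the factors multiply.
    remaining_cost = (1.0 - compression_savings) * (1.0 - routing_savings)
    return 1.0 - remaining_cost


# Hypothetical figures: 30% fewer tokens and a 20% lower per-token price.
print(f"{combined_savings(0.30, 0.20):.0%}")  # prints 44%
```

Under these assumptions the combined saving is 44%, slightly less than the 50% a simple sum would suggest, because routing's discount applies to an already smaller token count.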
The spend dashboard shows actual spend. To estimate savings, compare it with what the same traffic would cost at a single provider's list pricing. For intelligent routing strategies, the dashboard also includes expected savings estimates.