Cost governance and savings

Control AI spend with budgets, unified billing, and automatic cost optimization.

Gateway gives you a single place to manage AI costs across every provider. Set budgets at the project level, track spend in a unified dashboard, and use intelligent routing and context compression to automatically reduce costs — without changing application code.

Unified billing

Gateway aggregates spend from all providers into one dashboard. At a glance you can see:

Total spend across all providers and models
Managed spend vs BYOK spend (bring-your-own-key) tracked separately
API call count with breakdowns
Breakdowns by model, by project, and by tag via dashboard tabs

Use Projects to segment spend by team, product, or environment. Use tags for finer-grained tracking within a project.
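As an illustration of how the dashboard breakdowns relate to each other, the sketch below groups a handful of spend records by project, model, and billing type. The record shape and field names (`project`, `model`, `billing`, `cost_usd`) are assumptions made for this example, not Gateway's export format.

```python
from collections import defaultdict

# Hypothetical spend records; the field names are illustrative only.
RECORDS = [
    {"project": "search", "model": "model-a", "billing": "managed", "cost_usd": 1.50},
    {"project": "search", "model": "model-b", "billing": "byok", "cost_usd": 0.25},
    {"project": "support", "model": "model-a", "billing": "managed", "cost_usd": 2.00},
]


def spend_by(records, key):
    """Sum cost_usd across records, grouped by the given field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost_usd"]
    return dict(totals)


by_project = spend_by(RECORDS, "project")
by_billing = spend_by(RECORDS, "billing")
```

The same records feed every breakdown, so the totals of each view add up to the same overall spend.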

Project budgets

Budgets are configured per project. Each budget has three components:

Amount — dollar limit in USD
Period — daily, weekly, monthly, quarterly, or yearly
Enforcement mode — what happens at the limit

Enforcement Mode	At threshold	At limit	Best for
Soft Limit	Sends alert	Requests continue	Visibility without blocking
Hard Limit	Sends alert	Blocks requests (HTTP 402)	Strict cost caps

Alert thresholds

Alert thresholds are configurable percentages of the budget amount (defaults: 50%, 80%, and 90%). Each threshold triggers a notification when spend crosses it.
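A minimal sketch of threshold crossing, assuming a notification fires once when spend moves from below a threshold to at or above it. The function name and arguments are hypothetical and only illustrate the idea.

```python
def crossed_thresholds(previous_spend, current_spend, budget_amount,
                       thresholds=(50, 80, 90)):
    """Return the threshold percentages crossed between two spend readings."""
    crossed = []
    for pct in thresholds:
        trigger = budget_amount * pct / 100  # spend level at which this alert fires
        if previous_spend < trigger <= current_spend:
            crossed.append(pct)
    return crossed


# Spend jumps from $40 to $85 on a $100 budget: the 50% and 80% alerts fire.
alerts = crossed_thresholds(40, 85, 100)
```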

How budget enforcement works

Gateway sums daily spend for the current budget period and compares it to the budget amount. When a hard limit is reached, all requests to that project return HTTP 402 until the next period starts or the budget is increased. Soft limits send alerts but never block requests.
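The check described above can be sketched roughly as follows. The class and function names are hypothetical, and treating "reached" as spend greater than or equal to the budget amount is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class EnforcementMode(Enum):
    SOFT = "soft"
    HARD = "hard"


@dataclass
class Budget:
    amount_usd: float
    mode: EnforcementMode


def is_blocked(daily_spend_usd, budget):
    """Return True when a request would be rejected with HTTP 402."""
    period_spend = sum(daily_spend_usd)  # daily totals for the current period
    limit_reached = period_spend >= budget.amount_usd  # assumed comparison
    # Only hard limits block; soft limits alert but let requests continue.
    return limit_reached and budget.mode is EnforcementMode.HARD


hard_budget = Budget(amount_usd=100.0, mode=EnforcementMode.HARD)
blocked = is_blocked([40.0, 35.5, 24.5], hard_budget)
```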

Budget progress in the dashboard

The “By Project” tab shows color-coded progress bars for each project: blue (< 80%), yellow (80–100%), red (100%+). Spend is displayed as $X.XX / $Y.YY.
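For reference, the color bands can be expressed as a small function. This is a sketch only: the function names are hypothetical, and it assumes that exactly 100% is shown as red, since the text lists 100% in both the yellow and red ranges.

```python
def progress_color(spend_usd, budget_usd):
    """Map budget usage to the progress bar color described above."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    ratio = spend_usd / budget_usd
    if ratio < 0.8:
        return "blue"
    if ratio < 1.0:  # assumption: exactly 100% falls in the red band
        return "yellow"
    return "red"


def progress_label(spend_usd, budget_usd):
    """Format spend in the $X.XX / $Y.YY style."""
    return f"${spend_usd:.2f} / ${budget_usd:.2f}"
```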

How routing policies save money

Routing directs traffic to lower-cost providers and models while aiming to preserve response quality. The savings depend on the strategy:

Cost Optimized (Intelligent) — ML complexity scoring routes ~70% of traffic to cheaper models. Expected savings: 40–60%. See Intelligent Routing for details.
Balanced (Intelligent) — even cost/quality split. Expected savings: 20–35%. See Intelligent Routing for details.
Lowest Cost (Performance) — always picks the cheapest provider for the requested model. See Performance for details.
Tag-based routing — route different request types to different cost tiers. For example, internal requests → cost-optimized policy, customer-facing requests → quality-first policy. See Routing Policies for configuration.

Routing policies can be set at the org level or overridden per project.
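The sketch below illustrates three of the ideas above: a project policy overriding the org default, the Lowest Cost rule, and a tag-to-policy mapping. Policy names, provider names, and prices are placeholders, and none of these functions are part of Gateway's API.

```python
def resolve_policy(org_policy, project_policy=None):
    """Use the project-level policy when one is set, else the org default."""
    return project_policy if project_policy is not None else org_policy


def cheapest_provider(price_per_million_tokens):
    """Lowest Cost: pick the provider with the lowest price for a model."""
    return min(price_per_million_tokens, key=price_per_million_tokens.get)


# Hypothetical mapping of request tags to routing policies.
TAG_POLICIES = {
    "internal": "cost-optimized",
    "customer-facing": "quality-first",
}


def policy_for_tag(tag, default_policy):
    """Tag-based routing: fall back to the default for unknown tags."""
    return TAG_POLICIES.get(tag, default_policy)


# Placeholder prices in USD per million tokens for the same model.
prices = {"provider-a": 3.00, "provider-b": 2.40, "provider-c": 2.75}
choice = cheapest_provider(prices)
```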

How context compression saves money

Compression reduces tokens sent to providers, directly lowering per-request costs. Gateway applies two techniques:

Lossless compression — minifies JSON in tool schemas, arguments, and results. Achieves 30–60% reduction on JSON-heavy requests with zero quality impact.
Message trimming — removes middle messages from long conversations, preserving system messages and the most recent messages.
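Both techniques can be approximated in a few lines. This is a simplified sketch, not Gateway's implementation: the message shape (`role`, `content`) and the `keep_recent` parameter are assumptions.

```python
import json


def minify_json(text):
    """Re-serialize JSON without insignificant whitespace (lossless)."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def trim_middle(messages, keep_recent=4):
    """Keep system messages plus the keep_recent most recent messages."""
    cutoff = max(len(messages) - keep_recent, 0)
    return [
        message
        for index, message in enumerate(messages)
        if message["role"] == "system" or index >= cutoff
    ]


schema = '{ "name": "lookup",  "parameters": { "ids": [1, 2, 3] } }'
compact = minify_json(schema)
```

Minification only removes whitespace between tokens, so the parsed value is unchanged, which is why it has no effect on quality.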

Two modes are available for cost savings:

Cost Optimization — proactively compresses at a target ratio (default 70%). Reduces costs even when the request fits within the context window.
Context Window Only — compresses only to prevent context window errors. Acts as a safety net, not proactive savings.
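The difference between the two modes comes down to when compression is triggered. A rough sketch, using hypothetical names and ignoring how the target ratio itself is applied:

```python
from enum import Enum


class CompressionMode(Enum):
    COST_OPTIMIZATION = "cost_optimization"
    CONTEXT_WINDOW_ONLY = "context_window_only"


def should_compress(mode, request_tokens, context_window_tokens):
    """Decide whether a request is compressed under the given mode."""
    if mode is CompressionMode.COST_OPTIMIZATION:
        return True  # compress proactively, even when the request fits
    # Context Window Only: compress only when the request would not fit.
    return request_tokens > context_window_tokens
```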

Combine Cost Optimization compression with Cost Optimized routing for maximum savings. The two features are independent and stack. See Context Compression for configuration.

Org-level spend controls

Org-level guardrails sit above project budgets and protect the organization as a whole:

Free tier default — $15 spend cap for new organizations
Spend caps — configurable org-level caps for Pro/Enterprise tiers
Velocity alerts — detects unusual spend spikes (10x daily average)
Payment failure handling — dunning process with notifications and eventual access restriction

Org-level controls are managed from organization billing settings, separate from project-level budgets.
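To make the velocity alert concrete, here is a minimal sketch of a 10x-daily-average check. The averaging window (whatever prior days are passed in) and the function name are assumptions; the actual detection logic may differ.

```python
def is_velocity_spike(today_spend, prior_daily_spend, multiplier=10.0):
    """Flag today's spend when it reaches multiplier times the daily average."""
    if not prior_daily_spend:
        return False  # no history to compare against
    average = sum(prior_daily_spend) / len(prior_daily_spend)
    return average > 0 and today_spend >= multiplier * average


# Average of the prior days is $10, so $120 today counts as a spike.
spike = is_velocity_spike(120.0, [10.0, 12.0, 8.0])
```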

FAQ

How is spend calculated?

Spend is tracked per request based on provider per-token pricing. Managed and BYOK spend are tracked separately. Totals are updated atomically after each request completes.
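A per-request cost calculation might look like the sketch below. It assumes separate input and output prices quoted per million tokens, which is common among providers but is not stated here; the prices and function names are placeholders.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD for one request, given per-million-token prices."""
    return (
        Decimal(input_tokens) * input_price
        + Decimal(output_tokens) * output_price
    ) / TOKENS_PER_MILLION


def record_spend(totals, billing_type, cost):
    """Add a request's cost to the managed or BYOK running total."""
    totals[billing_type] = totals.get(billing_type, Decimal(0)) + cost
    return totals


# Placeholder prices: $3 per million input tokens, $15 per million output tokens.
cost = request_cost(1_000, 500, Decimal("3"), Decimal("15"))
totals = record_spend({}, "managed", cost)
```

Using `Decimal` rather than floats avoids rounding drift when many small per-request amounts are summed.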

Can I set a budget without a project?

No. Budgets are per-project. Create a project first, then configure its budget.

What happens when a hard limit is reached?

All requests to that project return HTTP 402 until the next budget period starts or the budget amount is increased.

Do routing policies and compression interact?

They are independent. Compression reduces the number of tokens in a request; routing picks which provider handles it. Using both compounds the savings.
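As a worked example of how the savings compound, assume routing lowers the average price per token by some fraction and compression lowers the token count by another fraction, independently of each other. The percentages used below are illustrative, not measured results.

```python
def combined_savings(routing_savings, compression_savings):
    """Combine two independent fractional savings multiplicatively."""
    remaining_cost = (1 - routing_savings) * (1 - compression_savings)
    return 1 - remaining_cost


# 40% from routing and 30% from compression leave 0.6 * 0.7 = 42% of the
# original cost, so the combined saving is about 58%, not 70%.
total = combined_savings(0.40, 0.30)
```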

How do I see savings from routing or compression?

The spend dashboard shows actual spend. To estimate savings, compare it against what the same traffic would have cost at single-provider list pricing. Intelligent routing strategies also include expected savings estimates in the dashboard.
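A back-of-the-envelope version of that comparison, assuming you have computed a baseline cost for the same traffic yourself (the function and its inputs are hypothetical):

```python
def estimated_savings(actual_spend, baseline_spend):
    """Fraction saved relative to a baseline cost for the same traffic."""
    if baseline_spend <= 0:
        raise ValueError("baseline_spend must be positive")
    return (baseline_spend - actual_spend) / baseline_spend


# $60 actual spend against a $100 list-price baseline is a 40% saving.
saving = estimated_savings(60.0, 100.0)
```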

Next steps

Projects

Organize workloads, set budgets, and control routing at the project level.

Routing policies

Configure intelligent routing, failover, and cost optimization across providers.

Context compression

Automatically reduce token usage and avoid context window limits.
