TL;DR
Three layers of token-spend monitoring:
- Provider dashboards for org-wide totals (OpenAI, Anthropic, Google AI Studio).
- Your own per-tenant tracking in a database, recorded at every model call.
- Weekly review with engineering leads to investigate anomalies and reset budgets.
Layer 1: provider dashboards
Every major LLM provider has a usage dashboard. Tag your API keys by environment (dev / staging / prod) and by team to make this layer useful. A single org-wide chart is too coarse.
For OpenAI: separate keys per project, set per-key dollar caps, and review the Usage page weekly. Set monthly hard limits per key as a safety net.
Layer 2: your own per-tenant tracking
The provider dashboard tells you what you are spending. It doesn't tell you which of your customers, internal teams, or workflows are driving that spend. For that, instrument your own code.
At every model call, log:
- tenant_id
- agent_id / workflow_id
- model
- input_tokens, output_tokens
- cost_usd (compute it yourself; don't trust your provider's eventual invoice)
- started_at, latency_ms
Store in a wide event table (Postgres, BigQuery, ClickHouse). Roll up to per-tenant per-day in a materialized view.
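The per-call record can be sketched as a small helper that computes cost from a local price table rather than waiting for the invoice. Model names and prices here are illustrative placeholders, not current list prices:

```python
from dataclasses import dataclass

# Illustrative per-million-token prices; keep your own table current.
PRICE_PER_MTOK = {
    "small-model": {"input": 0.15, "output": 0.60},
    "large-model": {"input": 3.00, "output": 15.00},
}

@dataclass
class UsageEvent:
    tenant_id: str
    workflow_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    started_at: str
    latency_ms: int

def record_usage(tenant_id, workflow_id, model,
                 input_tokens, output_tokens, started_at, latency_ms):
    """Build one row for the wide event table, with cost computed locally."""
    p = PRICE_PER_MTOK[model]
    cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return UsageEvent(tenant_id, workflow_id, model,
                      input_tokens, output_tokens, round(cost, 6),
                      started_at, latency_ms)
```

In practice the returned row is inserted into the event table; the sketch just constructs it.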
Layer 3: weekly review
Once a week, walk through:
- Top 5 tenants by spend, week-over-week change
- Top 5 workflows by spend
- Any tenant or workflow with > 3× their last-week median (likely a regression or a runaway loop)
- Cost per successful task — the most useful metric you can publish
Bring engineering leads. Surprises in this meeting catch problems before they show up on the invoice.
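The first and third checklist items can be computed straight from the event rows. A minimal sketch, assuming events carry tenant_id and cost_usd as above, with the 3× threshold applied to last week's median daily spend:

```python
from collections import defaultdict
from statistics import median

def weekly_review(this_week, last_week_daily):
    """this_week: list of (tenant_id, cost_usd) events for the current week.
    last_week_daily: {tenant_id: [daily spend totals for last week]}.
    Returns the top-5 tenants by spend, plus tenants whose average daily
    spend exceeds 3x their last-week median daily spend."""
    spend = defaultdict(float)
    for tenant, cost in this_week:
        spend[tenant] += cost
    top5 = sorted(spend.items(), key=lambda kv: kv[1], reverse=True)[:5]
    flagged = []
    for tenant, total in spend.items():
        daily = last_week_daily.get(tenant)
        if daily and total / 7 > 3 * median(daily):
            flagged.append(tenant)
    return top5, flagged
```

The same rollup, grouped by workflow_id instead of tenant_id, covers the top-workflows item.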
Anomaly alerts
Page on:
- Hourly spend > 5× the rolling 7-day median
- Single-workflow run cost > $X (where X depends on your business)
- Per-tenant hourly spend > tenant’s contracted budget
Route alerts to Slack or PagerDuty depending on severity. The signal is bursty by nature; tune the false-positive rate down or the alerts will be ignored.
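The first paging rule can be sketched as a hypothetical check, where hourly_spend is a list of hourly totals with the most recent hour last:

```python
from statistics import median

def should_page(hourly_spend, multiplier=5.0, window_hours=7 * 24):
    """Page when the latest hour exceeds `multiplier` times the rolling
    median over the previous `window_hours` hours."""
    if len(hourly_spend) < 2:
        return False
    current = hourly_spend[-1]
    baseline = hourly_spend[-1 - window_hours:-1]  # exclude the current hour
    return current > multiplier * median(baseline)
```

The per-run and per-tenant rules are simple threshold comparisons against $X and the contracted budget, so they need no sketch.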
What “good” looks like
- Cost per successful task is trending down week over week — usually because models get cheaper, prompts get shorter, or you’re routing more traffic to smaller models.
- 99th-percentile workflow cost is bounded.
- No tenant ever costs more than they pay you.
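Cost per successful task, the headline metric from the weekly review, is total spend divided by completed tasks. A sketch assuming each task row carries its cost and a success flag:

```python
def cost_per_successful_task(tasks):
    """tasks: iterable of (cost_usd, succeeded) pairs.
    Failed attempts still count toward spend -- retries are not free."""
    total = sum(cost for cost, _ in tasks)
    successes = sum(1 for _, ok in tasks if ok)
    return total / successes if successes else float("inf")
```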