TL;DR
Three layers of token-spend monitoring:
- Provider dashboards for org-wide totals (OpenAI, Anthropic, Google AI Studio).
- Your own per-tenant tracking in a database, recorded at every model call.
- Weekly review with engineering leads to investigate anomalies and reset budgets.
Layer 1: provider dashboards
Every major LLM provider has a usage dashboard. Tag your API keys by environment (dev / staging / prod) and by team to make this layer useful. A single org-wide chart is too coarse.
For OpenAI: separate keys per project, set per-key dollar caps, and review the Usage page weekly. Set monthly hard limits per key as a safety net.
Layer 2: your own per-tenant tracking
The provider dashboard tells you what you are spending. It doesn't tell you which of your customers, internal teams, or workflows are driving that spend. For that, instrument your own code.
At every model call, log:
- tenant_id
- agent_id / workflow_id
- model
- input_tokens, output_tokens
- cost_usd (compute it yourself; don't trust your provider's eventual invoice)
- started_at, latency_ms
Store in a wide event table (Postgres, BigQuery, ClickHouse). Roll up to per-tenant per-day in a materialized view.
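The per-call record can be sketched as a small helper that computes cost from a local price table rather than waiting for the invoice. Model names and prices here are illustrative placeholders, not current list prices:

```python
from dataclasses import dataclass

# Illustrative per-million-token prices; keep your own table current.
PRICE_PER_MTOK = {
    "small-model": {"input": 0.15, "output": 0.60},
    "large-model": {"input": 3.00, "output": 15.00},
}

@dataclass
class UsageEvent:
    tenant_id: str
    workflow_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    started_at: str
    latency_ms: int

def record_usage(tenant_id, workflow_id, model,
                 input_tokens, output_tokens, started_at, latency_ms):
    """Build one row for the wide event table, with cost computed locally."""
    p = PRICE_PER_MTOK[model]
    cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return UsageEvent(tenant_id, workflow_id, model,
                      input_tokens, output_tokens, round(cost, 6),
                      started_at, latency_ms)
```

In practice the returned row is inserted into the event table; the sketch just constructs it.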
Layer 3: weekly review
Once a week, walk through:
- Top 5 tenants by spend, week-over-week change
- Top 5 workflows by spend
- Any tenant or workflow with > 3× their last-week median (likely a regression or a runaway loop)
- Cost per successful task — the most useful metric you can publish
Bring engineering leads. Surprises in this meeting catch problems before they show up on the invoice.
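The first and third checklist items can be computed straight from the event rows. A minimal sketch, assuming events carry tenant_id and cost_usd as above, with the 3× threshold applied to last week's median daily spend:

```python
from collections import defaultdict
from statistics import median

def weekly_review(this_week, last_week_daily):
    """this_week: list of (tenant_id, cost_usd) events for the current week.
    last_week_daily: {tenant_id: [daily spend totals for last week]}.
    Returns the top-5 tenants by spend, plus tenants whose average daily
    spend exceeds 3x their last-week median daily spend."""
    spend = defaultdict(float)
    for tenant, cost in this_week:
        spend[tenant] += cost
    top5 = sorted(spend.items(), key=lambda kv: kv[1], reverse=True)[:5]
    flagged = []
    for tenant, total in spend.items():
        daily = last_week_daily.get(tenant)
        if daily and total / 7 > 3 * median(daily):
            flagged.append(tenant)
    return top5, flagged
```

The same rollup, grouped by workflow_id instead of tenant_id, covers the top-workflows item.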
Anomaly alerts
Page on:
- Hourly spend > 5× the rolling 7-day median
- Single-workflow run cost > $X (where X depends on your business)
- Per-tenant hourly spend > tenant’s contracted budget
Route alerts to Slack or PagerDuty depending on severity. The signal is bursty by nature; tune the false-positive rate down or the alerts will be ignored.
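The first paging rule can be sketched as a hypothetical check, where hourly_spend is a list of hourly totals with the most recent hour last:

```python
from statistics import median

def should_page(hourly_spend, multiplier=5.0, window_hours=7 * 24):
    """Page when the latest hour exceeds `multiplier` times the rolling
    median over the previous `window_hours` hours."""
    if len(hourly_spend) < 2:
        return False
    current = hourly_spend[-1]
    baseline = hourly_spend[-1 - window_hours:-1]  # exclude the current hour
    return current > multiplier * median(baseline)
```

The per-run and per-tenant rules are simple threshold comparisons against $X and the contracted budget, so they need no sketch.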
What “good” looks like
- Cost per successful task is trending down week over week — usually because models get cheaper, prompts get shorter, or you’re routing more traffic to smaller models.
- 99th-percentile workflow cost is bounded.
- No tenant ever costs more than they pay you.
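Cost per successful task, the headline metric from the weekly review, is total spend divided by completed tasks. A sketch assuming each task row carries its cost and a success flag:

```python
def cost_per_successful_task(tasks):
    """tasks: iterable of (cost_usd, succeeded) pairs.
    Failed attempts still count toward spend -- retries are not free."""
    total = sum(cost for cost, _ in tasks)
    successes = sum(1 for _, ok in tasks if ok)
    return total / successes if successes else float("inf")
```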