TL;DR

Three layers of token-spend monitoring:

  1. Provider dashboards for org-wide totals (OpenAI, Anthropic, Google AI Studio).
  2. Your own per-tenant tracking in a database, recorded at every model call.
  3. Weekly review with engineering leads to investigate anomalies and reset budgets.

Layer 1: provider dashboards

Every major LLM provider has a usage dashboard. Tag your API keys by environment (dev / staging / prod) and by team to make this layer useful. A single org-wide chart is too coarse.
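
One lightweight way to enforce the tagging is a small key registry in code, so each call site resolves a key that already encodes its environment and team. A minimal sketch; the env-var naming convention here is an assumption, not a provider feature:

```python
import os

# Assumed convention: one env var per (environment, team) pair,
# e.g. OPENAI_KEY_PROD_SEARCH. The provider dashboard then groups
# usage by key, which is what makes this layer readable.
def api_key(env: str, team: str) -> str:
    var = f"OPENAI_KEY_{env.upper()}_{team.upper()}"
    key = os.environ.get(var)
    if key is None:
        raise RuntimeError(f"no API key configured for {env}/{team} (expected {var})")
    return key
```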

For OpenAI: create a separate key per project, set a monthly budget limit on each project as a hard safety net, and review the Usage page weekly.

Layer 2: your own per-tenant tracking

The provider dashboard tells you what you are spending. It doesn’t tell you which of your customers, internal teams, or workflows are driving it. For that, instrument your own code.

At every model call, log:

  1. Tenant or customer ID, plus the workflow or feature that made the call.
  2. Model name and provider.
  3. Prompt tokens, completion tokens, and the computed dollar cost.
  4. Timestamp and a request ID for tracing.

Store in a wide event table (Postgres, BigQuery, ClickHouse). Roll up to per-tenant per-day in a materialized view.
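
A sketch of the instrumentation, assuming the OpenAI Python client; `record_usage` and the `PRICING` table are hypothetical stand-ins for your own storage and config:

```python
import time
import uuid
from openai import OpenAI

client = OpenAI()

# Assumed prices in dollars per million tokens; keep these in config, they change.
PRICING = {"gpt-4o": {"prompt": 2.50, "completion": 10.00}}

def record_usage(row: dict) -> None:
    """Stand-in for an INSERT into the event table below."""
    print(row)

def tracked_chat(tenant_id: str, workflow: str, model: str, messages: list) -> str:
    response = client.chat.completions.create(model=model, messages=messages)
    usage = response.usage
    price = PRICING[model]
    cost = (usage.prompt_tokens * price["prompt"]
            + usage.completion_tokens * price["completion"]) / 1_000_000
    record_usage({
        "ts": time.time(),
        "request_id": str(uuid.uuid4()),
        "tenant_id": tenant_id,
        "workflow": workflow,
        "model": model,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "cost_usd": round(cost, 6),
    })
    return response.choices[0].message.content
```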
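
For the Postgres flavor, the table and the rollup can start as simple as this, using psycopg; the column names match the sketch above and are assumptions, not a fixed schema:

```python
import psycopg

TABLE = """
CREATE TABLE IF NOT EXISTS llm_usage (
    ts                timestamptz NOT NULL,
    request_id        uuid PRIMARY KEY,
    tenant_id         text NOT NULL,
    workflow          text NOT NULL,
    model             text NOT NULL,
    prompt_tokens     integer NOT NULL,
    completion_tokens integer NOT NULL,
    cost_usd          numeric(12, 6) NOT NULL
)
"""

# Per-tenant per-day rollup; refresh on a schedule with
# REFRESH MATERIALIZED VIEW llm_usage_daily.
VIEW = """
CREATE MATERIALIZED VIEW IF NOT EXISTS llm_usage_daily AS
SELECT tenant_id,
       date_trunc('day', ts) AS day,
       sum(prompt_tokens + completion_tokens) AS tokens,
       sum(cost_usd) AS cost_usd
FROM llm_usage
GROUP BY tenant_id, day
"""

with psycopg.connect("dbname=usage") as conn:
    conn.execute(TABLE)
    conn.execute(VIEW)
```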

Layer 3: weekly review

Once a week, walk through:

  1. Top tenants by spend and the week-over-week change for each (a starting query is sketched below).
  2. Any anomaly alerts that fired, and whether each was real.
  3. Budgets that need a reset or an adjustment.

Bring engineering leads. Surprises in this meeting catch problems before they show up on the invoice.
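
The first item can come straight out of the rollup; a sketch against the llm_usage_daily view from Layer 2 (the view name is this article's assumption):

```python
import psycopg

TOP_TENANTS = """
SELECT tenant_id,
       sum(cost_usd) FILTER (WHERE day >= now() - interval '7 days')  AS this_week,
       sum(cost_usd) FILTER (WHERE day >= now() - interval '14 days'
                               AND day <  now() - interval '7 days')  AS last_week
FROM llm_usage_daily
GROUP BY tenant_id
ORDER BY this_week DESC NULLS LAST
LIMIT 20
"""

with psycopg.connect("dbname=usage") as conn:
    for tenant_id, this_week, last_week in conn.execute(TOP_TENANTS):
        print(tenant_id, this_week, last_week)
```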

Anomaly alerts

Page on:

  1. Org-wide spend running far above the trailing hourly average.
  2. A single tenant blowing through its daily budget.
  3. Non-trivial spend on a dev or staging key, which should be near zero.

Route alerts to Slack or PagerDuty depending on severity; a minimal spike check is sketched below. The signal is bursty by nature, so tune the false-positive rate down or the alerts will get ignored.
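
A minimal version of the first check, assuming hourly org-wide totals are already computed and a Slack incoming-webhook URL is in the environment; the ratio is a placeholder to tune against your false-positive budget:

```python
import os
import requests

SLACK_WEBHOOK = os.environ["SLACK_WEBHOOK_URL"]  # assumed incoming webhook
SPIKE_RATIO = 3.0  # placeholder threshold; tune it down until pages are rare but real

def check_spend(current_hour_usd: float, trailing_avg_usd: float) -> None:
    """Alert when the current hour runs well above the trailing hourly average."""
    if trailing_avg_usd > 0 and current_hour_usd > SPIKE_RATIO * trailing_avg_usd:
        requests.post(SLACK_WEBHOOK, json={
            "text": (f"Token spend spike: ${current_hour_usd:,.2f} this hour "
                     f"vs ${trailing_avg_usd:,.2f} trailing hourly average"),
        })
```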

What “good” looks like

Every dollar of spend is attributable to a tenant and a workflow, anomalies page before the invoice arrives, and the weekly review is short because nothing in it is a surprise.