VeloxAI
Operations · 10 min read

The AI billing pipeline: from token to invoice

Production AI billing needs usage events, idempotent payments, credit accounting, per-model cost breakdowns, and proactive balance alerts.

VeloxAI Engineering Team
#billing #analytics #usage
AI cost control

AI billing is fundamentally different from SaaS billing. In SaaS you typically charge per seat per month. In AI, every request has a different cost, because the model, the token count, the provider's pricing, and the cache hit rate all vary from call to call. Your billing pipeline must capture this granularity or you will either undercharge (losing money) or overcharge (losing customers).
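To make the per-request variation concrete, here is a minimal sketch of a cost calculation. The model names and prices are invented for illustration, and the sketch assumes `inputTokens` counts only uncached input while `cacheReadTokens` is billed at a separate, lower rate; real provider pricing may be structured differently.

```typescript
// Hypothetical per-million-token prices in USD. Real prices differ by
// provider and change over time, so treat these numbers as placeholders.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheReadPerMTok: number;
}

const PRICING: Record<string, ModelPricing> = {
  "small-model": { inputPerMTok: 0.5, outputPerMTok: 1.5, cacheReadPerMTok: 0.05 },
  "large-model": { inputPerMTok: 5, outputPerMTok: 15, cacheReadPerMTok: 0.5 },
};

interface TokenUsage {
  model: string;
  inputTokens: number;     // assumption: uncached input tokens only
  outputTokens: number;
  cacheReadTokens: number; // assumption: input tokens served from cache
}

// Cost of one request: each token class is priced separately, then summed.
function requestCostUsd(usage: TokenUsage): number {
  const price = PRICING[usage.model];
  if (!price) {
    throw new Error(`No pricing configured for model ${usage.model}`);
  }
  const microDollars =
    usage.inputTokens * price.inputPerMTok +
    usage.outputTokens * price.outputPerMTok +
    usage.cacheReadTokens * price.cacheReadPerMTok;
  return microDollars / 1_000_000;
}
```

With these placeholder prices, the same token counts cost roughly ten times more on `large-model` than on `small-model`, which is exactly the spread a flat per-seat price cannot represent.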

Usage events are the source of truth

// Emit after every API response; never write billing data synchronously on the request path
interface UsageEvent {
  requestId: string;  orgId: string;  apiKeyId: string;
  model: string;  provider: string;
  inputTokens: number;  outputTokens: number;
  cacheReadTokens: number;  latencyMs: number;
  status: 'success' | 'error';
  costUsd: number;  creditCost: number;
}

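// Process in a queue worker. Deduct first so the alert check sees the updated balance.
async function processUsageEvent(event: UsageEvent): Promise<void> {
  await deductCredits(event.orgId, event.creditCost);
  await Promise.all([
    writeAnalyticsEvent(event),
    checkAlertThresholds(event.orgId),
  ]);
}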
Queue-backed usage event processing
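
As a rough illustration of the "emit now, process later" split, the sketch below uses an in-memory array as a stand-in for a durable queue. `UsageQueue`, `QueuedUsage`, `emit`, `drain`, and `pendingCount` are hypothetical names, and a production system would use a real message broker with persistence rather than process memory.

```typescript
// Trimmed-down event shape for the example; the full UsageEvent has more fields.
interface QueuedUsage {
  requestId: string;
  orgId: string;
  creditCost: number;
}

// In-memory stand-in for a durable queue (illustration only).
class UsageQueue {
  private pending: QueuedUsage[] = [];

  // Called on the request path: a constant-time push, no database write.
  emit(usage: QueuedUsage): void {
    this.pending.push(usage);
  }

  // Called by the worker: passes each queued event to the handler.
  // Events whose handler throws are put back for the next drain.
  drain(handler: (usage: QueuedUsage) => void): number {
    const batch = this.pending;
    this.pending = [];
    let processed = 0;
    for (const usage of batch) {
      try {
        handler(usage);
        processed++;
      } catch {
        this.pending.push(usage); // retried later, so handlers must tolerate replays
      }
    }
    return processed;
  }

  pendingCount(): number {
    return this.pending.length;
  }
}
```

Because a failed event is retried, the handler can see the same `requestId` more than once, which is why the deduction step has to be idempotent.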

Idempotent credit deduction

Use the request ID as the idempotency key. If a usage event is processed twice (because of a queue retry or a worker crash), the credits for that request must be deducted only once. Payment callbacks must be idempotent too: verify the payment with the provider before updating a balance, and apply each payment at most once.
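
A minimal sketch of both rules, using in-memory maps in place of a database. `CreditLedger`, `deduct`, `applyPayment`, and the `VerifiedPayment` shape are hypothetical names for illustration; the `verify` callback stands in for whatever lookup your payment provider offers. In a real store the same effect usually comes from a unique constraint on the key, written in the same transaction as the balance change.

```typescript
// Hypothetical result of asking the payment provider about a payment.
interface VerifiedPayment {
  paid: boolean;
  credits: number;
}

class CreditLedger {
  private balances = new Map<string, number>();
  private appliedKeys = new Set<string>(); // idempotency keys already applied

  balanceOf(orgId: string): number {
    return this.balances.get(orgId) ?? 0;
  }

  // Deducts at most once per requestId; a replayed usage event is a no-op.
  // Returns true if the deduction was applied, false if it was a duplicate.
  deduct(orgId: string, creditCost: number, requestId: string): boolean {
    const key = `usage:${requestId}`;
    if (this.appliedKeys.has(key)) return false;
    this.balances.set(orgId, this.balanceOf(orgId) - creditCost);
    this.appliedKeys.add(key);
    return true;
  }

  // Tops up at most once per paymentId, and only after the provider
  // confirms the payment. The callback payload itself is never trusted.
  applyPayment(
    orgId: string,
    paymentId: string,
    verify: (paymentId: string) => VerifiedPayment,
  ): boolean {
    const key = `payment:${paymentId}`;
    if (this.appliedKeys.has(key)) return false;
    const verified = verify(paymentId);
    if (!verified.paid) return false;
    this.balances.set(orgId, this.balanceOf(orgId) + verified.credits);
    this.appliedKeys.add(key);
    return true;
  }
}
```

Calling `deduct` twice with the same `requestId` changes the balance once, so a queue retry is safe. An unpaid payment is not recorded as applied, so a later callback for the same payment can still succeed once the provider confirms it.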

Analytics as a query layer

Users must be able to filter spend by date range, model, provider, API key, and status. Show latency percentiles (p50, p95, p99), error rates, and cache hit rates. Operators must be able to drill down from a monthly spike to the exact request that caused it.
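
The sketch below shows one way such a query could look over an in-memory list of records. `RequestRecord`, `SpendFilter`, and `summarize` are hypothetical names, the percentile uses the nearest-rank method (one of several common definitions), and cache hit rate is left out because its definition depends on how cached tokens are counted. A production system would push the same filters into its analytics store instead of scanning arrays.

```typescript
interface RequestRecord {
  requestId: string;
  timestamp: number; // epoch milliseconds
  model: string;
  provider: string;
  apiKeyId: string;
  status: "success" | "error";
  latencyMs: number;
  costUsd: number;
}

// Every field is optional; an omitted field does not restrict the result.
interface SpendFilter {
  from?: number; // inclusive
  to?: number;   // exclusive
  model?: string;
  provider?: string;
  apiKeyId?: string;
  status?: "success" | "error";
}

function matchesFilter(r: RequestRecord, f: SpendFilter): boolean {
  return (
    (f.from === undefined || r.timestamp >= f.from) &&
    (f.to === undefined || r.timestamp < f.to) &&
    (f.model === undefined || r.model === f.model) &&
    (f.provider === undefined || r.provider === f.provider) &&
    (f.apiKeyId === undefined || r.apiKeyId === f.apiKeyId) &&
    (f.status === undefined || r.status === f.status)
  );
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return NaN;
  const rank = Math.ceil((p * sortedAsc.length) / 100);
  return sortedAsc[Math.max(rank, 1) - 1];
}

function summarize(records: RequestRecord[], filter: SpendFilter) {
  const rows = records.filter((r) => matchesFilter(r, filter));
  const latencies = rows.map((r) => r.latencyMs).sort((a, b) => a - b);
  const errors = rows.filter((r) => r.status === "error").length;
  // Drill-down hook: the single most expensive request in the filtered set.
  const costliest = rows.reduce<RequestRecord | undefined>(
    (best, r) => (best === undefined || r.costUsd > best.costUsd ? r : best),
    undefined,
  );
  return {
    requests: rows.length,
    spendUsd: rows.reduce((sum, r) => sum + r.costUsd, 0),
    errorRate: rows.length === 0 ? 0 : errors / rows.length,
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
    costliestRequestId: costliest?.requestId,
  };
}
```

Returning `costliestRequestId` alongside the aggregates gives an operator a direct path from a spend total to a specific request.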

