
Pricing & credits

API spend is denominated in Caicaini credits and billed from a dedicated apiCredits pool. The credit unit is the same as in the chat product, but the price per credit differs and the two pools are strictly separated.

How credits work

Every model call costs credits. The exact figure is computed from the input and output token counts, together with a per-model multiplier.
Credits are always whole numbers. All accounting is integer math, and rounding never works against you.
The usage.credits_consumed field on every response is the authoritative deduction. Whatever it says is what we charged.
The apiCredits pool is fully isolated from your chat-product credits. Subscription credits and top-ups for chat never leak into API spend, and vice versa.
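
As an illustration of the points above, here is a minimal sketch of client-side spend tracking that trusts usage.credits_consumed and stays in integer arithmetic. The ApiResponse shape and the totalCreditsConsumed name are assumptions for this example; only the usage.credits_consumed field comes from this page.

```typescript
// Hypothetical response shape: only usage.credits_consumed is documented here.
interface ApiResponse {
  usage: { credits_consumed: number };
}

// Sum the authoritative per-response deductions.
// Rejects anything that is not a non-negative integer, since credits are never fractional.
function totalCreditsConsumed(responses: ApiResponse[]): number {
  let total = 0;
  for (const r of responses) {
    const c = r.usage.credits_consumed;
    if (!Number.isSafeInteger(c) || c < 0) {
      throw new Error(`expected a non-negative integer credit count, got ${c}`);
    }
    total += c;
  }
  return total;
}
```

Keeping a running total like this is only a local convenience; the balance on the server remains the reference.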

Top-up packs

Credits are sold in four preset packs, and larger packs give more credits per dollar. The custom-amount slider on the same page does not interpolate between tiers: it applies the rate of the tier that the amount falls into.

Pack	Price	Credits	Credits per $	vs starter
api_starter	$10	70,000	7,000	—
api_builder	$50	380,000	7,600	+9%
api_scale	$200	1,600,000	8,000	+14%
api_enterprise	$1,000	8,500,000	8,500	+21%

Custom amount

If a pack is the wrong size, type any dollar amount between $10 and $10,000 in the dashboard. We compute the credits at the rate of the pack tier whose range contains that amount. The slider on /developers/billing previews the math live, but the server-side calculation is the source of truth.

Amount range	Credits per $	Equivalent pack
$10.00 – $49.99	7,000	api_starter rate
$50.00 – $199.99	7,600	api_builder rate
$200.00 – $999.99	8,000	api_scale rate
$1,000.00 – $10,000.00	8,500	api_enterprise rate
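
The tier lookup in the table above can be sketched as a small function. This is an illustrative client-side approximation, not the server's implementation: the creditsForUsdCents name is hypothetical, and how the server treats sub-dollar remainders is an assumption (here, credits scale linearly with cents).

```typescript
// Tier thresholds and rates copied from the "Custom amount" table; amounts are in cents.
const TIERS = [
  { minCents: 100_000, creditsPerDollar: 8_500, pack: "api_enterprise" },
  { minCents: 20_000, creditsPerDollar: 8_000, pack: "api_scale" },
  { minCents: 5_000, creditsPerDollar: 7_600, pack: "api_builder" },
  { minCents: 1_000, creditsPerDollar: 7_000, pack: "api_starter" },
] as const;

// Hypothetical helper: estimate the credits a custom amount would buy.
function creditsForUsdCents(usdCents: number): { credits: number; creditsPerDollar: number; pack: string } {
  if (!Number.isInteger(usdCents) || usdCents < 1_000 || usdCents > 1_000_000) {
    throw new RangeError("amount must be a whole number of cents between $10 and $10,000");
  }
  // Tiers are ordered from highest threshold to lowest, so the first match wins.
  const tier = TIERS.find((t) => usdCents >= t.minCents)!;
  // Every rate is a multiple of 100, so credits per cent is a whole number.
  const credits = usdCents * (tier.creditsPerDollar / 100);
  return { credits, creditsPerDollar: tier.creditsPerDollar, pack: tier.pack };
}
```

For example, creditsForUsdCents(5000) lands in the api_builder tier and yields 380,000 credits, matching the pack table. Treat the result as a budgeting aid and rely on the preview endpoint for the real figure.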

Previewing the math

The dashboard calls a preview endpoint backed by the same server logic that fulfillment uses, so the previewed credit amount matches what a purchase delivers.

# Preview what $50 buys without creating a payment.
curl https://caicaini.com/api/developers/billing/preview \
  -H "Authorization: Bearer YOUR_DASHBOARD_JWT" \
  -H "Content-Type: application/json" \
  -d '{"usdCents": 5000}'

# {
#   "credits": 380000,
#   "creditsPerDollar": 7600,
#   "markupX": 1.32,
#   "tier": "builder"
# }
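
The same preview call might look like this from TypeScript. This is a sketch under assumptions: previewCredits is a hypothetical helper, the POST method is inferred from curl's behavior with -d, and the response field types are guessed from the sample output. The fetch implementation is passed in so the function can be exercised with a stub.

```typescript
// Fields copied from the sample response above; their exact types are assumed.
interface PreviewResponse {
  credits: number;
  creditsPerDollar: number;
  markupX: number;
  tier: string;
}

// Minimal subset of the fetch API, so a stub can stand in for the network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const PREVIEW_URL = "https://caicaini.com/api/developers/billing/preview";

// Hypothetical helper, not part of any SDK.
async function previewCredits(
  usdCents: number,
  dashboardJwt: string,
  fetchImpl: FetchLike,
): Promise<PreviewResponse> {
  const res = await fetchImpl(PREVIEW_URL, {
    method: "POST", // assumed: curl sends POST when -d is given
    headers: {
      Authorization: `Bearer ${dashboardJwt}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ usdCents }),
  });
  if (!res.ok) {
    throw new Error(`preview request failed with HTTP ${res.status}`);
  }
  return (await res.json()) as PreviewResponse;
}
```

In a runtime that provides a global fetch, passing it as the third argument should be enough, for example previewCredits(5000, jwt, fetch).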

Per-model cost

Different models have different per-1K-token credit rates. Faster, smaller models cost less per token; capability-heavy models cost more. Use caicaini/auto when you do not have a strong opinion — the router picks the cheapest model that can plausibly do the job.

caicaini/auto — billed as the resolved model. Auto rarely picks the most expensive option.
caicaini/opus — most expensive per token. Best capability. Costs roughly 5× caicaini/sonnet.
caicaini/sonnet — mid-tier. The default for production traffic.
caicaini/haiku — cheap and fast. Roughly 1/3 the cost of sonnet.
caicaini/kimi — cheapest per token at long context lengths. Big advantage for retrieval-heavy workloads.

Pre-flight reservation

Before each call, we estimate the worst-case cost (input plus full max_tokens at the model's output rate) and reserve that many credits from your apiCredits balance. After the call completes we release the reservation and deduct the actual cost. If your spendable balance cannot cover the worst case, the call returns 402 with type insufficient_quota before any provider work happens.
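
The reservation check described above can be approximated on the client before sending a request. This is a rough sketch, not the server's algorithm: the function names are hypothetical, the ModelRates shape mirrors the per-1K rates used elsewhere on this page, and rounding each component up is an assumption.

```typescript
// Per-1K-token credit rates for one model; values used below are made up for illustration.
type ModelRates = { in_per_1k_credits: number; out_per_1k_credits: number };

// Worst case: all input tokens plus the full max_tokens budget at the output rate.
function worstCaseReservation(inputTokens: number, maxTokens: number, r: ModelRates): number {
  return Math.ceil((inputTokens / 1000) * r.in_per_1k_credits)
       + Math.ceil((maxTokens / 1000) * r.out_per_1k_credits);
}

// True when the call would likely be refused with 402 insufficient_quota.
function wouldExceedBalance(
  spendableBalance: number,
  inputTokens: number,
  maxTokens: number,
  r: ModelRates,
): boolean {
  return worstCaseReservation(inputTokens, maxTokens, r) > spendableBalance;
}
```

Because the reservation scales with max_tokens rather than with the output you actually receive, lowering max_tokens is one way to keep a call within a small balance.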

Estimating before you send

// Quick budgeting: if you know average input/output tokens per turn,
// estimate credit cost before sending. Replace the rates with the values
// you read from the model row in GET /v1/models if you want exact numbers.
type Rates = { in_per_1k_credits: number; out_per_1k_credits: number };

function estimateCredits(inputTokens: number, outputTokens: number, r: Rates): number {
  return Math.ceil((inputTokens / 1000) * r.in_per_1k_credits)
       + Math.ceil((outputTokens / 1000) * r.out_per_1k_credits);
}

// Then keep usage.credits_consumed from each response as the source of truth.

Refunds

Refunds are rare. The most common case, a stream that dropped before any output reached you, is refunded automatically. Anything else is admin-initiated: contact support with the message id from the affected response. See Errors for what is refundable by default.
