Rate limits

Each key has a tier that controls requests per minute, input tokens per minute, and concurrent streams. Tiers are based on lifetime top-up; promotion is automatic.

Tiers

Tier            When you are here                    RPM      TPM       Concurrent streams
free            New keys before any top-up.          5        20,000    1
pay_as_you_go   After your first confirmed top-up.   60       100,000   5
high_volume     Lifetime top-up of $500 or more.     300      500,000   20
enterprise      Custom limits, granted manually.     custom   custom    custom

Promotion runs every 5 minutes and looks at confirmed lifetime top-up. There is no manual button: when your confirmed lifetime top-up crosses a threshold, your tier moves up on the next sweep. Need limits higher than high_volume for a launch? Drop us a note via the support links in the footer.
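
To estimate which tier a key should land on after the next sweep, the thresholds in the table can be written as a small lookup. This is only a sketch: expected_tier is a hypothetical helper, it assumes the top-up is given in whole US dollars, and the actual promotion logic runs on the server.

```shell
# Hypothetical helper: print the tier a key would be promoted to for a given
# confirmed lifetime top-up in whole US dollars.
# enterprise is granted manually, so it is never returned here.
expected_tier() {
  if [ "$1" -ge 500 ]; then
    echo "high_volume"      # lifetime top-up of $500 or more
  elif [ "$1" -gt 0 ]; then
    echo "pay_as_you_go"    # at least one confirmed top-up
  else
    echo "free"             # no top-up yet
  fi
}
```

For example, expected_tier 120 prints pay_as_you_go. Compare the result with the X-RateLimit-Tier response header to check whether a promotion has gone through.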

Headers on every response

Every /v1/* response carries the live request-counter values for your key. Use them to throttle client-side instead of waiting for a 429.

  • X-RateLimit-Tier — the tier that produced these limits.
  • X-RateLimit-Limit-Requests — your RPM cap.
  • X-RateLimit-Remaining-Requests — requests left in the current minute.
  • X-RateLimit-Reset-Requests — seconds until the current minute window resets.
  • Retry-After — only set on 429. Seconds to sleep before retrying.
example
curl -i https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"caicaini/auto","max_tokens":50,"messages":[{"role":"user","content":"Hi"}]}'

# X-RateLimit-Tier:               pay_as_you_go
# X-RateLimit-Limit-Requests:     60
# X-RateLimit-Remaining-Requests: 59
# X-RateLimit-Reset-Requests:     42         # seconds until the window resets
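
One way to throttle from these headers is to pause only when the remaining count reaches zero. The sketch below assumes the response headers were saved to a file with curl -D; throttle_wait is a hypothetical helper name, not part of any Caicaini tooling.

```shell
# Hypothetical helper: print how many seconds to pause before the next request,
# given a file of response headers saved with `curl -D <file>`.
throttle_wait() {
  awk '
    # Header names are case-insensitive, so compare in lowercase.
    # "$2 + 0" forces a numeric value and ignores the trailing carriage return.
    tolower($1) == "x-ratelimit-remaining-requests:" { remaining = $2 + 0; seen = 1 }
    tolower($1) == "x-ratelimit-reset-requests:"     { reset = $2 + 0 }
    # Wait out the window only when no requests are left in it.
    END { if (seen && remaining == 0) print reset + 0; else print 0 }
  ' "$1"
}
```

After a call such as curl -s -D /tmp/headers -o /tmp/body ..., running sleep "$(throttle_wait /tmp/headers)" waits for the window to reset only when the key has no requests left, and returns immediately otherwise.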

When you exceed a limit

A request over the limit returns 429 with type rate_limit_error and a Retry-After header giving the number of seconds to wait. The request is rejected before any provider call, so it does not consume credits.

response · 429
{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Requests per minute exceeded for this key. Retry after 12s."
  }
}

Recommended backoff

Wrap calls in a small retry helper. Read Retry-After when present, fall back to exponential backoff with jitter when not. Cap retries at 4–6 attempts and a total wait around 30 seconds.

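# bash loop that retries on 429: honors Retry-After, else exponential backoff with jitter
delay=1
for i in 1 2 3 4 5; do
  status=$(curl -s -D /tmp/headers -o /tmp/body -w "%{http_code}" \
    -X POST https://caicaini.com/v1/messages \
    -H "Authorization: Bearer cai_api_YOUR_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"caicaini/auto","max_tokens":50,"messages":[{"role":"user","content":"Hi"}]}')
  # Any status other than 429 ends the loop; print the body for the caller to inspect
  if [ "$status" != "429" ]; then cat /tmp/body; exit 0; fi
  # Prefer the server's Retry-After (seconds); otherwise back off with 0-1s of jitter
  wait=$(awk 'tolower($1) == "retry-after:" { print $2 + 0 }' /tmp/headers)
  sleep "${wait:-$((delay + RANDOM % 2))}"
  delay=$((delay * 2))
done
echo "rate-limited after retries" >&2
exit 1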
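
The total-wait cap can be enforced by computing each delay in a helper that also tracks how long the job has already waited. This is a minimal sketch under stated assumptions: next_wait and MAX_TOTAL_WAIT are hypothetical names, and the 30-second budget is taken from the guidance above rather than from any server-side setting.

```shell
# Assumed budget in seconds, from the "around 30 seconds" guidance.
MAX_TOTAL_WAIT=30

# Hypothetical helper: print the seconds to sleep before the next retry.
#   $1  retry number, starting at 1
#   $2  Retry-After value from the 429 response, or "" if absent
#   $3  seconds already spent waiting
# Prints nothing and returns 1 when the wait would exceed the budget.
next_wait() {
  local attempt=$1 retry_after=$2 waited=$3 secs
  if [ -n "$retry_after" ]; then
    secs=$retry_after                                        # server-provided delay wins
  else
    # 1, 2, 4, 8, ... seconds plus 0-1s of jitter (no jitter if $RANDOM is unavailable)
    secs=$(( (1 << (attempt - 1)) + ${RANDOM:-0} % 2 ))
  fi
  [ $((waited + secs)) -le "$MAX_TOTAL_WAIT" ] || return 1   # over budget: give up
  echo "$secs"
}
```

A retry loop would call secs=$(next_wait "$i" "$retry_after" "$waited") before each sleep, add secs to waited afterwards, and stop retrying as soon as the helper returns non-zero.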

Insufficient credits is a separate signal

A request made when the key is out of credits returns 402 with type insufficient_quota. Do not retry: the request cannot succeed until your balance goes up. Catch the 402 explicitly, alert your operator, and pause the job. See Errors for the full table of error types.
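
To keep the two signals apart in a script, the status code can be mapped to an action before any retry logic runs. The helper below is a sketch; next_action and the action names it prints are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: map an HTTP status code to what a batch job should do next.
next_action() {
  case "$1" in
    2??) echo "ok" ;;      # success: use the response body
    429) echo "retry" ;;   # rate_limit_error: sleep, then try again
    402) echo "pause" ;;   # insufficient_quota: retrying cannot succeed, alert and stop
    *)   echo "fail" ;;    # anything else: look it up in the Errors table
  esac
}
```

With the status captured through curl -w "%{http_code}", a job can branch on "$(next_action "$status")" and enter the backoff loop only for retry.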