Rate limits
Each key has a tier that controls requests per minute, input tokens per minute, and concurrent streams. Tiers are based on lifetime top-up; promotion is automatic.
Tiers
| Tier | When you are here | RPM | TPM | Concurrent streams |
|---|---|---|---|---|
| free | New keys before any top-up. | 5 | 20,000 | 1 |
| pay_as_you_go | After your first confirmed top-up. | 60 | 100,000 | 5 |
| high_volume | Lifetime top-up of $500 or more. | 300 | 500,000 | 20 |
| enterprise | Custom limits, granted manually. | custom | custom | custom |
Promotion runs every 5 minutes and looks at confirmed lifetime top-up. There is no manual button: when your lifetime top-up crosses a threshold, your tier moves up on the next sweep. Need limits higher than high_volume for a launch? Drop us a note via the support links in the footer.
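The thresholds in the table can be read as a simple lookup. The sketch below is illustrative only: tier_for_topup is a hypothetical helper, not something the API exposes, and it assumes the top-up amount is tracked in whole cents so that bash integer comparison is enough. The enterprise tier is never returned because it is granted manually.

```bash
#!/usr/bin/env bash
# Hypothetical helper that mirrors the tier table above.
# $1 = confirmed lifetime top-up in whole cents (assumption: integer cents).
tier_for_topup() {
  if [ "$1" -ge 50000 ]; then
    echo high_volume        # lifetime top-up of $500 or more
  elif [ "$1" -gt 0 ]; then
    echo pay_as_you_go      # at least one confirmed top-up
  else
    echo free               # no top-up yet
  fi
}

tier_for_topup 2500   # a $25 top-up
```

Because promotion runs as a sweep every 5 minutes, the tier reported in X-RateLimit-Tier can lag behind this lookup for a short time after a top-up is confirmed.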
Headers on every response
Every /v1/* response carries the live request-counter values for your key. Use them to throttle client-side instead of waiting for a 429.
- X-RateLimit-Tier: the tier that produced these limits.
- X-RateLimit-Limit-Requests: your RPM cap.
- X-RateLimit-Remaining-Requests: requests left in the current minute.
- X-RateLimit-Reset-Requests: seconds until the current minute window resets.
- Retry-After: only set on 429. Seconds to sleep before retrying.
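One way to throttle client-side is to read the two counters after each response and spread the remaining requests over the rest of the window. The sketch below assumes the response headers were saved to a file with curl -D. The names header_value and pace_delay are hypothetical helpers, and even spacing is just one possible policy, not a rule the service prescribes.

```bash
#!/usr/bin/env bash
# Print the value of header $1 from a saved header dump in file $2.
# Header names are matched case-insensitively; the trailing CR is stripped.
header_value() {
  awk -v name="$1" '{
    line = $0
    sub(/\r$/, "", line)
    i = index(line, ":")
    if (i && tolower(substr(line, 1, i - 1)) == tolower(name)) {
      v = substr(line, i + 1)
      sub(/^[ \t]+/, "", v)
      print v
      exit
    }
  }' "$2"
}

# Seconds to wait before the next request.
# $1 = X-RateLimit-Remaining-Requests, $2 = X-RateLimit-Reset-Requests.
pace_delay() {
  if [ "$1" -le 0 ]; then
    echo "$2"               # budget spent: wait out the window
  else
    echo $(( $2 / $1 ))     # integer seconds; 0 means send immediately
  fi
}
```

After a call made with curl -s -D headers.txt -o body.json, a client could run sleep "$(pace_delay "$(header_value X-RateLimit-Remaining-Requests headers.txt)" "$(header_value X-RateLimit-Reset-Requests headers.txt)")" before sending the next request.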
curl -i https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"caicaini/auto","max_tokens":50,"messages":[{"role":"user","content":"Hi"}]}'
# X-RateLimit-Tier: pay_as_you_go
# X-RateLimit-Limit-Requests: 60
# X-RateLimit-Remaining-Requests: 59
# X-RateLimit-Reset-Requests: 42 # seconds until the window resets
When you exceed
A request over the limit returns 429 with type rate_limit_error and a Retry-After header in seconds. The request is rejected before any provider call, so it does not consume credits.
{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Requests per minute exceeded for this key. Retry after 12s."
  }
}
Recommended backoff
Wrap calls in a small retry helper. Read Retry-After when present, fall back to exponential backoff with jitter when not. Cap retries at 4–6 attempts and a total wait around 30 seconds.
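The delay rule described above can be sketched as a small function. This is a minimal illustration under stated assumptions: retry_delay, BASE_DELAY, and MAX_DELAY are hypothetical names and values, and the jitter uses bash's RANDOM variable. With a 1-second base and five attempts, the worst-case total wait is 1 + 2 + 4 + 8 + 16 = 31 seconds, which is close to the suggested cap.

```bash
#!/usr/bin/env bash
BASE_DELAY=1    # assumed starting delay, in seconds
MAX_DELAY=16    # assumed cap for a single wait, in seconds

# Print how many seconds to wait before retry number $1 (1-based).
# $2 is the Retry-After value in seconds, or empty when the header was absent.
retry_delay() {
  local attempt=$1 retry_after=${2:-} ceiling
  case $retry_after in
    '' | *[!0-9]*) ;;                     # missing or not a plain integer: fall through
    *) echo "$retry_after"; return ;;     # the server's hint wins
  esac
  # Exponential ceiling: BASE_DELAY * 2^(attempt - 1), capped at MAX_DELAY.
  ceiling=$(( BASE_DELAY << (attempt - 1) ))
  if [ "$ceiling" -gt "$MAX_DELAY" ]; then ceiling=$MAX_DELAY; fi
  # Jitter: a random whole number of seconds between 1 and the ceiling.
  echo $(( RANDOM % ceiling + 1 ))
}
```

Randomizing the wait keeps many clients that were limited at the same moment from retrying in lockstep. A caller would use it as sleep "$(retry_delay "$attempt" "$retry_after")" inside its retry loop.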
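# bash script that retries on 429: sleeps for Retry-After when present, else exponential backoff
delay=1
for i in 1 2 3 4 5; do
  # -D saves the response headers so Retry-After can be read after the call
  status=$(curl -s -o /tmp/body -D /tmp/headers -w "%{http_code}" \
    -X POST https://caicaini.com/v1/messages \
    -H "Authorization: Bearer cai_api_YOUR_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"caicaini/auto","max_tokens":50,"messages":[{"role":"user","content":"Hi"}]}')
  if [ "$status" != "429" ]; then cat /tmp/body; exit 0; fi
  # Prefer the server's Retry-After value; fall back to the current backoff delay
  retry_after=$(awk 'tolower($1) == "retry-after:" { print $2 + 0; exit }' /tmp/headers)
  sleep "${retry_after:-$delay}"
  delay=$((delay * 2))
done
echo "rate-limited after retries" >&2
exit 1
Insufficient credits is a separate signal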
Out-of-credits returns 402 with type insufficient_quota. Do not retry — it will not pass until your balance goes up. Catch the 402 explicitly, alert your operator, and pause the job. See Errors for the full table of error types.
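To keep the two signals apart in a batch job, the status code can be mapped to an action before any retry logic runs. The sketch below is a hypothetical helper (next_action and its return words are illustrative names, not part of the API), and it treats every status other than 2xx, 402, and 429 as a plain failure because those are outside the scope of this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an HTTP status code to what the job should do next.
next_action() {
  case "$1" in
    2??) echo done ;;     # success
    429) echo retry ;;    # rate_limit_error: back off, then try again
    402) echo pause ;;    # insufficient_quota: retrying will not pass until the balance goes up
    *)   echo fail ;;     # anything else: see Errors
  esac
}

next_action 402
```

A job loop would branch on the result: sleep and retry on retry, alert the operator and stop submitting work on pause.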