Reference
Errors
All non-2xx responses share one envelope. The HTTP status and the error.type field together tell you what to do — retry, back off, fix the input, or stop.
Error envelope
Every error response is JSON with the same shape. The type at the top is always the literal string "error"; the meaningful part is error.type.
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens must be a positive integer."
  }
}
Status codes
| HTTP | type | When you see it |
|---|---|---|
| 400 | invalid_request_error | The request body is malformed or violates a constraint (unknown model, max_tokens too high, mis-shaped messages array). Do not retry — fix the input. |
| 401 | authentication_error | Missing, malformed, or invalid API key. Do not retry as-is — re-authenticate. |
| 402 | insufficient_quota | Your apiCredits balance is below what this turn requires. Hard stop. Top up at /developers/billing. |
| 403 | permission_error | The key authenticated, but is not allowed for this surface (CLI key on /v1/*, suspended account, scope mismatch). |
| 404 | not_found_error | The path or resource does not exist. Check the URL and the model id. |
| 413 | request_too_large | The body exceeded the 16 MB request cap. Split the payload or downscale images. |
| 429 | rate_limit_error | Per-key rate limit hit. Honor Retry-After. Does not consume credits. |
| 500 | api_error | Internal failure on our side. Retry with backoff. Idempotent. |
| 502 | api_error | Upstream provider returned an error. Retry with backoff. |
| 503 | api_error | Service temporarily unavailable (deploy, maintenance). Retry with backoff. |
| 504 | api_error | Provider timed out. Retry with backoff. Consider lowering max_tokens. |
| 529 | overloaded_error | Capacity-constrained on the upstream model. Retry with backoff. Often clears in a few seconds. |
How to handle each class
- Retry-safe: 429, 500, 502, 503, 504, 529. Use exponential backoff with jitter. Honor Retry-After when present. Cap at 4–6 attempts.
- Hard stop: 402 (insufficient_quota) and 401/403 (authentication_error, permission_error). No amount of retries will help. Surface to a human or alerting channel.
- Fix-the-input: 400 (invalid_request_error), 404 (not_found_error), and 413 (request_too_large). These are bugs on the caller side: fix the request, and validate before sending next time.
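The three classes above can be sketched as a pair of small shell helpers. This is an illustrative sketch, not an official client: the function names `classify_status` and `backoff_delay`, the 1-second base delay, and the 30-second cap are assumptions chosen for the example. Only the status-to-class mapping comes from the table, and treating 404 as a fix-the-input case is an assumption based on the table's advice to check the URL and the model id.

```sh
# Hypothetical helpers; names, base delay, and cap are illustrative assumptions.

# classify_status STATUS
# Prints one of: ok | retry | stop | fix | unknown
classify_status() {
  case "$1" in
    2??)                     echo ok ;;
    429|500|502|503|504|529) echo retry ;;
    401|402|403)             echo stop ;;
    400|404|413)             echo fix ;;
    *)                       echo unknown ;;
  esac
}

# backoff_delay ATTEMPT [RETRY_AFTER]
# Prints the number of seconds to sleep before the next attempt.
# A Retry-After value given in whole seconds wins (the HTTP-date form is
# not handled here). Otherwise the delay is a random whole number in
# [0, min(cap, base * 2^(ATTEMPT-1))], i.e. exponential backoff with jitter.
# The body runs in a subshell so its variables do not leak to the caller.
backoff_delay() (
  attempt=$1
  retry_after=${2:-}
  base=1
  cap=30
  case "$retry_after" in
    ''|*[!0-9]*)
      ceiling=$base
      i=1
      # Double the ceiling once per previous attempt, stopping at the cap.
      while [ "$i" -lt "$attempt" ] && [ "$ceiling" -lt "$cap" ]; do
        ceiling=$(( ceiling * 2 ))
        i=$(( i + 1 ))
      done
      if [ "$ceiling" -gt "$cap" ]; then
        ceiling=$cap
      fi
      # Two random bytes from the kernel, read as an unsigned integer.
      rand=$(od -An -N2 -tu2 /dev/urandom | tr -dc '0-9')
      echo $(( rand % (ceiling + 1) ))
      ;;
    *)
      echo "$retry_after"
      ;;
  esac
)
```

A caller might loop for up to five attempts, run `classify_status "$status"` after each request, and `sleep "$(backoff_delay "$attempt" "$retry_after")"` only while the class is `retry`; `stop` and `fix` should break out of the loop immediately.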
Message ids
Every successful response carries an id field (e.g. msg_01H8fkx2N3p4q5r6s7t8u9v0wx on /v1/messages, chatcmpl_... on /v1/chat/completions). Save the id on every call you make. When you open a support ticket the id is the fastest way for us to trace the full lifecycle of the request: which provider was selected, what was reserved, and what was actually billed.
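One way to save the id on every call is to append it to a local log right after the response arrives. The sketch below assumes `jq` is installed and that the id sits in a top-level `id` field, as described above; the function name `log_response_id` and the tab-separated log format are hypothetical choices for illustration.

```sh
# Hypothetical helper; the name and the log format are illustrative assumptions.

# log_response_id BODY_FILE LOG_FILE
# Appends "<UTC timestamp><TAB><id>" to LOG_FILE when the saved response body
# has a top-level id. Error envelopes and non-JSON bodies add nothing.
log_response_id() (
  body=$1
  log=$2
  # jq prints nothing when .id is absent; a parse failure also yields "".
  id=$(jq -r '.id // empty' "$body" 2>/dev/null) || id=""
  if [ -n "$id" ]; then
    printf '%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$id" >> "$log"
  fi
)
```

For example, after a curl call that writes the response to `/tmp/body`, running `log_response_id /tmp/body "$HOME/request-ids.log"` keeps a timestamped trail of ids to quote in a support ticket.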
A small reusable handler
Wrap every call site in a function that classifies the response into one of the three buckets above. The example below separates insufficient_quota (do not retry) from rate_limit_error and transient provider errors (both retryable), and treats everything else as fatal. It only reports the class; the caller decides the retry strategy.
# Send one request, saving the body to /tmp/body and the HTTP status to $status.
status=$(curl -s -o /tmp/body -w "%{http_code}" \
  -X POST https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"caicaini/auto","max_tokens":50,"messages":[{"role":"user","content":"Hi"}]}')
# error.type exists only in error envelopes; it is empty on success or for a non-JSON body.
errtype=$(jq -r '.error.type // empty' /tmp/body 2>/dev/null)
echo "status=$status type=$errtype"
# Match on the status concatenated with the type, e.g. "429rate_limit_error".
case "$status$errtype" in
  200*) echo "ok" ;;
  402*|*insufficient_quota) echo "top up at /developers/billing"; exit 2 ;;
  429*|*rate_limit_error) echo "throttled: sleep, then retry" ;;
  500*|502*|503*|504*|529*) echo "transient: retry with backoff" ;;
  *) echo "fatal: fix the request or credentials"; exit 1 ;;
esac
Related
- Rate limits — full backoff strategy and tier table.
- Authentication — what a 401 actually means.
- Pricing & credits — what a 402 looks like in practice.