Endpoints
POST /v1/messages
The primary endpoint. Sends a list of messages, returns the model's reply. Supports system prompts, multi-turn conversations, vision, tools, thinking, and streaming.
Basic call
Three required fields: model, max_tokens, and messages. The system field is optional but recommended for steering tone and constraints.
curl https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/sonnet",
    "max_tokens": 1024,
    "system": "You are a senior backend engineer. Be terse.",
    "messages": [
      {"role": "user", "content": "Why prefer queues over cron for retries?"}
    ]
  }'
Request fields
| Field | Type | Description |
|---|---|---|
| model (required) | string | One of the five virtual ids. See Models. |
| max_tokens (required) | integer | Hard cap on the number of tokens the model may generate. You are charged for the tokens actually generated, never for more than this cap. |
| messages (required) | array | Conversation history. Alternate user and assistant messages, ending with a user message. |
| system | string | System prompt. Appears once at the top of the context. Steers tone, output format, and constraints. |
| temperature | number 0–1 | Default 1. Lower values make the model more deterministic. |
| top_p | number 0–1 | Nucleus sampling. Prefer to set either temperature or top_p, not both. |
| stop_sequences | string[] | Up to four strings. The model stops generating when it produces any of them. |
| stream | boolean | When true, the response is an SSE stream. See Streaming. |
| tools | Tool[] | Function definitions the model may call. See Tools. |
| tool_choice | object | Force a specific tool, or set { type: "any" } to require any tool call. |
| thinking | object | Enable extended reasoning on caicaini/opus. See Thinking. |
| metadata | object | Optional { user_id?: string }. Helps with abuse investigations and per-user reporting. |
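The constraints in the table above can be checked client-side before a request is sent. The following is a minimal Python sketch, not an official client: the helper name `build_request` is hypothetical, and the rule that max_tokens must be positive is an assumption (the table only says it is an integer).

```python
import json


def build_request(model, max_tokens, messages, **optional):
    """Assemble a /v1/messages body and check the documented field constraints.

    Hypothetical helper; only the field names come from the table above.
    """
    body = {"model": model, "max_tokens": max_tokens, "messages": messages}
    body.update(optional)

    # Assumption: a cap of zero or less is never useful, so reject it early.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    if not messages:
        raise ValueError("messages must contain at least one message")

    # temperature and top_p are both documented as numbers in the range 0 to 1.
    for name in ("temperature", "top_p"):
        value = body.get(name)
        if value is not None and not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")

    # stop_sequences is documented as holding up to four strings.
    if len(body.get("stop_sequences", [])) > 4:
        raise ValueError("stop_sequences accepts at most four strings")
    return body


if __name__ == "__main__":
    request_body = build_request(
        "caicaini/sonnet",
        1024,
        [{"role": "user", "content": "Why prefer queues over cron for retries?"}],
        system="You are a senior backend engineer. Be terse.",
        temperature=0.2,
    )
    print(json.dumps(request_body, indent=2))
```

The printed JSON can be passed to curl with -d or sent with any HTTP client. The server remains the authority on validation; this only catches obvious mistakes before a round trip.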
The messages array
Each message has a role ("user" or "assistant") and a content. Content can be a plain string or an array of typed content blocks. Use the array form when you need to mix images, tool results, or multiple text segments in a single turn.
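As a rough illustration of the two content forms and the alternation rule, here is a short Python sketch. The helper names `as_blocks` and `check_messages` are hypothetical, and the shape of a request text block is assumed to mirror the `{"type": "text", "text": ...}` block that appears in responses.

```python
VALID_ROLES = {"user", "assistant"}


def as_blocks(content):
    """Return content in array form; a plain string becomes a single text block.

    Assumption: request text blocks look like the text blocks in responses.
    """
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def check_messages(messages):
    """Check that roles are valid, alternate, and end with a user message."""
    if not messages:
        raise ValueError("messages must not be empty")
    previous = None
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in VALID_ROLES:
            raise ValueError(f"message {index}: unknown role {role!r}")
        if role == previous:
            raise ValueError(f"message {index}: two consecutive {role} messages")
        previous = role
    if previous != "user":
        raise ValueError("the last message must come from the user")
    return messages
```

Calling `check_messages` before sending catches a history that ends on an assistant turn, which is an easy mistake to make when replaying stored conversations.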
curl https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/auto",
    "max_tokens": 512,
    "messages": [
      {"role": "user", "content": "Plan a 3-day Lisbon trip in October."},
      {"role": "assistant", "content": "Sure. Day 1: Alfama walking tour..."},
      {"role": "user", "content": "Skip the tram. I want food only."}
    ]
  }'
Response
response · 200 OK
{
  "id": "msg_01H8fkx2N3p4q5r6s7t8u9v0wx",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Cron pretends a job is fire-and-forget..." }
  ],
  "model": "caicaini/sonnet",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 38,
    "output_tokens": 184,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "credits_consumed": 92
  }
}
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique id for this message. Save it for support tickets. |
| type | "message" | Always "message" for /v1/messages. |
| role | "assistant" | Always "assistant" on the response. |
| content | ContentBlock[] | Array of typed blocks. Always at least one. Iterate over them — text-only assumptions break with tools and thinking. |
| model | string | The virtual id you requested. We do not expose underlying provider names. |
| stop_reason | string | One of "end_turn", "max_tokens", "stop_sequence", "tool_use". |
| usage.input_tokens | integer | Input tokens (excluding cache reads). |
| usage.output_tokens | integer | Tokens generated, including any thinking-block tokens. |
| usage.cache_creation_input_tokens | integer | Tokens written to the prompt cache on this turn. |
| usage.cache_read_input_tokens | integer | Tokens served from the prompt cache on this turn. |
| usage.credits_consumed | integer | Authoritative credits charged for this turn. Caicaini extension. |
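Because content is an array of typed blocks, client code is safer when it iterates rather than reading the first element. The Python sketch below shows one way to do that on an already-decoded response; the helper names `split_blocks` and `was_truncated` are hypothetical, and only the field names come from the table above.

```python
def split_blocks(response):
    """Separate the text of a response from its non-text blocks.

    Returns (text, other_blocks). Non-text blocks are handed back untouched
    so the caller can process them according to their own type.
    """
    text_parts = []
    other_blocks = []
    for block in response["content"]:
        if block.get("type") == "text":
            text_parts.append(block["text"])
        else:
            other_blocks.append(block)
    return "".join(text_parts), other_blocks


def was_truncated(response):
    """True when generation stopped because the max_tokens cap was reached."""
    return response["stop_reason"] == "max_tokens"


if __name__ == "__main__":
    # The sample response from this page, reduced to the fields used here.
    sample = {
        "content": [{"type": "text", "text": "Cron pretends a job is fire-and-forget..."}],
        "stop_reason": "end_turn",
        "usage": {"output_tokens": 184, "credits_consumed": 92},
    }
    text, others = split_blocks(sample)
    print(text)
    print("truncated:", was_truncated(sample))
    print("credits:", sample["usage"]["credits_consumed"])
```

Checking `was_truncated` before showing the text to a user makes it possible to retry with a larger max_tokens instead of displaying a reply that was cut off.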
POST /v1/messages/count_tokens
Returns just the input token count for a request body, without running the model. Useful for budget previews and routing decisions. Free, but rate-limited the same as /v1/messages.
curl https://caicaini.com/v1/messages/count_tokens \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/sonnet",
    "messages": [
      {"role": "user", "content": "How many tokens is this prompt?"}
    ]
  }'
Limits
- Request body up to 16 MB. Most images fit comfortably; very large vision payloads should be split or summarized client-side.
- Conversation length is bounded only by the model context window. See context_window on the Models response.
- max_tokens may not exceed each model's max_output_tokens.
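The body-size and max_tokens limits above can be pre-checked before a request leaves the client. This is a minimal Python sketch under stated assumptions: `check_limits` is a hypothetical helper, 16 MB is interpreted as 16 * 1024 * 1024 bytes, and the caller is expected to supply max_output_tokens as read from the Models response.

```python
import json

# Assumption: "16 MB" means binary megabytes. Lower this if the server
# turns out to count decimal megabytes.
MAX_BODY_BYTES = 16 * 1024 * 1024


def check_limits(body, max_output_tokens):
    """Raise ValueError if the body breaks a documented limit.

    Returns the size in bytes of this JSON serialization. A client that
    serializes differently may produce a slightly different size.
    """
    encoded = json.dumps(body).encode("utf-8")
    if len(encoded) > MAX_BODY_BYTES:
        raise ValueError(
            f"request body is {len(encoded)} bytes, over the {MAX_BODY_BYTES} byte limit"
        )
    if body["max_tokens"] > max_output_tokens:
        raise ValueError(
            f"max_tokens {body['max_tokens']} exceeds max_output_tokens {max_output_tokens}"
        )
    return len(encoded)
```

The context-window bound is not checked here because it depends on a token count, which this page obtains from /v1/messages/count_tokens rather than from the byte size of the body.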