Endpoints

POST /v1/messages

The primary endpoint. Sends a list of messages, returns the model's reply. Supports system prompts, multi-turn conversations, vision, tools, thinking, and streaming.

Basic call

Three required fields: model, max_tokens, and messages. The system field is optional but recommended for steering tone and constraints.

curl https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/sonnet",
    "max_tokens": 1024,
    "system": "You are a senior backend engineer. Be terse.",
    "messages": [
      {"role": "user", "content": "Why prefer queues over cron for retries?"}
    ]
  }'
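
For callers working outside the shell, the same call can be sketched in Python using only the standard library. This is an illustration, not an official client: `build_request` and `send` are hypothetical helper names, and the 60-second timeout is an arbitrary choice. The URL, headers, and body are copied from the curl example above.

```python
import json
import urllib.request

API_URL = "https://caicaini.com/v1/messages"  # endpoint from the curl example


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON reply.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


body = {
    "model": "caicaini/sonnet",
    "max_tokens": 1024,
    "system": "You are a senior backend engineer. Be terse.",
    "messages": [
        {"role": "user", "content": "Why prefer queues over cron for retries?"}
    ],
}
request = build_request("cai_api_YOUR_KEY", body)
# reply = send(request)  # uncomment with a real key to make the call
```

Building the request separately from sending it keeps the body easy to inspect or log before any credits are spent.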

Request fields

Field	Type	Description
model (required)	string	One of the five virtual ids. See Models.
max_tokens (required)	integer	Hard cap on the number of tokens the model may generate. You are charged for the tokens actually generated, never for more than this cap.
messages (required)	array	Conversation history. Alternate user and assistant messages, ending with a user message.
system	string	System prompt. Appears once at the top of the context. Steers tone, output format, and constraints.
temperature	number 0–1	Default 1. Lower values make the model more deterministic.
top_p	number 0–1	Nucleus sampling. Prefer to set either temperature or top_p, not both.
stop_sequences	string[]	Up to four strings. The model stops generating when it produces any of them.
stream	boolean	When true, the response is an SSE stream. See Streaming.
tools	Tool[]	Function definitions the model may call. See Tools.
tool_choice	object	Force a specific tool, or set { type: "any" } to require any tool call.
thinking	object	Enable extended reasoning on `caicaini/opus`. See Thinking.
metadata	object	Optional { user_id?: string }. Helps with abuse investigations and per-user reporting.
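
Several constraints in the table can be checked on the client before a request is sent. The sketch below is one possible pre-send check, not the server's actual validation: `validate_request` is a hypothetical helper, and it treats the "either temperature or top_p" advice as a reportable problem, although the table only states it as a preference.

```python
def validate_request(body: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []

    # The three required fields from the table.
    for field in ("model", "max_tokens", "messages"):
        if field not in body:
            problems.append(f"missing required field: {field}")

    # temperature and top_p are both documented as numbers between 0 and 1.
    for field in ("temperature", "top_p"):
        value = body.get(field)
        if value is not None and not 0 <= value <= 1:
            problems.append(f"{field} must be between 0 and 1, got {value}")

    # The table recommends setting one or the other, not both.
    if "temperature" in body and "top_p" in body:
        problems.append("set either temperature or top_p, not both")

    # stop_sequences accepts up to four strings.
    stops = body.get("stop_sequences", [])
    if len(stops) > 4:
        problems.append(f"stop_sequences allows up to four strings, got {len(stops)}")

    return problems
```

For example, a body with `"temperature": 1.5` yields a single out-of-range message, and a body without `model` yields `missing required field: model`.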

The messages array

Each message has a role ("user" or "assistant") and a content field. Content can be a plain string or an array of typed content blocks. Use the array form when you need to mix images, tool results, or multiple text segments in a single turn.

curl https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/auto",
    "max_tokens": 512,
    "messages": [
      {"role": "user",      "content": "Plan a 3-day Lisbon trip in October."},
      {"role": "assistant", "content": "Sure. Day 1: Alfama walking tour..."},
      {"role": "user",      "content": "Skip the tram. I want food only."}
    ]
  }'
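
Because the full history is resent on every call, a client usually keeps the messages list and extends it after each reply. The following is a minimal sketch of that bookkeeping. `check_turns` and `next_turn` are hypothetical helpers, and passing the reply's content blocks back unchanged as the assistant turn is an assumption based on content accepting an array of typed blocks.

```python
def check_turns(messages: list[dict]) -> None:
    """Raise ValueError unless roles alternate and the last message is from the user."""
    if not messages:
        raise ValueError("messages must not be empty")
    for prev, cur in zip(messages, messages[1:]):
        if prev["role"] == cur["role"]:
            raise ValueError(f"two consecutive {cur['role']} messages")
    if messages[-1]["role"] != "user":
        raise ValueError("the last message must have role 'user'")


def next_turn(messages: list[dict], reply: dict, user_text: str) -> list[dict]:
    """Return a new history: the old messages, the assistant reply, then the next user message."""
    extended = messages + [
        # Assumption: response content blocks can be sent back as-is (array form).
        {"role": "assistant", "content": reply["content"]},
        {"role": "user", "content": user_text},
    ]
    check_turns(extended)
    return extended
```

`next_turn` returns a new list rather than mutating its input, so a failed check leaves the stored history untouched.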

Response

response · 200 OK

{
  "id": "msg_01H8fkx2N3p4q5r6s7t8u9v0wx",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Cron pretends a job is fire-and-forget..." }
  ],
  "model": "caicaini/sonnet",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 38,
    "output_tokens": 184,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "credits_consumed": 92
  }
}

Response fields

Field	Type	Description
id	string	Unique id for this message. Save it for support tickets.
type	"message"	Always "message" for /v1/messages.
role	"assistant"	Always "assistant" on the response.
content	ContentBlock[]	Array of typed blocks, always containing at least one. Iterate over the blocks rather than assuming a single text block; that assumption breaks with tools and thinking.
model	string	The virtual id you requested. We do not expose underlying provider names.
stop_reason	string	One of "end_turn", "max_tokens", "stop_sequence", "tool_use".
usage.input_tokens	integer	Input tokens (excluding cache reads).
usage.output_tokens	integer	Tokens generated, including any thinking-block tokens.
usage.cache_creation_input_tokens	integer	Tokens written to the prompt cache on this turn.
usage.cache_read_input_tokens	integer	Tokens served from the prompt cache on this turn.
usage.credits_consumed	integer	Authoritative credits charged for this turn. Caicaini extension.
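
As an illustration of reading these fields, the sketch below collects text from every "text" block instead of indexing the first block, and surfaces stop_reason and the charged credits. `extract_text` and `summarize` are hypothetical helper names; the sample dictionary is a trimmed copy of the 200 OK response shown earlier.

```python
def extract_text(response: dict) -> str:
    """Concatenate the text of all "text" blocks, skipping other block types."""
    return "".join(
        block["text"] for block in response["content"] if block.get("type") == "text"
    )


def summarize(response: dict) -> dict:
    """Pull out the fields most callers act on."""
    return {
        "text": extract_text(response),
        # "max_tokens" means the reply was cut off at the cap.
        "truncated": response["stop_reason"] == "max_tokens",
        # "tool_use" means the model is waiting for a tool result.
        "wants_tool": response["stop_reason"] == "tool_use",
        "credits": response["usage"]["credits_consumed"],
    }


sample = {
    "id": "msg_01H8fkx2N3p4q5r6s7t8u9v0wx",
    "type": "message",
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Cron pretends a job is fire-and-forget..."}
    ],
    "model": "caicaini/sonnet",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 38, "output_tokens": 184, "credits_consumed": 92},
}
summary = summarize(sample)
```

Filtering by block type means the same code keeps working when tool or thinking blocks appear alongside text.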

POST /v1/messages/count_tokens

Returns only the input token count for a request body, without running the model. Useful for budget previews and routing decisions. The endpoint is free, but it is subject to the same rate limits as /v1/messages.

curl https://caicaini.com/v1/messages/count_tokens \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/sonnet",
    "messages": [
      {"role": "user", "content": "How many tokens is this prompt?"}
    ]
  }'
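
One way to use the count for a budget preview is to check it against a model's context window before sending the real request. The sketch below takes the count as a plain integer because the response shape of this endpoint is not shown here. `output_room` and `fits_context` are hypothetical helpers, the numbers in the usage note are made up, and the rule that input tokens plus max_tokens must fit within context_window is an assumption.

```python
def output_room(input_tokens: int, context_window: int) -> int:
    """Tokens left for the reply after the prompt, never negative."""
    return max(0, context_window - input_tokens)


def fits_context(input_tokens: int, max_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the full output cap fits in the context window.

    Assumption: the window must hold input and output together.
    """
    return max_tokens <= output_room(input_tokens, context_window)
```

With a hypothetical window of 1000 tokens and a counted prompt of 900, a request with max_tokens of 100 fits, while one with max_tokens of 101 does not.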

Limits

Request body up to 16 MB. Most images fit comfortably; very large vision payloads should be split or summarized client-side.
Conversation length is bounded only by the model context window. See context_window on the Models response.
max_tokens may not exceed the requested model's max_output_tokens.
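
The body-size and max_tokens limits above can be checked locally before a request is sent. This is a rough sketch under stated assumptions: `preflight` is a hypothetical helper, "16 MB" is read conservatively as 16,000,000 bytes because the exact definition is not given, and max_output_tokens is expected to come from the Models response.

```python
import json

# Assumption: the conservative decimal reading of "16 MB".
MAX_BODY_BYTES = 16 * 1000 * 1000


def preflight(body: dict, max_output_tokens: int) -> list[str]:
    """Return limit violations found locally; an empty list means none were found."""
    problems = []

    # Measure the body as it would be sent: UTF-8 encoded JSON.
    size = len(json.dumps(body).encode("utf-8"))
    if size > MAX_BODY_BYTES:
        problems.append(f"request body is {size} bytes, limit is {MAX_BODY_BYTES}")

    # max_tokens may not exceed the model's max_output_tokens.
    requested = body.get("max_tokens", 0)
    if requested > max_output_tokens:
        problems.append(
            f"max_tokens {requested} exceeds max_output_tokens {max_output_tokens}"
        )

    return problems
```

The size is measured on the serialized JSON rather than on the raw image bytes, since encoded image payloads are larger than the files they come from.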
