Caicaini

Endpoints

POST /v1/messages

The primary endpoint. Send a list of messages and receive the model's reply. Supports system prompts, multi-turn conversations, vision, tools, thinking, and streaming.

Basic call

Three required fields: model, max_tokens, and messages. The system field is optional but recommended for steering tone and constraints.

curl https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/sonnet",
    "max_tokens": 1024,
    "system": "You are a senior backend engineer. Be terse.",
    "messages": [
      {"role": "user", "content": "Why prefer queues over cron for retries?"}
    ]
  }'
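For callers who prefer a script to curl, the same call can be sketched in Python with only the standard library. The URL and headers are copied from the curl example; the helper names build_request and send are illustrative, not part of any Caicaini SDK.

```python
import json
import urllib.request

API_URL = "https://caicaini.com/v1/messages"  # endpoint from the curl example


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call."""
    # The three fields the documentation lists as required.
    for field in ("model", "max_tokens", "messages"):
        if field not in body:
            raise ValueError(f"missing required field: {field}")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON reply (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


body = {
    "model": "caicaini/sonnet",
    "max_tokens": 1024,
    "system": "You are a senior backend engineer. Be terse.",
    "messages": [
        {"role": "user", "content": "Why prefer queues over cron for retries?"}
    ],
}
request = build_request("cai_api_YOUR_KEY", body)
# reply = send(request)  # uncomment to make the real call
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.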

Request fields

Field | Type | Description
model (required) | string | One of the five virtual ids. See Models.
max_tokens (required) | integer | Hard cap on the number of tokens the model may generate. We will charge for what is actually used, but never more than this cap.
messages (required) | array | Conversation history. Alternate user and assistant messages, ending with a user message.
system | string | System prompt. Appears once at the top of the context. Steers tone, output format, and constraints.
temperature | number, 0–1 | Default 1. Lower values make the model more deterministic.
top_p | number, 0–1 | Nucleus sampling. Prefer to set either temperature or top_p, not both.
stop_sequences | string[] | Up to four strings. The model stops generating when it produces any of them.
stream | boolean | When true, the response is an SSE stream. See Streaming.
tools | Tool[] | Function definitions the model may call. See Tools.
tool_choice | object | Force a specific tool, or set { type: "any" } to require any tool call.
thinking | object | Enable extended reasoning on caicaini/opus. See Thinking.
metadata | object | Optional { user_id?: string }. Helps with abuse investigations and per-user reporting.
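The field rules in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not the server's actual validation: validate_request is a hypothetical helper, and the requirement that max_tokens be a positive integer is an assumption the table does not state.

```python
def validate_request(body: dict) -> list[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []

    # Required fields per the table.
    for field in ("model", "max_tokens", "messages"):
        if field not in body:
            problems.append(f"missing required field: {field}")

    # Assumption: a usable cap is a positive integer (bool is excluded).
    max_tokens = body.get("max_tokens")
    if max_tokens is not None:
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
            problems.append("max_tokens must be a positive integer")

    # temperature and top_p are both documented as numbers in 0–1.
    for name in ("temperature", "top_p"):
        value = body.get(name)
        if value is not None and not 0 <= value <= 1:
            problems.append(f"{name} must be between 0 and 1")

    # Up to four stop sequences.
    if len(body.get("stop_sequences") or []) > 4:
        problems.append("stop_sequences accepts at most four strings")

    # Messages alternate between user and assistant and end with user.
    if "messages" in body:
        roles = [m.get("role") for m in body["messages"]]
        if not roles:
            problems.append("messages must not be empty")
        if any(r not in ("user", "assistant") for r in roles):
            problems.append('each role must be "user" or "assistant"')
        if any(a == b for a, b in zip(roles, roles[1:])):
            problems.append("user and assistant messages must alternate")
        if roles and roles[-1] != "user":
            problems.append("the last message must be a user message")

    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller surface all issues in a single pass.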

The messages array

Each message has a role ("user" or "assistant") and a content field. The content can be a plain string or an array of typed content blocks. Use the array form when you need to mix images, tool results, or multiple text segments in a single turn.

curl https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/auto",
    "max_tokens": 512,
    "messages": [
      {"role": "user",      "content": "Plan a 3-day Lisbon trip in October."},
      {"role": "assistant", "content": "Sure. Day 1: Alfama walking tour..."},
      {"role": "user",      "content": "Skip the tram. I want food only."}
    ]
  }'
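A conversation history like the one above can be built incrementally. The sketch below uses hypothetical helpers (append_turn, as_blocks); the text block shape is assumed to mirror the { "type": "text", "text": ... } block shown in the response example, since the request-side block schema is not spelled out here.

```python
def text_block(text: str) -> dict:
    # Assumed shape, mirroring the text block in the response example.
    return {"type": "text", "text": text}


def as_blocks(content) -> list:
    """Normalize message content to the array form."""
    if isinstance(content, str):
        return [text_block(content)]
    return list(content)


def append_turn(messages: list, role: str, content) -> list:
    """Return a new history with one more message, enforcing alternation."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unknown role: {role!r}")
    if messages and messages[-1]["role"] == role:
        raise ValueError(f"two consecutive {role} messages")
    return messages + [{"role": role, "content": content}]


history = []
history = append_turn(history, "user", "Plan a 3-day Lisbon trip in October.")
history = append_turn(history, "assistant", "Sure. Day 1: Alfama walking tour...")
history = append_turn(history, "user", "Skip the tram. I want food only.")
# history now ends with a user message, as the messages field requires.
```

append_turn returns a new list instead of mutating its argument, so an earlier snapshot of the conversation can be reused for retries or branching.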

Response

response · 200 OK
{
  "id": "msg_01H8fkx2N3p4q5r6s7t8u9v0wx",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Cron pretends a job is fire-and-forget..." }
  ],
  "model": "caicaini/sonnet",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 38,
    "output_tokens": 184,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "credits_consumed": 92
  }
}
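Because content is an array of typed blocks, client code should iterate over it instead of reading a single string. A minimal sketch, with collect_text and blocks_by_type as illustrative helper names:

```python
def collect_text(response: dict) -> str:
    """Join the text of every text block, skipping other block types."""
    parts = []
    for block in response.get("content", []):
        if block.get("type") == "text":
            parts.append(block["text"])
    return "".join(parts)


def blocks_by_type(response: dict) -> dict:
    """Group content blocks by their type field, preserving order."""
    grouped = {}
    for block in response.get("content", []):
        grouped.setdefault(block.get("type"), []).append(block)
    return grouped


sample = {
    "id": "msg_01H8fkx2N3p4q5r6s7t8u9v0wx",
    "type": "message",
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Cron pretends a job is fire-and-forget..."}
    ],
    "stop_reason": "end_turn",
}
print(collect_text(sample))  # Cron pretends a job is fire-and-forget...
```

Filtering on the type field keeps this code working when a response also carries non-text blocks.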

Response fields

Field | Type | Description
id | string | Unique id for this message. Save it for support tickets.
type | "message" | Always "message" for /v1/messages.
role | "assistant" | Always "assistant" on the response.
content | ContentBlock[] | Array of typed blocks. Always at least one. Iterate over them; text-only assumptions break with tools and thinking.
model | string | The virtual id you requested. We do not expose underlying provider names.
stop_reason | string | One of "end_turn", "max_tokens", "stop_sequence", "tool_use".
usage.input_tokens | integer | Input tokens (excluding cache reads).
usage.output_tokens | integer | Tokens generated, including any thinking-block tokens.
usage.cache_creation_input_tokens | integer | Tokens written to the prompt cache on this turn.
usage.cache_read_input_tokens | integer | Tokens served from the prompt cache on this turn.
usage.credits_consumed | integer | Authoritative credits charged for this turn. Caicaini extension.
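One way to act on stop_reason and log usage is sketched below. The four stop_reason values and the usage field names come from the table above; the action labels ("done", "truncated", and so on) and the helper names are illustrative choices, not part of the API.

```python
# Map each documented stop_reason to a client-side action label (labels are illustrative).
STOP_ACTIONS = {
    "end_turn": "done",          # the model finished on its own
    "max_tokens": "truncated",   # hit the max_tokens cap; the output may be cut off
    "stop_sequence": "stopped",  # one of the stop_sequences was produced
    "tool_use": "run_tool",      # the model is requesting a tool call
}


def next_action(response: dict) -> str:
    """Translate stop_reason into an action label, rejecting unknown values."""
    reason = response.get("stop_reason")
    if reason not in STOP_ACTIONS:
        raise ValueError(f"unexpected stop_reason: {reason!r}")
    return STOP_ACTIONS[reason]


def usage_line(response: dict) -> str:
    """Format the usage object as a single log line."""
    u = response["usage"]
    return (
        f"in={u['input_tokens']} out={u['output_tokens']} "
        f"cache_write={u['cache_creation_input_tokens']} "
        f"cache_read={u['cache_read_input_tokens']} "
        f"credits={u['credits_consumed']}"
    )
```

Raising on an unknown stop_reason, rather than treating it as success, makes it obvious when a value not covered by this sketch appears.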

POST /v1/messages/count_tokens

Returns only the input token count for a request body, without running the model. Useful for budget previews and routing decisions. Free, but subject to the same rate limits as /v1/messages.

curl https://caicaini.com/v1/messages/count_tokens \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/sonnet",
    "messages": [
      {"role": "user", "content": "How many tokens is this prompt?"}
    ]
  }'
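A budget preview built on the returned count might look like the sketch below. It takes the count as a plain integer because the response shape of count_tokens is not shown here, and it assumes that input and output tokens together must fit in the model's context window. The numbers in the usage lines are made up for illustration.

```python
def max_output_budget(input_tokens: int, context_window: int, max_output_tokens: int) -> int:
    """Largest max_tokens value that still fits, or 0 if the input alone is too large.

    Assumption: input tokens plus generated tokens share one context window.
    """
    remaining = context_window - input_tokens
    return max(0, min(max_output_tokens, remaining))


def fits(input_tokens: int, wanted_output: int, context_window: int, max_output_tokens: int) -> bool:
    """True when a request asking for wanted_output tokens stays within both limits."""
    return 0 < wanted_output <= max_output_budget(input_tokens, context_window, max_output_tokens)


# Hypothetical limits; real values come from the Models response.
print(max_output_budget(38, 200_000, 8_192))       # 8192
print(max_output_budget(199_000, 200_000, 8_192))  # 1000
```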

Limits

  • Request body up to 16 MB. Most images fit comfortably; very large vision payloads should be split or summarized client-side.
  • Conversation length is bounded only by the model context window. See context_window on the Models response.
  • max_tokens may not exceed each model's max_output_tokens.
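The limits above can be checked before sending. This sketch assumes "16 MB" means 16 × 1024 × 1024 bytes of encoded JSON, which the text does not specify; check_limits is a hypothetical helper, and max_output_tokens is expected to come from the Models response.

```python
import json

# Assumption: 16 MB is interpreted as 16 MiB of UTF-8 encoded JSON.
MAX_BODY_BYTES = 16 * 1024 * 1024


def check_limits(body: dict, max_output_tokens: int) -> list[str]:
    """Return the documented limits this request body would exceed."""
    problems = []
    size = len(json.dumps(body).encode("utf-8"))
    if size > MAX_BODY_BYTES:
        problems.append(f"request body is {size} bytes, over the 16 MB limit")
    if body.get("max_tokens", 0) > max_output_tokens:
        problems.append(
            f"max_tokens {body['max_tokens']} exceeds max_output_tokens {max_output_tokens}"
        )
    return problems
```

The measured size depends on how the client serializes JSON, so treat the result as an estimate and leave some headroom below the limit.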