Streaming

Set stream: true on any /v1/messages or /v1/chat/completions request and we send tokens back over Server-Sent Events. Same auth, same model selection, same credit math.

Basic stream

Add stream: true and read the response body as text/event-stream. Most HTTP libraries can do this without extra dependencies. The example below uses curl with -N to disable buffering so you see tokens land in real time.

curl -N https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/auto",
    "max_tokens": 200,
    "stream": true,
    "messages": [{"role":"user","content":"Count to 10 slowly."}]
  }'
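For readers who prefer a script to curl, the same request can be sketched with Python's standard library alone. The helper names below (build_stream_request, iter_stream_lines, main) are illustrative and not part of any Caicaini SDK; the URL, headers, and body are copied from the curl example above.

```python
import json
import urllib.request

API_URL = "https://caicaini.com/v1/messages"  # endpoint from the curl example


def build_stream_request(api_key: str, prompt: str, max_tokens: int = 200) -> urllib.request.Request:
    """Build the same POST as the curl example, with stream set to true."""
    body = {
        "model": "caicaini/auto",
        "max_tokens": max_tokens,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_stream_lines(request: urllib.request.Request):
    """Yield decoded lines from the response body, one at a time."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8").rstrip("\r\n")


def main() -> None:
    # Hypothetical entry point: performs a real network call, so swap in
    # a valid key before calling it.
    request = build_stream_request("cai_api_YOUR_KEY", "Count to 10 slowly.")
    for line in iter_stream_lines(request):
        print(line, flush=True)
```

Calling main() prints the raw event: and data: lines as they are read, much like curl -N does. Building the request is kept separate from sending it so the payload can be inspected without touching the network.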

Events on /v1/messages

The messages endpoint emits a sequence of named SSE events. Each frame has an event: line and a data: line containing JSON. A blank line separates frames.

Event                 Meaning
message_start         First frame. Carries the id and the input-token count.
content_block_start   A new content block has begun. Type may be text, tool_use, or thinking.
content_block_delta   Token chunk for the current block. The shape of delta matches the block type.
content_block_stop    The current block is complete.
message_delta         Carries the final stop_reason and the authoritative credits_consumed.
message_stop          Last frame. Stop reading.
ping                  Keepalive every ~15 s. Ignore.
error                 An error occurred mid-stream. Connection may stay open briefly. Treat as terminal.
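One way to act on the events listed above is a small dispatcher that folds each decoded frame into a running state. This is a minimal sketch, not an official client: MessageState, StreamError, and apply_event are hypothetical names, and only the text_delta shape is handled because the delta shapes for tool_use and thinking blocks are not described here.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


class StreamError(Exception):
    """Raised when the stream delivers an error event."""


@dataclass
class MessageState:
    message_id: Optional[str] = None
    text_parts: List[str] = field(default_factory=list)
    stop_reason: Optional[str] = None
    credits_consumed: Optional[int] = None
    done: bool = False


def apply_event(state: MessageState, event: str, payload: dict) -> None:
    """Update state from one decoded frame (event name plus parsed JSON)."""
    if event == "message_start":
        state.message_id = payload["message"]["id"]
    elif event == "content_block_delta":
        delta = payload["delta"]
        # Assumption: only text deltas are collected in this sketch.
        if delta.get("type") == "text_delta":
            state.text_parts.append(delta["text"])
    elif event == "message_delta":
        state.stop_reason = payload["delta"].get("stop_reason")
        state.credits_consumed = payload.get("usage", {}).get("credits_consumed")
    elif event == "message_stop":
        state.done = True  # last frame: the caller should stop reading
    elif event == "error":
        # Treated as terminal even if the connection stays open briefly.
        raise StreamError(json.dumps(payload))
    # ping, content_block_start, and content_block_stop need no action here.
```

A caller would loop over frames, call apply_event for each, and stop once state.done is true or StreamError is raised.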

Sample wire output

raw stream
event: message_start
data: {"type":"message_start","message":{"id":"msg_01H...","type":"message","role":"assistant","content":[],"model":"caicaini/auto","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"One, "}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"two, three..."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":12,"output_tokens":24,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"credits_consumed":18}}

event: message_stop
data: {"type":"message_stop"}
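Wire output like the sample above can be split into frames with a short generator. The sketch below follows the general SSE rules (a blank line ends a frame, consecutive data: lines are joined with a newline, lines starting with a colon are comments); iter_sse_frames and SAMPLE are illustrative names, and SAMPLE is an abbreviated copy of the stream shown above.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def _field_value(line: str, prefix: str) -> str:
    """Return the text after 'prefix', dropping one leading space if present."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def iter_sse_frames(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (event_name, data) pairs; event_name is None for unnamed frames."""
    event_name: Optional[str] = None
    data_lines: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line closes the current frame.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = None, []
        elif line.startswith(":"):
            continue  # SSE comment line
        elif line.startswith("event:"):
            event_name = _field_value(line, "event:")
        elif line.startswith("data:"):
            data_lines.append(_field_value(line, "data:"))
    # Lenient choice: also emit a trailing frame that lacks a final blank line.
    if data_lines:
        yield event_name, "\n".join(data_lines)


SAMPLE = (
    'event: content_block_delta\n'
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"One, "}}\n'
    '\n'
    'event: message_stop\n'
    'data: {"type":"message_stop"}\n'
    '\n'
)

frames = [(name, json.loads(data)) for name, data in iter_sse_frames(SAMPLE.splitlines())]
```

The generator accepts any iterable of lines, so it works on a saved transcript as well as on lines read from a live response.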

Events on /v1/chat/completions

The chat-completions endpoint streams unnamed events: every frame is a single data: line containing a chat.completion.chunk object, and the stream terminates with the literal data: [DONE].

raw stream
data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"content":"One, two, "},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"content":"three..."},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":24,"total_tokens":36,"credits_consumed":18}}

data: [DONE]
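A chat-completions stream like the one above can be folded into the final text and usage with a short loop. This is a sketch under the assumption stated in this section, namely that each frame is a single data: line; collect_chat_stream is a hypothetical helper name, and it simply concatenates content from every choice in each chunk.

```python
import json
from typing import Iterable, List, Optional, Tuple


def collect_chat_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Return (full_text, usage) from the lines of a chat-completions stream."""
    text_parts: List[str] = []
    usage: Optional[dict] = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # literal terminator, not JSON
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        # In the sample, usage (with credits_consumed) rides on the last chunk.
        if "usage" in chunk:
            usage = chunk["usage"]
    return "".join(text_parts), usage
```

The returned usage stays None if the stream ends before the final chunk arrives, which is a simple signal that the turn did not complete.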

Things to keep in mind

  • Always parse frames separated by a blank line, not by individual data: lines. Multi-line payloads are valid SSE.
  • The credits_consumed for the turn is final on message_delta (messages) or the last chunk before [DONE] (chat completions). Anything earlier is partial.
  • If a stream is dropped mid-flight, the request still costs credits for the tokens that were generated. We refund automatically if the drop happened before any output was delivered.
  • Concurrent streams per key are capped by your tier. See Rate limits.