Streaming

Set stream: true on any /v1/messages or /v1/chat/completions request and we send tokens back over Server-Sent Events. Same auth, same model selection, same credit math.

Basic stream

Add stream: true and read the response body as text/event-stream. Most HTTP libraries can do this without extra dependencies. The example below uses curl with -N to disable buffering so you see tokens land in real time.

curl -N https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/auto",
    "max_tokens": 200,
    "stream": true,
    "messages": [{"role":"user","content":"Count to 10 slowly."}]
  }'
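
For readers who would rather not shell out to curl, here is a minimal sketch of the same request using only Python's standard library. The URL, headers, and body mirror the curl example above. The helper names `build_stream_request` and `stream_lines` are illustrative and not part of any official SDK.

```python
import json
import urllib.request
from typing import Iterator

API_URL = "https://caicaini.com/v1/messages"  # endpoint taken from the curl example


def build_stream_request(api_key: str, prompt: str, max_tokens: int = 200) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    body = {
        "model": "caicaini/auto",
        "max_tokens": max_tokens,
        "stream": True,  # ask for a text/event-stream response
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_lines(request: urllib.request.Request) -> Iterator[str]:
    """Yield the response body one line at a time as it arrives."""
    with urllib.request.urlopen(request) as response:
        for raw in response:  # the response object is iterable line by line
            yield raw.decode("utf-8").rstrip("\r\n")


# Usage (performs a real network call, so it is left commented out):
# for line in stream_lines(build_stream_request("cai_api_YOUR_KEY", "Count to 10 slowly.")):
#     print(line, flush=True)
```

Printing with `flush=True` plays the same role as curl's -N flag: each line is shown as soon as it is read instead of waiting in an output buffer.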

Events on /v1/messages

The messages endpoint emits a sequence of named SSE events. Each frame has an event: line and a data: line containing JSON. A blank line separates frames.

Event	Meaning
message_start	First frame. Carries the `id` and the input-token count.
content_block_start	A new content block has begun. Type may be `text`, `tool_use`, or `thinking`.
content_block_delta	Token chunk for the current block. The shape of `delta` matches the block type.
content_block_stop	The current block is complete.
message_delta	Carries the final `stop_reason` and the authoritative `credits_consumed`.
message_stop	Last frame. Stop reading.
ping	Keepalive every ~15 s. Ignore.
error	An error occurred mid-stream. The connection may stay open briefly afterwards, but treat the event as terminal and stop reading.

Sample wire output

raw stream

event: message_start
data: {"type":"message_start","message":{"id":"msg_01H...","type":"message","role":"assistant","content":[],"model":"caicaini/auto","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"One, "}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"two, three..."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":12,"output_tokens":24,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"credits_consumed":18}}

event: message_stop
data: {"type":"message_stop"}
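
One way to consume this event sequence is sketched below. It assumes the frames have already been split and JSON-decoded into (event name, payload) pairs, and it only assembles `text_delta` chunks; `tool_use` and `thinking` deltas are skipped because their shapes are not shown here. The function name `collect_message` and the returned dictionary layout are illustrative choices, not part of the API.

```python
from typing import Any, Dict, Iterable, Tuple


def collect_message(events: Iterable[Tuple[str, Dict[str, Any]]]) -> Dict[str, Any]:
    """Fold a /v1/messages event sequence into one result (hypothetical helper)."""
    blocks: Dict[int, str] = {}  # text accumulated per content block index
    result: Dict[str, Any] = {"id": None, "text": "", "stop_reason": None, "credits_consumed": None}
    for name, payload in events:
        if name == "ping":
            continue  # keepalive, nothing to do
        if name == "error":
            # The error payload shape is not documented here, so pass it through as-is.
            raise RuntimeError(f"stream error: {payload}")
        if name == "message_start":
            result["id"] = payload["message"]["id"]
        elif name == "content_block_start":
            blocks[payload["index"]] = ""
        elif name == "content_block_delta":
            delta = payload["delta"]
            if delta.get("type") == "text_delta":
                index = payload["index"]
                blocks[index] = blocks.get(index, "") + delta["text"]
        elif name == "message_delta":
            result["stop_reason"] = payload["delta"].get("stop_reason")
            result["credits_consumed"] = payload.get("usage", {}).get("credits_consumed")
        elif name == "message_stop":
            break  # last frame, stop reading
    result["text"] = "".join(blocks[i] for i in sorted(blocks))
    return result


# The sample wire output above, reduced to the fields this sketch reads.
SAMPLE_EVENTS = [
    ("message_start", {"message": {"id": "msg_01H..."}}),
    ("content_block_start", {"index": 0, "content_block": {"type": "text", "text": ""}}),
    ("content_block_delta", {"index": 0, "delta": {"type": "text_delta", "text": "One, "}}),
    ("content_block_delta", {"index": 0, "delta": {"type": "text_delta", "text": "two, three..."}}),
    ("content_block_stop", {"index": 0}),
    ("message_delta", {"delta": {"stop_reason": "end_turn"}, "usage": {"credits_consumed": 18}}),
    ("message_stop", {}),
]

summary = collect_message(SAMPLE_EVENTS)
```

Keying the accumulated text by block index keeps the blocks separate if a response contains more than one.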

Events on /v1/chat/completions

The chat-completions endpoint streams unnamed events: every frame is a single data: line containing a chat.completion.chunk object, and the stream terminates with the literal data: [DONE].

raw stream

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"content":"One, two, "},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"content":"three..."},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":24,"total_tokens":36,"credits_consumed":18}}

data: [DONE]
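
A minimal sketch of reading this format follows. It relies on the single-line framing described above (one data: line per frame) and reads only the first choice of each chunk. The name `collect_chat_completion` is illustrative.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def collect_chat_completion(lines: Iterable[str]) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Join streamed content and return it with the final usage object, if any."""
    parts: List[str] = []
    usage: Optional[Dict[str, Any]] = None
    for line in lines:
        if not line.startswith("data:"):
            continue  # blank separators and anything that is not a data line
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # literal terminator, not JSON
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                parts.append(content)
        if "usage" in chunk:
            usage = chunk["usage"]  # present on the last chunk before [DONE]
    return "".join(parts), usage


# Lines modelled on the sample above, trimmed to the fields this sketch reads.
SAMPLE_LINES = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"One, two, "},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"three..."},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"credits_consumed":18}}',
    "",
    "data: [DONE]",
]

text, final_usage = collect_chat_completion(SAMPLE_LINES)
```

Checking for `[DONE]` before calling `json.loads` matters because the terminator is not valid JSON.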

Things to keep in mind

Always split the stream into frames on blank lines instead of treating each data: line as a complete payload. A single frame may carry several data: lines, and SSE joins them with newlines into one payload.
The credits_consumed for the turn is final on message_delta (messages) or the last chunk before [DONE] (chat completions). Anything earlier is partial.
If a stream is dropped mid-flight, the request still costs credits for the tokens that were generated. We will refund automatically if the drop happened before any output was delivered.
Concurrent streams per key are capped by your tier. See Rate limits.
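
The frame-splitting advice above can be sketched as a small generator. It follows the general Server-Sent Events rules: a blank line ends a frame, multiple data: lines are joined with newlines, lines starting with a colon are comments, and a frame without an event: line gets the default name `message`. The name `iter_sse_frames` is illustrative, and the sketch handles only the event and data fields.

```python
from typing import Dict, Iterable, Iterator, List


def iter_sse_frames(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group raw SSE lines into frames, one dict per frame (hypothetical helper)."""
    event = ""
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line closes the current frame; frames without data are dropped.
            if data:
                yield {"event": event or "message", "data": "\n".join(data)}
            event, data = "", []
            continue
        if line.startswith(":"):
            continue  # comment line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
    # A trailing frame with no closing blank line is incomplete and is not yielded.


# A frame whose JSON payload is spread over two data: lines, then a [DONE] frame.
SAMPLE = [
    "event: message_delta",
    'data: {"credits_consumed":',
    "data: 18}",
    "",
    ": comment line, skipped",
    "data: [DONE]",
    "",
]

frames = list(iter_sse_frames(SAMPLE))
```

A parser that decoded each data: line separately would fail on the first frame in this sample, since neither line is valid JSON on its own.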
