Streaming
Set stream: true on any /v1/messages or /v1/chat/completions request and we send tokens back over Server-Sent Events. Same auth, same model selection, same credit math.
Basic stream
Add stream: true and read the response body as text/event-stream. Most HTTP libraries can do this without extra dependencies. The example below uses curl with -N to disable buffering so you see tokens land in real time.
curl -N https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/auto",
    "max_tokens": 200,
    "stream": true,
    "messages": [{"role":"user","content":"Count to 10 slowly."}]
  }'
Events on /v1/messages
The messages endpoint emits a sequence of named SSE events. Each frame has an event: line and a data: line containing JSON. A blank line separates frames.
| Event | Meaning |
|---|---|
| message_start | First frame. Carries the id and the input-token count. |
| content_block_start | A new content block has begun. Type may be text, tool_use, or thinking. |
| content_block_delta | Token chunk for the current block. The shape of delta matches the block type. |
| content_block_stop | The current block is complete. |
| message_delta | Carries the final stop_reason and the authoritative credits_consumed. |
| message_stop | Last frame. Stop reading. |
| ping | Keepalive every ~15 s. Ignore. |
| error | An error occurred mid-stream. Connection may stay open briefly. Treat as terminal. |
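As a rough illustration of how a client might dispatch on the events in the table, here is a minimal Python sketch. The function names (iter_sse_frames, collect_message) are hypothetical, the input is assumed to be an iterable of already-decoded lines, and the shape of the error payload is not specified here, so the sketch simply raises with whatever JSON it receives. Only text_delta chunks are accumulated; tool_use and thinking blocks are ignored for brevity.

```python
import json


def iter_sse_frames(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line closes the current frame.
            if data:
                yield event or "message", "\n".join(data)
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Drop the single optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
    if data:
        # Flush a final frame that was not followed by a blank line.
        yield event or "message", "\n".join(data)


def collect_message(lines):
    """Fold a /v1/messages stream into id, text, stop_reason, and credits."""
    result = {"id": None, "text": "", "stop_reason": None, "credits_consumed": None}
    for event, data in iter_sse_frames(lines):
        if event == "ping":
            continue  # keepalive, nothing to do
        payload = json.loads(data)
        if event == "error":
            # Assumption: the error body is JSON; its fields are not documented here.
            raise RuntimeError(f"stream error: {payload}")
        if event == "message_start":
            result["id"] = payload["message"]["id"]
        elif event == "content_block_delta":
            delta = payload["delta"]
            if delta.get("type") == "text_delta":
                result["text"] += delta["text"]
        elif event == "message_delta":
            result["stop_reason"] = payload["delta"].get("stop_reason")
            result["credits_consumed"] = payload.get("usage", {}).get("credits_consumed")
        elif event == "message_stop":
            break  # last frame, stop reading
    return result
```

In a real client, lines would typically come from the HTTP library's line iterator over the response body. Reading credits_consumed only from message_delta keeps the sketch in line with the table, which names that event as the authoritative source.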
Sample wire output
raw stream
event: message_start
data: {"type":"message_start","message":{"id":"msg_01H...","type":"message","role":"assistant","content":[],"model":"caicaini/auto","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"One, "}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"two, three..."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":12,"output_tokens":24,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"credits_consumed":18}}

event: message_stop
data: {"type":"message_stop"}

Events on /v1/chat/completions
The chat-completions endpoint streams unnamed events: every frame is a single data: line containing a chat.completion.chunk object, and the stream terminates with the literal data: [DONE].
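A client for this format can be sketched in a few lines of Python. The helper names below (iter_data_frames, collect_chat_stream) are hypothetical, the input is assumed to be an iterable of decoded lines, and only the first choice in each chunk is read. Frames are still delimited by blank lines rather than by individual data: lines, so a payload that spans several data: lines would be joined correctly.

```python
import json


def iter_data_frames(lines):
    """Yield the data payload of each blank-line-delimited SSE frame."""
    buf = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Drop the single optional space after the colon.
            buf.append(value[1:] if value.startswith(" ") else value)
    if buf:
        # Flush a final frame that was not followed by a blank line.
        yield "\n".join(buf)


def collect_chat_stream(lines):
    """Accumulate text, finish_reason, and usage until the [DONE] sentinel."""
    text_parts, finish_reason, usage = [], None, None
    for payload in iter_data_frames(lines):
        if payload.strip() == "[DONE]":
            break  # literal terminator, not JSON
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            choice = choices[0]  # assumption: single-choice responses
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if "usage" in chunk:
            usage = chunk["usage"]
    return {"text": "".join(text_parts), "finish_reason": finish_reason, "usage": usage}
```

Checking for [DONE] before calling json.loads matters, since the sentinel is not valid JSON and would otherwise raise.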
raw stream
data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"content":"One, two, "},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"content":"three..."},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":24,"total_tokens":36,"credits_consumed":18}}

data: [DONE]

Things to keep in mind
- Always parse frames separated by a blank line, not by individual data: lines. Multi-line payloads are valid SSE.
- The credits_consumed for the turn is final on message_delta (messages) or the last chunk before [DONE] (chat completions). Anything earlier is partial.
- If a stream is dropped mid-flight, the request still cost credits for the tokens that were generated. We will refund automatically if the drop happened before any output was delivered.
- Concurrent streams per key are capped by your tier. See Rate limits.
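The first point above is easy to get wrong when reading raw network chunks, because a chunk boundary can fall anywhere, including inside a frame or inside a multi-byte character. The following Python sketch shows one way to buffer bytes and emit only complete frames. The class name SSEFrameBuffer is hypothetical, and the sketch assumes UTF-8 text with LF or CRLF line endings.

```python
import codecs


class SSEFrameBuffer:
    """Buffer raw bytes and emit only complete, blank-line-terminated frames."""

    def __init__(self):
        # The incremental decoder tolerates multi-byte characters split across chunks.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._pending = ""

    def feed(self, chunk):
        """Add raw bytes; return a list of (event, data) frames completed by them."""
        text = self._decoder.decode(chunk)
        # Normalize after appending so a CRLF split across two chunks is still caught.
        self._pending = (self._pending + text).replace("\r\n", "\n")
        frames = []
        while "\n\n" in self._pending:
            block, self._pending = self._pending.split("\n\n", 1)
            frame = self._parse(block)
            if frame is not None:
                frames.append(frame)
        return frames

    @staticmethod
    def _parse(block):
        event, data = "message", []
        for line in block.split("\n"):
            if line.startswith(":"):
                continue  # SSE comment line
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # drop the single optional space after the colon
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
        if not data:
            return None
        # Several data: lines in one frame form one payload, joined by newlines.
        return event, "\n".join(data)
```

A caller would pass each chunk read from the socket to feed() and process whatever frames come back. Anything left in the buffer when the connection drops is an incomplete frame and should not be treated as delivered output.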