Capabilities
Streaming responses
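Set stream: true on any /v1/messages or /v1/chat/completions request and we send tokens back over Server-Sent Events (SSE). Authentication, model selection, and credit accounting work exactly as they do for non-streaming requests.
Basic streaming
Add stream: true and read the response body as text/event-stream. Most HTTP libraries handle this without extra dependencies. The example below uses curl's -N flag to disable buffering so you can watch tokens arrive in real time.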
curl -N https://caicaini.com/v1/messages \
-H "Authorization: Bearer cai_api_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "caicaini/auto",
"max_tokens": 200,
"stream": true,
"messages": [{"role":"user","content":"Count to 10 slowly."}]
}'
Events on /v1/messages
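The messages endpoint emits a sequence of named SSE events. Each frame contains an event: line and a data: line holding JSON. A blank line separates frames.
| Event | Meaning |
|---|---|
| message_start | First frame. Carries the id and the input token count. |
| content_block_start | A new content block begins. Its type can be text, tool_use, or thinking. |
| content_block_delta | Incremental tokens for the current block. The shape of delta matches the block type. |
| content_block_stop | The current block ends. |
| message_delta | Carries the final stop_reason and the authoritative credits_consumed. |
| message_stop | Last frame. Stop reading. |
| ping | Keep-alive heartbeat, roughly every 15 seconds. Safe to ignore. |
| error | An error occurred mid-stream. The connection may stay open briefly; treat it as terminal. |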
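The event table above maps naturally onto a small state machine. The following is a minimal Python sketch, not an official client: StreamState and handle_event are hypothetical names, the input is assumed to be already-decoded (event name, JSON payload) pairs, and only text_delta deltas are accumulated because the delta shapes for tool_use and thinking blocks are not described here. The shape of the error payload is also an assumption, so it is surfaced as-is.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StreamState:
    """Running state for one streamed /v1/messages response (hypothetical helper)."""
    message_id: Optional[str] = None
    text_blocks: Dict[int, str] = field(default_factory=dict)
    stop_reason: Optional[str] = None
    credits_consumed: Optional[int] = None
    done: bool = False


def handle_event(state: StreamState, event: str, payload: dict) -> None:
    """Apply one decoded SSE frame to the running state."""
    if event == "message_start":
        state.message_id = payload["message"]["id"]
    elif event == "content_block_start":
        block = payload["content_block"]
        if block["type"] == "text":
            state.text_blocks[payload["index"]] = block.get("text", "")
    elif event == "content_block_delta":
        delta = payload["delta"]
        # Only text deltas are handled; other delta shapes are skipped.
        if delta.get("type") == "text_delta":
            idx = payload["index"]
            state.text_blocks[idx] = state.text_blocks.get(idx, "") + delta["text"]
    elif event == "message_delta":
        state.stop_reason = payload["delta"].get("stop_reason")
        state.credits_consumed = payload.get("usage", {}).get("credits_consumed")
    elif event == "message_stop":
        state.done = True
    elif event == "error":
        # Treated as terminal; the payload shape is not assumed.
        state.done = True
        raise RuntimeError(f"stream error: {payload}")
    # ping, content_block_stop, and unknown events need no action.


# Usage with a shortened version of a typical event sequence.
events = [
    ("message_start", {"message": {"id": "msg_example"}}),
    ("content_block_start", {"index": 0, "content_block": {"type": "text", "text": ""}}),
    ("content_block_delta", {"index": 0, "delta": {"type": "text_delta", "text": "One, "}}),
    ("ping", {"type": "ping"}),
    ("content_block_delta", {"index": 0, "delta": {"type": "text_delta", "text": "two, three..."}}),
    ("content_block_stop", {"index": 0}),
    ("message_delta", {"delta": {"stop_reason": "end_turn"}, "usage": {"credits_consumed": 18}}),
    ("message_stop", {"type": "message_stop"}),
]

state = StreamState()
for name, payload in events:
    handle_event(state, name, payload)
    if state.done:
        break

full_text = "".join(state.text_blocks[i] for i in sorted(state.text_blocks))
```

Keeping text per block index, rather than in one string, leaves room for responses that interleave several content blocks.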
Sample output
Raw stream
event: message_start
data: {"type":"message_start","message":{"id":"msg_01H...","type":"message","role":"assistant","content":[],"model":"caicaini/auto","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":0}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"One, "}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"two, three..."}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":12,"output_tokens":24,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"credits_consumed":18}}
event: message_stop
data: {"type":"message_stop"}
Events on /v1/chat/completions
The chat-completions endpoint streams anonymous events: each frame is just a data: line whose payload is a chat.completion.chunk object, and the stream ends with a literal data: [DONE].
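As a rough illustration of consuming this format, the Python sketch below folds a sequence of data payloads into the final text. It assumes the data: prefix and frame boundaries have already been stripped by an SSE reader, and collect_chat_chunks is a hypothetical helper name, not part of any SDK.

```python
import json


def collect_chat_chunks(payloads):
    """Fold chat.completion.chunk payloads into (text, finish_reason, usage)."""
    text_parts = []
    finish_reason = None
    usage = None
    for payload in payloads:
        if payload.strip() == "[DONE]":
            break  # literal terminator, not JSON
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # delta may be empty on the final chunk, so default to "".
            text_parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
        # Keep the most recent usage object seen before [DONE].
        usage = chunk.get("usage") or usage
    return "".join(text_parts), finish_reason, usage


# Usage with shortened payloads in the chunk format described above.
sample_payloads = [
    '{"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{"content":"One, two, "},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{"content":"three..."},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"credits_consumed":18}}',
    "[DONE]",
]
text, reason, final_usage = collect_chat_chunks(sample_payloads)
```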
Raw stream
data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"content":"One, two, "},"finish_reason":null}]}
data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"content":"three..."},"finish_reason":null}]}
data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":24,"total_tokens":36,"credits_consumed":18}}
data: [DONE]
Things to note
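- Always parse frames by splitting on blank lines, not on individual data: lines. Multi-line payloads are valid SSE.
- The credits_consumed value for the turn is final only in message_delta (messages) or in the last chunk before [DONE] (chat completions). Any earlier value is partial.
- If the stream drops mid-response, the request is still billed for the tokens already generated. If it drops before any output was delivered, we refund automatically.
- The number of concurrent streams per key is limited by tier. See Rate limits.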
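The first note, about splitting on blank lines, can be sketched in Python as follows. iter_sse_frames is a hypothetical helper that accepts any iterable of text lines; it joins consecutive data: lines with a newline, as SSE specifies, and yields None as the event name for anonymous frames. Flushing a trailing frame that has no closing blank line is a convenience assumed here, not something the SSE format requires.

```python
def iter_sse_frames(lines):
    """Yield (event, data) tuples, one per blank-line-delimited SSE frame."""
    event = None
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line closes the current frame, if there is one.
            if event is not None or data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = None, []
            continue
        if line.startswith(":"):
            continue  # SSE comment line
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if name == "event":
            event = value
        elif name == "data":
            data_lines.append(value)
    # Convenience: emit a trailing frame that was not closed by a blank line.
    if event is not None or data_lines:
        yield event, "\n".join(data_lines)


# Usage: one named frame, one payload split across two data: lines, one terminator.
sample_lines = [
    "event: message_stop\n",
    'data: {"type":"message_stop"}\n',
    "\n",
    'data: {"a":\n',
    "data: 1}\n",
    "\n",
    "data: [DONE]\n",
]
frames = list(iter_sse_frames(sample_lines))
```

A parser that treated each data: line as a complete payload would fail on the second frame, since neither half is valid JSON on its own.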