Caicaini

Capabilities

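Streaming responses

Set stream: true on any /v1/messages or /v1/chat/completions request and we send tokens back over Server-Sent Events (SSE). Authentication, model selection, and credit accounting work exactly as they do for non-streaming requests.

Basic streaming

Add stream: true and read the response body as text/event-stream. Most HTTP libraries can handle this without extra dependencies. The example below uses curl's -N flag to turn off buffering so you can watch tokens arrive in real time.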

curl -N https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/auto",
    "max_tokens": 200,
    "stream": true,
    "messages": [{"role":"user","content":"Count to 10 slowly."}]
  }'
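
For readers who would rather not shell out to curl, the same request can be sketched with Python's standard library. This is a minimal illustration, not an official client: the function names build_stream_request and stream_lines are hypothetical, and the URL, headers, and body fields are copied from the curl example above.

```python
import json
import urllib.request

API_URL = "https://caicaini.com/v1/messages"  # endpoint taken from the curl example


def build_stream_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the same POST as the curl example, with stream set to true."""
    body = {
        "model": "caicaini/auto",
        "max_tokens": 200,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_lines(request: urllib.request.Request):
    """Yield decoded lines as they arrive; the response object iterates line by line."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8").rstrip("\r\n")
```

Calling stream_lines(build_stream_request("cai_api_YOUR_KEY", "Count to 10 slowly.")) and printing each line should show roughly what curl -N prints, assuming the key is valid.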

Events on /v1/messages

The messages endpoint emits a sequence of named SSE events. Each frame contains an event: line and a data: line carrying JSON. A blank line separates frames.

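Event                 Meaning
message_start         First frame. Carries the id and the input token count.
content_block_start   A new content block begins. Its type may be text, tool_use, or thinking.
content_block_delta   Token increment for the current block. The shape of delta matches the block type.
content_block_stop    The current block ends.
message_delta         Carries the final stop_reason and the authoritative credits_consumed.
message_stop          Last frame. Stop reading.
ping                  Keep-alive heartbeat, roughly every 15 seconds. Safe to ignore.
error                 An error occurred mid-stream. The connection may stay open briefly; treat it as terminal.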
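
The framing and event rules above can be sketched as a small line-based reader. This is an illustrative sketch rather than a reference implementation: iter_frames and iter_message_events are hypothetical names, and raising RuntimeError on an error event is one possible way to treat it as terminal.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_frames(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Group SSE lines into (event_name, data) frames separated by blank lines."""
    event, data = "", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE allows one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
    if data:  # stream ended without a trailing blank line
        yield event, "\n".join(data)


def iter_message_events(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_name, payload) for /v1/messages, applying the table's rules."""
    for event, data in iter_frames(lines):
        if event == "ping":
            continue  # keep-alive heartbeat, ignored
        payload = json.loads(data)
        if event == "error":
            # Assumption: surfacing the error as an exception ends the read loop.
            raise RuntimeError(f"stream error: {payload}")
        yield event, payload
        if event == "message_stop":
            return  # last frame, stop reading
```

Any iterable of text lines works as input, so the same reader can be fed from an HTTP response or from a saved transcript.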

Sample output

Raw stream
event: message_start
data: {"type":"message_start","message":{"id":"msg_01H...","type":"message","role":"assistant","content":[],"model":"caicaini/auto","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"One, "}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"two, three..."}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":12,"output_tokens":24,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"credits_consumed":18}}

event: message_stop
data: {"type":"message_stop"}
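
One way to turn a stream like this into a single result is to fold the decoded data: payloads in order. The sketch below assumes the payloads have already been parsed from JSON; collect_message is a hypothetical helper name, and it only accumulates text_delta blocks, leaving other block types out.

```python
from typing import Any, Dict, Iterable


def collect_message(payloads: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold decoded /v1/messages payloads into id, text, stop_reason and usage."""
    result: Dict[str, Any] = {"id": None, "text": "", "stop_reason": None, "usage": {}}
    for payload in payloads:
        kind = payload.get("type")
        if kind == "message_start":
            result["id"] = payload["message"]["id"]
            result["usage"] = dict(payload["message"].get("usage", {}))
        elif kind == "content_block_delta":
            delta = payload["delta"]
            if delta.get("type") == "text_delta":
                result["text"] += delta["text"]
        elif kind == "message_delta":
            # The usage reported here supersedes the values seen at message_start.
            result["stop_reason"] = payload["delta"].get("stop_reason")
            result["usage"].update(payload.get("usage", {}))
        elif kind == "message_stop":
            break
    return result
```

Applied to the sample stream above, this yields the text "One, two, three...", a stop_reason of end_turn, and a usage dictionary whose credits_consumed is 18.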

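Events on /v1/chat/completions

The chat-completions endpoint streams anonymous events: each frame is just a data: line whose payload is a chat.completion.chunk object, and the stream ends with a literal data: [DONE].

Raw stream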
data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"content":"One, two, "},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{"content":"three..."},"finish_reason":null}]}

data: {"id":"chatcmpl_01H...","object":"chat.completion.chunk","created":1746748800,"model":"caicaini/auto","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":24,"total_tokens":36,"credits_consumed":18}}

data: [DONE]
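
A reader for this format can be sketched in a few lines. The helper names iter_chunks and collect_chat_stream are hypothetical, and the field names are taken from the sample stream above; the sketch assumes the input is an iterable of text lines.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


def iter_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield chat.completion.chunk objects until the literal [DONE] frame."""
    data_lines = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip(" "))
        elif line == "" and data_lines:
            data = "\n".join(data_lines)
            data_lines = []
            if data == "[DONE]":
                return
            yield json.loads(data)
    # Stream ended without a trailing blank line: flush what is left.
    if data_lines and "\n".join(data_lines) != "[DONE]":
        yield json.loads("\n".join(data_lines))


def collect_chat_stream(lines: Iterable[str]) -> Tuple[str, Optional[str], Dict[str, Any]]:
    """Return (text, finish_reason, usage) for the first choice of a stream."""
    parts, finish_reason, usage = [], None, {}
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            if choice.get("index") != 0:
                continue  # this sketch only follows choice 0
            parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
        if "usage" in chunk:
            usage = chunk["usage"]  # the last usage seen before [DONE] wins
    return "".join(parts), finish_reason, usage
```

For the sample stream above, collect_chat_stream returns the text "One, two, three...", the finish reason stop, and a usage dictionary with credits_consumed equal to 18.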

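Things to watch for

  • Always parse frames by splitting on blank lines, not on individual data: lines. Multi-line payloads are legal in SSE.
  • Only the credits_consumed in message_delta (messages) or in the last chunk before [DONE] (chat completions) is the final value for the turn; any earlier value is partial.
  • If the stream drops midway, the request is still billed for the tokens already generated. If it drops before any output is delivered, we refund it automatically.
  • The number of concurrent streams per key is limited by tier. See Rate limits for details.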
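
The first point matters most when reading raw bytes from a socket, because network reads can end in the middle of a frame. A minimal sketch of buffering until a blank line and joining multi-line data: payloads follows; split_frames is a hypothetical helper, not part of any Caicaini SDK.

```python
from typing import Iterable, Iterator


def split_frames(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the joined data payload of each complete frame from raw byte chunks."""
    buffer = b""
    for chunk in chunks:
        # Normalize on the whole buffer so a CRLF split across two reads is still caught.
        buffer = (buffer + chunk).replace(b"\r\n", b"\n")
        while b"\n\n" in buffer:
            frame, buffer = buffer.split(b"\n\n", 1)
            data_lines = []
            # Decoding per frame is safe: the frame boundary is plain ASCII.
            for line in frame.decode("utf-8").split("\n"):
                if line.startswith("data:"):
                    value = line[len("data:"):]
                    data_lines.append(value[1:] if value.startswith(" ") else value)
            if data_lines:
                yield "\n".join(data_lines)
```

Bytes that arrive after the last blank line stay in the buffer and are never yielded, so an incomplete frame from a dropped connection is not mistaken for a complete one.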