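Extended thinking

Extended thinking gives the model a private scratchpad so it can reason before it answers. The model writes its reasoning out, you decide whether to show it to the user, and the final answer is often much better on hard problems.

When to use thinking

Multi-step math problems, logic puzzles, and planning problems.
Code that requires holding many constraints in mind at once: architecture changes, debugging, performance optimization.
Long-running agent loops, where one bad tool call leads to more bad calls.
Anywhere you currently chain a "think first, then answer" prompt by hand.

Enabling thinking

Add a thinking field to the request. budget_tokens is the upper limit on private reasoning tokens; set it lower than max_tokens so the model still has room to write the visible answer.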

curl https://caicaini.com/v1/messages \
  -H "Authorization: Bearer cai_api_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "caicaini/opus",
    "max_tokens": 4096,
    "thinking": { "type": "enabled", "budget_tokens": 2048 },
    "messages": [
      {"role":"user","content":"A jug has 12L. Two pours: 5L jug and 7L jug. Show me how to measure exactly 6L."}
    ]
  }'
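
The same request body can be assembled in code, with the budget rule checked before anything is sent. This is a minimal sketch, not an official client: the helper name build_thinking_request and its defaults are hypothetical, and only the field names come from the curl example above.

```python
import json


def build_thinking_request(prompt, max_tokens=4096, budget_tokens=2048,
                           model="caicaini/opus"):
    """Return a Messages request body with thinking enabled.

    Hypothetical helper: only the field names come from the docs.
    """
    # The budget caps private reasoning, so it must leave room under
    # max_tokens for the visible answer.
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }


body = build_thinking_request("Show me how to measure exactly 6L.")
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the -d payload of the curl call, or sent with whichever HTTP client you already use.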

Response shape

The response is an ordinary Messages reply. The content array starts with one or more thinking blocks, followed by the public text block.

Response · 200 OK

{
  "id": "msg_01H...",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me work this out step by step. I have 5L and 7L jugs, want exactly 6L. Fill the 7L jug, pour into 5L jug until full — leaves 2L. Empty 5L. Pour the 2L into the 5L. Fill the 7L again, pour into 5L until full (3L moves) — 4L left. Hmm not quite. Let me reconsider..."
    },
    {
      "type": "text",
      "text": "Here is one path: 1) Fill the 7L. 2) Pour into the 5L until full — 2L remains in the 7L. 3) Empty the 5L. 4) Move the 2L into the 5L. 5) Fill the 7L again. 6) Pour from the 7L into the 5L until full — that uses 3L, leaving 4L in the 7L. 7) Empty the 5L. 8) Move the 4L to the 5L. 9) Fill the 7L again. 10) Pour into the 5L until full — uses 1L, leaves exactly 6L in the 7L."
    }
  ],
  "model": "caicaini/opus",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 47,
    "output_tokens": 1830,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "credits_consumed": 612
  }
}
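
Separating the private reasoning from the visible answer is a matter of walking the content array. The sketch below assumes the response has already been decoded into a Python dict shaped like the example above; split_content is a hypothetical helper name, and the sample data is shortened.

```python
def split_content(response):
    """Return (thinking, answer) strings from a decoded Messages response."""
    thinking_parts = []
    text_parts = []
    for block in response.get("content", []):
        if block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))
    # Several thinking blocks may precede the text, so join them.
    return "\n".join(thinking_parts), "".join(text_parts)


# Shortened sample in the same shape as the response above.
sample = {
    "content": [
        {"type": "thinking", "thinking": "Fill the 7L jug first..."},
        {"type": "text", "text": "Here is one path: 1) Fill the 7L."},
    ],
    "usage": {"output_tokens": 1830, "credits_consumed": 612},
}
thinking, answer = split_content(sample)
print(answer)
```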

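Cost

Thinking tokens are billed as output tokens, at the same rate as the model's other output. They count toward output_tokens in the response usage.
usage.credits_consumed is the authoritative value and already includes the cost of thinking.
Keep budget_tokens conservative. 1024 to 2048 is more than enough for most tasks; raise it only when you see the model's reasoning being cut off.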
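
One way to follow the "start small, raise only when needed" advice is to double the budget after a truncated run while keeping headroom for the answer. This is a sketch under assumptions: next_budget and its reserve parameter are hypothetical, the doubling policy is one choice among many, and deciding whether a run was truncated is left to the caller because this page does not define a signal for it.

```python
def next_budget(current, max_tokens, truncated, reserve=1024):
    """Suggest budget_tokens for the next request (hypothetical policy)."""
    if not truncated:
        return current  # keep the conservative setting
    # Leave `reserve` tokens under max_tokens for the visible answer.
    ceiling = max_tokens - reserve
    # Double, but never exceed the ceiling and never shrink.
    return max(current, min(current * 2, ceiling))


budget = 1024
budget = next_budget(budget, max_tokens=4096, truncated=True)
print(budget)
```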

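Should you show the thinking to users?

Usually not. The reasoning is a private scratchpad: it can contain speculative branches, dead ends, and self-corrections that confuse non-technical users. Show only the final text block by default. If you want a "show reasoning" toggle, label it clearly and present the thinking in a collapsible section.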
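
A "show reasoning" toggle could be rendered along these lines. The sketch assumes a decoded response dict and an HTML front end; render_reply, the show_reasoning flag, and the summary label are illustrative choices, not part of the API.

```python
from html import escape


def render_reply(response, show_reasoning=False):
    """Render a decoded Messages response as an HTML fragment."""
    parts = []
    for block in response.get("content", []):
        if block.get("type") == "text":
            parts.append("<p>" + escape(block.get("text", "")) + "</p>")
        elif block.get("type") == "thinking" and show_reasoning:
            # Collapsed by default and clearly labeled as raw reasoning.
            parts.append(
                "<details><summary>Model reasoning (unedited scratchpad)"
                "</summary><pre>" + escape(block.get("thinking", ""))
                + "</pre></details>"
            )
    return "\n".join(parts)


reply = {
    "content": [
        {"type": "thinking", "thinking": "Is 2 < 5? Yes."},
        {"type": "text", "text": "Pour the 2L into the 5L jug."},
    ]
}
print(render_reply(reply))
print(render_reply(reply, show_reasoning=True))
```

Escaping both fields keeps model output from being interpreted as markup, and the details element stays collapsed until the user opens it.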

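Thinking + streaming

When you stream a request with thinking enabled, the SSE stream first emits a content_block_start of type thinking, followed by a series of content_block_delta frames whose delta.type is thinking_delta, then a content_block_stop, and only then does the first text block begin. See Streaming responses for the full event vocabulary.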
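
On the client side, the event order above can be consumed by routing each delta to the right buffer. This sketch works on events already parsed from SSE into dicts. Several details are assumptions that this page does not specify: the content_block nesting on start frames, the text_delta type name, and the delta payload field names thinking and text. Check them against the Streaming responses page before relying on them.

```python
def collect_stream(events):
    """Accumulate thinking and answer text from parsed SSE event dicts."""
    thinking, text = [], []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # start/stop frames carry no content in this sketch
        delta = event.get("delta", {})
        if delta.get("type") == "thinking_delta":
            # Assumed payload field name: "thinking".
            thinking.append(delta.get("thinking", ""))
        elif delta.get("type") == "text_delta":
            # Assumed delta type and payload field name: "text".
            text.append(delta.get("text", ""))
    return "".join(thinking), "".join(text)


# Hand-written events in the order described above (shapes are assumed).
events = [
    {"type": "content_block_start", "content_block": {"type": "thinking"}},
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "Fill the 7L, "}},
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "pour into the 5L."}},
    {"type": "content_block_stop"},
    {"type": "content_block_start", "content_block": {"type": "text"}},
    {"type": "content_block_delta",
     "delta": {"type": "text_delta", "text": "1) Fill the 7L."}},
    {"type": "content_block_stop"},
]
reasoning, answer_text = collect_stream(events)
print(answer_text)
```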

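Combining thinking with tools

Tool calls and thinking can be combined. The model thinks first, decides which tool to call, and emits a tool_use block; you send back the result, and it can think again on the next turn. This is the highest-quality agent loop configuration for hard tasks. See Tools for the full loop.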
