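Models

Caicaini covers every workload with five virtual models. Pick by capability rather than by vendor: pass the ID as the model field in each /v1/messages or /v1/chat/completions request.

Model IDs

The model field accepts only the values below. Any other value, including legacy IDs you may have seen elsewhere, returns a 400 whose type is invalid_request_error.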

caicaini/auto

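Auto (smart routing)

Smart router. On each turn it picks a model based on prompt complexity, the capabilities required, and your remaining credits.

Context 200K · Max output 8,192 · vision · tools · thinking

Use it when you have no strong preference. Lowest average credit cost.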

caicaini/opus

Opus

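The most capable model. Best for hard reasoning, multi-step planning, agent loops, and coding tasks that require holding several files in mind at once.

Context 1M · Max output 32,768 · vision · tools · thinking

For hard tasks where quality matters more than cost.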

caicaini/sonnet

Sonnet

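A balanced general-purpose model. Performs well on structured output, retrieval-augmented question answering, summarization, and most agent loops.

Context 1M · Max output 16,384 · vision · tools · thinking

The preferred default for production traffic.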

caicaini/kimi

Lite

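A cost-efficient model with a 256K context window and native multimodal support. Well suited to long-context retrieval, document question answering, and high-throughput pipelines that are sensitive to unit cost and insensitive to the last 5% capability gap.

Context 256K · Max output 32,768 · vision · tools · thinking

High throughput, long documents, and any scenario dominated by unit economics.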

caicaini/haiku

Haiku

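The fastest model. Tuned for short, latency-sensitive turns: classification, routing, lightweight summarization, and in-product UX features that must respond within one second.

Context 200K · Max output 8,192 · vision · tools

For extremely latency-sensitive workloads.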
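
Because any ID outside the five listed above is rejected with a 400, a client can catch typos before spending a round trip. The following is a minimal sketch, not part of any official Caicaini client: the MODEL_SPECS table simply copies the limits from the model cards above (256K is taken as 262,144 tokens), and check_model_id and clamp_max_tokens are hypothetical helper names.

```python
# Limits copied from the model cards above. The table and helper names are
# illustrative assumptions, not part of an official client library.
MODEL_SPECS = {
    "caicaini/auto":   {"context_window": 200_000,   "max_output_tokens": 8_192},
    "caicaini/opus":   {"context_window": 1_000_000, "max_output_tokens": 32_768},
    "caicaini/sonnet": {"context_window": 1_000_000, "max_output_tokens": 16_384},
    "caicaini/kimi":   {"context_window": 262_144,   "max_output_tokens": 32_768},
    "caicaini/haiku":  {"context_window": 200_000,   "max_output_tokens": 8_192},
}


def check_model_id(model: str) -> str:
    """Return the ID unchanged if it is one of the five documented values."""
    if model not in MODEL_SPECS:
        allowed = ", ".join(sorted(MODEL_SPECS))
        raise ValueError(f"unknown model {model!r}; expected one of: {allowed}")
    return model


def clamp_max_tokens(model: str, requested: int) -> int:
    """Cap a requested output length at the model's documented maximum."""
    spec = MODEL_SPECS[check_model_id(model)]
    return min(requested, spec["max_output_tokens"])
```

Validating locally does not replace server-side handling: a request can still fail with invalid_request_error for other reasons, so the 400 path should remain covered.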

GET /v1/models

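The list endpoint returns the same five records along with capability flags. Use the flags to gate features on the client; for example, show an "Analyze image" button only when the selected model's supports_vision is true.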

curl https://caicaini.com/v1/models \
  -H "Authorization: Bearer cai_api_YOUR_KEY"

Response shape

Response · 200 OK

{
  "data": [
    {
      "id": "caicaini/auto",
      "object": "model",
      "display_name": "Auto (smart routing)",
      "description": "Routes intelligently to the cheapest model that handles the request well.",
      "context_window": 200000,
      "max_output_tokens": 8192,
      "supports_vision": true,
      "supports_tools": true,
      "supports_thinking": true
    },
    {
      "id": "caicaini/opus",
      "object": "model",
      "display_name": "Opus",
      "description": "Highest-capability model. Best for complex reasoning, deep analysis, and code that benefits from deliberate thought.",
      "context_window": 1000000,
      "max_output_tokens": 32768,
      "supports_vision": true,
      "supports_tools": true,
      "supports_thinking": true
    },
    {
      "id": "caicaini/sonnet",
      "object": "model",
      "display_name": "Sonnet",
      "description": "Balanced model. Strong reasoning at a more economical price point.",
      "context_window": 1000000,
      "max_output_tokens": 16384,
      "supports_vision": true,
      "supports_tools": true,
      "supports_thinking": true
    },
    {
      "id": "caicaini/kimi",
      "object": "model",
      "display_name": "Lite",
      "description": "Fast, low-cost model with native multimodal support. Great default for chat and code completion.",
      "context_window": 262144,
      "max_output_tokens": 32768,
      "supports_vision": true,
      "supports_tools": true,
      "supports_thinking": true
    },
    {
      "id": "caicaini/haiku",
      "object": "model",
      "display_name": "Haiku",
      "description": "Fastest, cheapest tier. Best for high-throughput simple completions and lightweight tooling.",
      "context_window": 200000,
      "max_output_tokens": 8192,
      "supports_vision": true,
      "supports_tools": true,
      "supports_thinking": false
    }
  ]
}
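
The capability gating described for this endpoint could be sketched as follows, using only the Python standard library. The URL, header, and field names come from the example above; fetch_models, capability_map, and show_analyze_image_button are hypothetical names, and SAMPLE is a trimmed copy of the documented response used for illustration only.

```python
import json
import urllib.request

MODELS_URL = "https://caicaini.com/v1/models"


def fetch_models(api_key: str) -> dict:
    """GET /v1/models with a bearer token and return the decoded JSON body."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def capability_map(payload: dict) -> dict:
    """Index the capability flags by model ID for quick lookups."""
    return {
        entry["id"]: {
            "vision": entry["supports_vision"],
            "tools": entry["supports_tools"],
            "thinking": entry["supports_thinking"],
        }
        for entry in payload["data"]
    }


def show_analyze_image_button(payload: dict, selected_model: str) -> bool:
    """Show the button only when the selected model reports supports_vision."""
    return capability_map(payload).get(selected_model, {}).get("vision", False)


# Trimmed copy of the documented response, so the helpers can run offline.
SAMPLE = {
    "data": [
        {"id": "caicaini/opus", "supports_vision": True,
         "supports_tools": True, "supports_thinking": True},
        {"id": "caicaini/haiku", "supports_vision": True,
         "supports_tools": True, "supports_thinking": False},
    ]
}
```

In a real client, the result of fetch_models would be passed in place of SAMPLE, ideally cached rather than fetched on every render.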

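How to pick a model

Always start with caicaini/auto. After a few hundred turns, check which models were actually selected in the usage logs, then decide whether to pin one.
For long-context retrieval (inputs above roughly 200K tokens): pin caicaini/kimi for unit economics, or pin caicaini/opus or caicaini/sonnet if you need the 1M-token window.
For agent loops that need extended thinking, pin caicaini/opus or caicaini/sonnet and set the thinking field in the request.
For sub-second-latency turns, pin caicaini/haiku. Avoid it for tasks that need long-form synthesis or extended thinking.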
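
The guidance above can be condensed into a small selection helper. This is one possible reading of those rules, not an official algorithm: pick_model and its parameters are hypothetical, the 200,000-token threshold stands in for "roughly 200K", and the choice between Opus and Sonnet is reduced to a single prefer_quality flag.

```python
def pick_model(
    input_tokens: int = 0,
    needs_thinking: bool = False,
    latency_critical: bool = False,
    cost_sensitive: bool = False,
    prefer_quality: bool = False,
) -> str:
    """Map workload traits to a model ID, following the guidance above."""
    # Assumption: Opus when quality outweighs cost, otherwise Sonnet.
    large = "caicaini/opus" if prefer_quality else "caicaini/sonnet"

    # Long-context retrieval: Lite for unit economics while the input still
    # fits its 256K (262,144-token) window, else a 1M-window model.
    if input_tokens > 200_000:
        if cost_sensitive and input_tokens <= 262_144:
            return "caicaini/kimi"
        return large

    # Extended thinking rules out Haiku, even for latency-sensitive turns.
    if needs_thinking:
        return large

    # Short, sub-second turns.
    if latency_critical:
        return "caicaini/haiku"

    # No strong preference: let the router decide.
    return "caicaini/auto"
```

A helper like this is most useful after the usage logs have shown which models caicaini/auto actually selects for a given workload; until then, the default return value matches the first recommendation.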
