Caicaini
Get started

Getting started

Models

Five virtual models cover every workload Caicaini supports. Pick by capability, not by vendor: pass the id as the model field on every /v1/messages or /v1/chat/completions request.

Model ids

These are the only valid values for the model field. Anything else, including legacy ids you may have seen elsewhere, returns 400 with type invalid_request_error.

caicaini/auto

Auto (smart routing)

Smart router. Picks a model per turn based on the prompt complexity, the capabilities your request needs, and your remaining credits.

Context 200KMax output 8,192vision · tools · thinking

When you do not have a strong opinion. Lowest credit cost on average.

caicaini/opus

Opus

Highest-capability model. Best for hard reasoning, multi-step planning, agentic loops, and code that the model has to hold in head across many files.

Context 1MMax output 32,768vision · tools · thinking

Hard tasks where quality outweighs cost.

caicaini/sonnet

Sonnet

Balanced general-purpose model. Excellent at structured output, retrieval-augmented Q&A, summarization, and most agent loops.

Context 1MMax output 16,384vision · tools · thinking

Strong default for production traffic.

caicaini/kimi

Lite

Cost-efficient model with a 256K context window and native multimodal. Great for long-context retrieval, document Q&A, and high-throughput pipelines where price matters more than the last 5% of capability.

Context 256KMax output 32,768vision · tools · thinking

High volume, long documents, anything where unit economics dominate.

caicaini/haiku

Haiku

Fastest model. Tuned for short, latency-sensitive turns: classification, routing, light summarization, and inline UX features that need answers in under a second.

Context 200KMax output 8,192vision · tools

Latency-critical workloads.

GET /v1/models

The list endpoint returns the same five entries plus their capability flags. Use it to feature-gate calls in your client (only show the "analyze image" button if supports_vision is true on the selected model).

curl https://caicaini.com/v1/models \
  -H "Authorization: Bearer cai_api_YOUR_KEY"

Response shape

response · 200 OK
{
  "data": [
    {
      "id": "caicaini/auto",
      "object": "model",
      "display_name": "Auto (smart routing)",
      "description": "Routes intelligently to the cheapest model that handles the request well.",
      "context_window": 200000,
      "max_output_tokens": 8192,
      "supports_vision": true,
      "supports_tools": true,
      "supports_thinking": true
    },
    {
      "id": "caicaini/opus",
      "object": "model",
      "display_name": "Opus",
      "description": "Highest-capability model. Best for complex reasoning, deep analysis, and code that benefits from deliberate thought.",
      "context_window": 1000000,
      "max_output_tokens": 32768,
      "supports_vision": true,
      "supports_tools": true,
      "supports_thinking": true
    },
    {
      "id": "caicaini/sonnet",
      "object": "model",
      "display_name": "Sonnet",
      "description": "Balanced model. Strong reasoning at a more economical price point.",
      "context_window": 1000000,
      "max_output_tokens": 16384,
      "supports_vision": true,
      "supports_tools": true,
      "supports_thinking": true
    },
    {
      "id": "caicaini/kimi",
      "object": "model",
      "display_name": "Lite",
      "description": "Fast, low-cost model with native multimodal support. Great default for chat and code completion.",
      "context_window": 262144,
      "max_output_tokens": 32768,
      "supports_vision": true,
      "supports_tools": true,
      "supports_thinking": true
    },
    {
      "id": "caicaini/haiku",
      "object": "model",
      "display_name": "Haiku",
      "description": "Fastest, cheapest tier. Best for high-throughput simple completions and lightweight tooling.",
      "context_window": 200000,
      "max_output_tokens": 8192,
      "supports_vision": true,
      "supports_tools": true,
      "supports_thinking": false
    }
  ]
}

Picking a model

  • Start with caicaini/auto for everything. Look at the resolved model in your usage logs after a few hundred turns and decide if you want to pin.
  • For long-context retrieval (more than ~200K input tokens), pin caicaini/kimi for unit economics or caicaini/opus / caicaini/sonnet if you need the 1M-token window.
  • For agent loops that need extended thinking, pin caicaini/opus or caicaini/sonnet and set the thinking field on the request.
  • For sub-second latency turns, pin caicaini/haiku. Avoid it for tasks that need long-form synthesis or extended thinking.