Getting started
Models
Five virtual models cover every workload Caicaini supports. Pick by capability, not by vendor: pass the id as the model field on every /v1/messages or /v1/chat/completions request.
Model ids
These are the only valid values for the model field. Anything else, including legacy ids you may have seen elsewhere, returns 400 with type invalid_request_error.
caicaini/auto
Auto (smart routing)
Smart router. Picks a model per turn based on the prompt complexity, the capabilities your request needs, and your remaining credits.
When you do not have a strong opinion. Lowest credit cost on average.
caicaini/opus
Opus
Highest-capability model. Best for hard reasoning, multi-step planning, agentic loops, and code that the model has to hold in head across many files.
Hard tasks where quality outweighs cost.
caicaini/sonnet
Sonnet
Balanced general-purpose model. Excellent at structured output, retrieval-augmented Q&A, summarization, and most agent loops.
Strong default for production traffic.
caicaini/kimi
Lite
Cost-efficient model with a 256K context window and native multimodal. Great for long-context retrieval, document Q&A, and high-throughput pipelines where price matters more than the last 5% of capability.
High volume, long documents, anything where unit economics dominate.
caicaini/haiku
Haiku
Fastest model. Tuned for short, latency-sensitive turns: classification, routing, light summarization, and inline UX features that need answers in under a second.
Latency-critical workloads.
GET /v1/models
The list endpoint returns the same five entries plus their capability flags. Use it to feature-gate calls in your client (only show the "analyze image" button if supports_vision is true on the selected model).
curl https://caicaini.com/v1/models \
-H "Authorization: Bearer cai_api_YOUR_KEY"Response shape
{
"data": [
{
"id": "caicaini/auto",
"object": "model",
"display_name": "Auto (smart routing)",
"description": "Routes intelligently to the cheapest model that handles the request well.",
"context_window": 200000,
"max_output_tokens": 8192,
"supports_vision": true,
"supports_tools": true,
"supports_thinking": true
},
{
"id": "caicaini/opus",
"object": "model",
"display_name": "Opus",
"description": "Highest-capability model. Best for complex reasoning, deep analysis, and code that benefits from deliberate thought.",
"context_window": 1000000,
"max_output_tokens": 32768,
"supports_vision": true,
"supports_tools": true,
"supports_thinking": true
},
{
"id": "caicaini/sonnet",
"object": "model",
"display_name": "Sonnet",
"description": "Balanced model. Strong reasoning at a more economical price point.",
"context_window": 1000000,
"max_output_tokens": 16384,
"supports_vision": true,
"supports_tools": true,
"supports_thinking": true
},
{
"id": "caicaini/kimi",
"object": "model",
"display_name": "Lite",
"description": "Fast, low-cost model with native multimodal support. Great default for chat and code completion.",
"context_window": 262144,
"max_output_tokens": 32768,
"supports_vision": true,
"supports_tools": true,
"supports_thinking": true
},
{
"id": "caicaini/haiku",
"object": "model",
"display_name": "Haiku",
"description": "Fastest, cheapest tier. Best for high-throughput simple completions and lightweight tooling.",
"context_window": 200000,
"max_output_tokens": 8192,
"supports_vision": true,
"supports_tools": true,
"supports_thinking": false
}
]
}Picking a model
- Start with
caicaini/autofor everything. Look at the resolved model in your usage logs after a few hundred turns and decide if you want to pin. - For long-context retrieval (more than ~200K input tokens), pin
caicaini/kimifor unit economics orcaicaini/opus/caicaini/sonnetif you need the 1M-token window. - For agent loops that need extended thinking, pin
caicaini/opusorcaicaini/sonnetand set thethinkingfield on the request. - For sub-second latency turns, pin
caicaini/haiku. Avoid it for tasks that need long-form synthesis or extended thinking.