Current instruments

Compare live Text, Code, and Agent instruments by quality floor, context, output, and latency.

At-a-glance comparison

Instrument
Quality Score
Context
Output (max)
TTFT (max)
Status

Text Prime (text-prime)

≥ 128K

128K

4.62s

Live

Text Max (text-max)

1M

128K

3.50s

Live

Text Standard (text-standard)

≥ 128K

128K

1.32s

Live

Code Prime (code-prime)

≥ 128K

128K

4.0s

Preview

Code Max (code-max)

1M

128K

5.0s

Preview

Code Standard (code-standard)

≥ 128K

128K

1.5s

Preview

Agent Prime (agent-prime)

≥ 256K

128K

5.0s

Preview

Agent Max (agent-max)

≥ 512K

128K

6.0s

Preview

Agent Standard (agent-standard)

≥ 128K

128K

2.0s

Preview

You pass the instrument string in the model parameter on either the OpenAI Chat Completions or Anthropic Messages beta surface. The order book routes the request to a qualifying supplier offering the best price.

Text instruments

Three tiers covering general-purpose text generation, reasoning, summarization, and retrieval-augmented work. Text instruments are live today.

Text Prime (default)

  • Production default for everyday text generation. Strong reasoning at a fraction of frontier cost. Most workloads belong here.

  • Use for: RAG and retrieval-augmented apps, content drafting, summarization, customer-facing generation where quality matters but frontier reasoning is overkill.

  • Not sure where to start? Start on text-prime and move up or down from there.

Text Max

  • For high-stakes work where errors compound. The 1M context window handles long-context synthesis across many documents in a single pass.

  • Use for: legal review, clinical reasoning, financial analysis, security incident triage.

  • Route by exception, not by default. If you do not need the full context window or the highest Quality Score floor, you are paying for headroom that goes unused.

Text Standard

  • For high-volume, low-stakes work where speed and unit economics matter more than reasoning depth.

  • Use for: classification and routing, chatbot first-touch, batch summarization, pipeline glue.

  • Latency advantage. The 1.32 second TTFT cap keeps interactive responses feeling instant; the lower Quality Score floor keeps per-call costs manageable at volume.

  • Tricky calls? Send them up to Prime when they hit.

Code instruments (Preview)

Three tiers covering software engineering work, evaluated on real coding tasks, not chat.

Code Prime (default)

  • Production default for daily coding work. Fast enough for interactive use, smart enough for non-trivial tasks. Most production coding agents land here.

  • Use for: code completion in your IDE, "write this function" requests, automated PR review and feedback, standard debugging on stack traces, feature implementation where correctness is binary (the tests pass or they do not).

Code Max

  • For changes that span the whole codebase. The 1M context window and higher Coding Index floor handle codebases that overwhelm smaller models.

  • Use for: full-repo analysis, cross-repo migrations across multiple services and API boundaries, complex multi-layer debugging (race conditions, distributed system failures, subtle type mismatches), multi-file refactors touching thousands of lines.

Code Standard

  • For autocomplete, linting, and batch edits. High frequency, well-defined tasks where TTFT under 1.5 seconds and high throughput dominate.

  • Use for: inline suggestions firing on every keystroke, automated style enforcement, rename-in-200-files refactors, API call signature updates, boilerplate generation.

  • Reasoning depth is not the constraint here.

Agent instruments (Preview)

Three tiers covering autonomous and multi-step tool use, benchmarked on real agentic work.

Agent Prime (default)

  • Production default for daily agent work.

  • Use for: production agent loops running all day (customer service, data processing, monitoring), multi-step tool chains of 5 to 20 calls (database lookup, response formatting, ticket update), chain-of-thought reasoning workflows where the agent thinks through options before acting.

  • Common pattern: pair agent-prime for planning and tool use with code-prime for the actual code-writing.

Agent Max

  • For autonomous work measured in hours, not minutes. The higher Agentic Index floor and long-horizon validation matter more than per-call cost at this tier.

  • Use for: independent research that reads papers, searches the web, and produces a structured report after hours of unattended work; deep tool chains of 50+ calls where each step depends on the last; long-horizon infrastructure management, data pipeline construction, high-stakes automation acting on production systems.

Agent Standard

  • For fast tool calls and orchestration glue. The lightweight layer in an agent system that parses responses, decides which tool to call next, and formats inputs (these calls fire hundreds of times per agent run).

  • Use for: single-purpose agents that do one thing repeatedly, high-throughput orchestration where hundreds or thousands of agent instances run in parallel, routing and triage agents that decide whether a request handles directly or escalates.

How to choose

  • The biggest savings come from routing across tiers, not from picking one.

  • Common production setup: Standard for triage and orchestration glue, Prime for the calls that actually need to reason, Max only when the context is huge or the cost of an error is high.

  • Mix task types freely. An agent on agent-prime for planning, code-prime for implementation, text-standard for the run summary.

For the rules that govern what each Quality Score floor means, see How we define instruments. For the cadence on threshold changes, see How instrument specifications evolve.

Last updated

Was this helpful?