Choose an instrument

Pick the right task type (Text, Code, or Agent) and tier (Standard, Prime, or Max) for your workload.

If you only read one line: start on Prime, reach up to Max for hard tasks, drop to Standard for high-volume work. Most teams never need anything else for the first month.

The three tiers

We publish each task type in three tiers. Tiers differ in their Quality Score floor and in the cost, latency, and context envelope that suppliers fill at.

  • Prime is the production default. Strong reasoning, large context, priced to let you run agents and pipelines without watching the meter.

  • Max is for high-stakes work where errors compound: long-context synthesis, cross-repo code reasoning, long-horizon agent runs, regulated output.

  • Standard is for cheap, high-volume work: classification, autocomplete, triage, batch pipelines.

Each tier's Quality Score floor is anchored to the relevant Artificial Analysis leaderboard: Intelligence Index for Text, Coding Index for Code, Agentic Index for Agent. For the current live and preview list with the actual floors, see Current instruments.

Pick a tier for your workload

Text

| If your workload looks like this | Use |
| --- | --- |
| Agents, RAG, code generation, drafting, quality-sensitive work | text-prime |
| Very large context, multi-step reasoning, high cost of errors | text-max |
| Classification, chatbots, batch jobs, summarization, high-volume work | text-standard |
| Not sure yet | Start on text-prime, move up or down later |

Code (preview)

| If your workload looks like this | Use |
| --- | --- |
| Daily coding, code review, standard debugging, feature implementation | code-prime |
| Full codebase analysis, cross-repo migrations, multi-layer debugging | code-max |
| Autocomplete, linting, inline suggestions, batch edits | code-standard |
| Not sure yet | Start on code-prime, move up or down later |

Agent (preview)

| If your workload looks like this | Use |
| --- | --- |
| Production agent loops, multi-step tool use, reasoning workflows | agent-prime |
| Autonomous research, 50+ tool chains, long-horizon tasks (hours) | agent-max |
| Fast tool calls, simple agent loops, high-throughput orchestration | agent-standard |
| Not sure yet | Start on agent-prime, move up or down later |
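
The instrument names in this section follow a task-type-plus-tier pattern, with Prime as the fallback when you are not sure. A minimal Python sketch of that lookup, assuming instruments are addressed by the string names shown here; the helper name `instrument_for` is hypothetical and not part of any published API:

```python
from typing import Optional

TASK_TYPES = ("text", "code", "agent")
TIERS = ("standard", "prime", "max")


def instrument_for(task_type: str, tier: Optional[str] = None) -> str:
    """Build an instrument name such as 'text-prime'.

    Passing tier=None models the "Not sure yet" case: start on Prime.
    """
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {task_type!r}")
    tier = tier or "prime"
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return f"{task_type}-{tier}"
```

Calling `instrument_for("code")` returns `"code-prime"`, and `instrument_for("agent", "max")` returns `"agent-max"`.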

Default to Prime

Most production traffic on The Grid runs through Prime. If you're starting fresh and not sure where to land, start there. Reach for Prime when:

  • You're running an agent loop with 10 to 200 inference calls per task.

  • You're building a RAG app where retrieval does the heavy lifting and the model just synthesizes.

  • You're generating code that has to compile and pass tests.

  • You're producing content from a brief (drafts, structured output, marketing copy, replies).

Reach for Max only when errors compound

Max tiers are for work where the cost of getting it wrong is much larger than the cost of the inference call. Examples:

  • Long-context synthesis where you're packing hundreds of thousands of tokens into one prompt and the answer depends on connecting facts across all of it.

  • Architecture-level code reasoning, cross-repo migrations, or debugging across multiple services and runtimes.

  • Autonomous research or long-horizon agent runs that take hours and feed downstream work.

  • High-stakes professional output (legal, medical, financial) where a wrong answer has real consequences.

Route up by exception. If most of your calls go to Max, you're probably overpaying.
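
One way to encode "route up by exception" is a rule that returns Prime unless a specific escalation condition holds, plus a check on how much traffic ends up on Max. The sketch below is illustrative only: the `Task` fields and the numeric cutoffs are assumptions chosen to mirror the examples above, not published thresholds.

```python
from dataclasses import dataclass

# Hypothetical cutoffs, chosen only to make the sketch concrete.
LONG_CONTEXT_TOKENS = 200_000
LONG_RUN_HOURS = 1.0


@dataclass
class Task:
    prompt_tokens: int = 0
    expected_hours: float = 0.0
    cross_repo: bool = False
    high_stakes: bool = False  # legal, medical, or financial output


def pick_tier(task: Task) -> str:
    """Return 'max' only when one of the exception conditions holds."""
    escalate = (
        task.prompt_tokens >= LONG_CONTEXT_TOKENS
        or task.expected_hours >= LONG_RUN_HOURS
        or task.cross_repo
        or task.high_stakes
    )
    return "max" if escalate else "prime"


def max_share(tasks: list[Task]) -> float:
    """Fraction of tasks routed to Max; above 0.5 suggests overpaying."""
    if not tasks:
        return 0.0
    return sum(pick_tier(t) == "max" for t in tasks) / len(tasks)
```

Tracking `max_share` over a day of traffic gives a quick signal: if it climbs past one half, the escalation conditions are probably too loose.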

Drop to Standard for high-volume work

Standard tiers handle the high-volume, low-ambiguity work that makes up most of an application's traffic. Use Standard when:

  • You're classifying, tagging, or routing inbound data at scale.

  • You need the first token to arrive in under a second so the response feels instant to the user.

  • You're running batch pipelines where throughput and unit economics matter more than reasoning depth.

  • You're doing autocomplete, inline suggestions, or formatting work that fires on every keystroke.

The savings usually show up when a single app routes 80% of its traffic to Standard and 20% to Prime.
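
To see why an 80/20 split matters, compare the average cost per call of the mixed route with an all-Prime route. The prices in this sketch are placeholders in arbitrary units, not published rates; substitute the real per-call or per-token prices for your instruments.

```python
def blended_cost(shares: dict[str, float], prices: dict[str, float]) -> float:
    """Average cost per call for a traffic mix; shares must sum to 1."""
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * prices[tier] for tier, share in shares.items())


# Placeholder relative prices, NOT published rates.
PRICES = {"standard": 1.0, "prime": 4.0}

mixed = blended_cost({"standard": 0.8, "prime": 0.2}, PRICES)
all_prime = blended_cost({"prime": 1.0}, PRICES)
savings = 1 - mixed / all_prime
print(f"mixed={mixed:.2f} all_prime={all_prime:.2f} savings={savings:.0%}")
```

With these placeholder prices the mixed route averages 1.60 units per call against 4.00 for all-Prime, a 60% reduction. The actual figure depends entirely on the real price gap between tiers.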

Mix tiers across one app

Most of the savings come from splitting one application across multiple instruments rather than sending everything to one tier.

  • Triage on Standard, reason on Prime. Classify inbound requests on text-standard, send the ones that need real reasoning to text-prime.

  • Retrieve on Standard, synthesize on Prime. Run retrieval and reranking on Standard, generate the final answer on Prime.

  • Code Standard for autocomplete, Code Prime for implementation.

  • Agent Standard for orchestration, Agent Prime for the calls where the agent actually has to reason.

  • Mix task types. A coding agent can use code-prime for writing code, agent-prime for planning, and text-standard for summarizing. Each instrument is independent.
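
The "triage on Standard, reason on Prime" pattern can be sketched as a two-step call. This example assumes a client exposed as a function taking `(instrument, prompt)` and returning text; that signature, the triage prompt, and the `fake_complete` stand-in are all hypothetical and exist only so the sketch runs offline.

```python
from typing import Callable

# Hypothetical client signature: (instrument, prompt) -> completion text.
Complete = Callable[[str, str], str]

TRIAGE_PROMPT = (
    "Answer with exactly one word, SIMPLE or COMPLEX.\n\nRequest: {request}"
)


def answer(request: str, complete: Complete) -> tuple[str, str]:
    """Triage on text-standard, then answer on the instrument triage picks.

    Returns (instrument_used, answer_text).
    """
    label = complete("text-standard", TRIAGE_PROMPT.format(request=request))
    needs_reasoning = label.strip().upper() == "COMPLEX"
    instrument = "text-prime" if needs_reasoning else "text-standard"
    return instrument, complete(instrument, request)


def fake_complete(instrument: str, prompt: str) -> str:
    """Stand-in for a real client so the sketch runs without a network."""
    if prompt.startswith("Answer with exactly one word"):
        return "COMPLEX" if "why" in prompt.lower() else "SIMPLE"
    return f"[{instrument}] reply"
```

Injecting the client as a parameter keeps the routing logic testable on its own; swapping `fake_complete` for a real client should not require changes to `answer`.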
