# Current instruments

## Instrument are Specifications, not a Model

You buy a quality floor, not a model name. Each instrument is defined by measurable thresholds: Quality Score, latency, context, output length, sustained throughput, uptime. Any model from any supplier that clears every threshold can fill orders for that instrument. The order book routes each request to the qualifying supplier offering the best price at that moment.

You write `model: "text-prime"` (or any other instrument string) in the request and call either the OpenAI Chat Completions endpoint at `https://api.thegrid.ai/v1` or the Anthropic Messages beta at `https://messages-beta.api.thegrid.ai/v1`. The same instrument string works on both. The specific model behind any given call can change between calls. You contracted for the specification, not the model name.

## At-a-glance comparison

<details>

<summary>How <strong>Quality Score</strong>, <strong>Context</strong>, <strong>Output</strong>, <strong>TTFT</strong>, and <strong>Throughput</strong> are defined</summary>

{% hint style="info" %}
Each instrument can be served by many different models, so its specs define the "lowest acceptable bar" the supplied model must meet. Some specs are floors (e.g., Context) that models may exceed; others are ceilings (e.g., TTFT) that models may come in under.
{% endhint %}

**Quality Score** is the minimum score a model must achieve on the associated [Artificial Analysis Index](https://artificialanalysis.ai/) (Intelligence for text instruments, Coding for code instruments, Agentic for agent instruments).

**Context** represents the minimum context window (input length), in tokens, a model must support.

**Output** represents the maximum output length, in tokens, a model must support. Effectively the "minimum max output length."

**Time to First Token** aka TTFT is the time it takes an inference provider to deliver the first byte of a streaming response. Measured within our infrastructure, so it excludes network hops to the client.

**Throughput** is the rate at which tokens are generated after the first token, in tokens per second, during a streaming request. Also measured within our infrastructure.

</details>

<table data-view="cards"><thead><tr><th></th><th>Quality Score</th><th>Context</th><th>Output</th><th>Time to First Token</th><th>Throughput</th><th>Status</th></tr></thead><tbody><tr><td><h4>Text Standard</h4></td><td>≥ 18 <a href="https://artificialanalysis.ai/"><mark style="color:$info;">Intelligence Index</mark></a></td><td>≥ 128K <mark style="color:$info;">tokens</mark></td><td>≥ 16K <mark style="color:$info;">tokens</mark></td><td>≤ 1.32 <mark style="color:$info;">seconds</mark></td><td>≥ 100 <mark style="color:$info;">tok / sec</mark></td><td>Live</td></tr><tr><td><h4>Text Prime</h4></td><td>≥ 38 <a href="https://artificialanalysis.ai/"><mark style="color:$info;">Intelligence Index</mark></a></td><td>≥ 128K <mark style="color:$info;">tokens</mark></td><td>≥ 30K <mark style="color:$info;">tokens</mark></td><td>≤ 4.62 <mark style="color:$info;">seconds</mark></td><td>≥ 40 <mark style="color:$info;">tok / sec</mark></td><td>Live</td></tr><tr><td><h4><strong>Text Max</strong></h4></td><td>≥ 53 <a href="https://artificialanalysis.ai/"><mark style="color:$info;">Intelligence Index</mark></a></td><td>≥ 1M <mark style="color:$info;">tokens</mark></td><td>≥ 128K <mark style="color:$info;">tokens</mark></td><td>≤ 3.50 <mark style="color:$info;">seconds</mark></td><td>≥ 30 <mark style="color:$info;">tok / sec</mark></td><td>Live</td></tr><tr><td><h4>Code Standard</h4></td><td>≥ 20 <a href="https://artificialanalysis.ai/"><mark style="color:$info;">Coding Index</mark></a></td><td>≥ 128K <mark style="color:$info;">tokens</mark></td><td>≥ 16K <mark style="color:$info;">tokens</mark></td><td>≤ 2.0 <mark style="color:$info;">seconds</mark></td><td>≥ 50 <mark style="color:$info;">tok / sec</mark></td><td>Preview</td></tr><tr><td><h4>Code Prime</h4></td><td>≥ 35 <a href="https://artificialanalysis.ai/"><mark style="color:$info;">Coding Index</mark></a></td><td>≥ 128K <mark style="color:$info;">tokens</mark></td><td>≥ 64K <mark style="color:$info;">tokens</mark></td><td>≤ 3.0 <mark style="color:$info;">seconds</mark></td><td>≥ 35 <mark style="color:$info;">tok / sec</mark></td><td>Preview</td></tr><tr><td><h4>Code Max </h4></td><td>≥ 48 <a href="https://artificialanalysis.ai/"><mark style="color:$info;">Coding Index</mark></a></td><td>≥ 1M <mark style="color:$info;">tokens</mark></td><td>≥ 128K <mark style="color:$info;">tokens</mark></td><td>≤ 3.5 <mark style="color:$info;">seconds</mark></td><td>≥ 25 <mark style="color:$info;">tok / sec</mark></td><td>Preview</td></tr><tr><td><h4><strong>Agent Standard</strong></h4></td><td>≥ 18 <a href="https://artificialanalysis.ai/"><mark style="color:$info;">Agentic Index</mark></a></td><td>≥ 128K <mark style="color:$info;">tokens</mark></td><td>≥ 16K <mark style="color:$info;">tokens</mark></td><td>≤ 2.0 <mark style="color:$info;">seconds</mark></td><td>≥ 50 <mark style="color:$info;">tok / sec</mark></td><td>Preview</td></tr><tr><td><h4><strong>Agent Prime</strong></h4></td><td>≥ 55 <a href="https://artificialanalysis.ai/"><mark style="color:$info;">Agentic Index</mark></a></td><td>≥ 128K <mark style="color:$info;">tokens</mark></td><td>≥ 64K <mark style="color:$info;">tokens</mark></td><td>≤ 3.0 <mark style="color:$info;">seconds</mark></td><td>≥ 35 <mark style="color:$info;">tok / sec</mark></td><td>Preview</td></tr><tr><td><h4><strong>Agent Max</strong></h4></td><td>≥ 67 <a href="https://artificialanalysis.ai/"><mark style="color:$info;">Agentic Index</mark></a></td><td>≥ 1M <mark style="color:$info;">tokens</mark></td><td>≥ 128K <mark style="color:$info;">tokens</mark></td><td>≤ 3.5 <mark style="color:$info;">seconds</mark></td><td>≥ 25 <mark style="color:$info;">tok / sec</mark></td><td>Preview</td></tr></tbody></table>

To call each of these instruments via the [Consumption API](/docs/api-reference/consumption-api.md), you pass the instrument string in the `model` parameter (e.g `text-prime` ). The order book routes the request to a qualifying supplier offering the best price.

## Text instruments

Three tiers covering general-purpose text generation, reasoning, summarization, and retrieval-augmented work. Text instruments are live today.

### Text Prime (default)

* **Production default for everyday text generation.** Strong reasoning at a fraction of frontier cost. Most workloads belong here.
* **Use for:** RAG and retrieval-augmented apps, content drafting, summarization, customer-facing generation where quality matters but frontier reasoning is overkill.
* **Not sure where to start?** Start on `text-prime` and move up or down from there.

### Text Max

* **For high-stakes work where errors compound.** The 1M context window handles long-context synthesis across many documents in a single pass.
* **Use for:** legal review, clinical reasoning, financial analysis, security incident triage.
* **Route by exception, not by default.** If you do not need the full context window or the highest Quality Score floor, you are paying for headroom that goes unused.

### Text Standard

* **For high-volume, low-stakes work** where speed and unit economics matter more than reasoning depth.
* **Use for:** classification and routing, chatbot first-touch, batch summarization, pipeline glue.
* **Latency advantage.** The 1.32 second TTFT cap keeps interactive responses feeling instant; the lower Quality Score floor keeps per-call costs manageable at volume.
* **Tricky calls?** Send them up to Prime when they hit.

## Code instruments (Preview)

Three tiers covering software engineering work, evaluated on real coding tasks, not chat.

### Code Prime (default)

* **Production default for daily coding work.** Fast enough for interactive use, smart enough for non-trivial tasks. Most production coding agents land here.
* **Use for:** code completion in your IDE, "write this function" requests, automated PR review and feedback, standard debugging on stack traces, feature implementation where correctness is binary (the tests pass or they do not).

### Code Max

* **For changes that span the whole codebase.** The 1M context window and higher Coding Index floor handle codebases that overwhelm smaller models.
* **Use for:** full-repo analysis, cross-repo migrations across multiple services and API boundaries, complex multi-layer debugging (race conditions, distributed system failures, subtle type mismatches), multi-file refactors touching thousands of lines.

### Code Standard

* **For autocomplete, linting, and batch edits.** High frequency, well-defined tasks where TTFT under 1.5 seconds and high throughput dominate.
* **Use for:** inline suggestions firing on every keystroke, automated style enforcement, rename-in-200-files refactors, API call signature updates, boilerplate generation.
* **Reasoning depth is not the constraint here.**

## Agent instruments (Preview)

Three tiers covering autonomous and multi-step tool use, benchmarked on real agentic work.

### Agent Prime (default)

* **Production default for daily agent work.**
* **Use for:** production agent loops running all day (customer service, data processing, monitoring), multi-step tool chains of 5 to 20 calls (database lookup, response formatting, ticket update), chain-of-thought reasoning workflows where the agent thinks through options before acting.
* **Common pattern:** pair `agent-prime` for planning and tool use with `code-prime` for the actual code-writing.

### Agent Max

* **For autonomous work measured in hours, not minutes.** The higher Agentic Index floor and long-horizon validation matter more than per-call cost at this tier.
* **Use for:** independent research that reads papers, searches the web, and produces a structured report after hours of unattended work; deep tool chains of 50+ calls where each step depends on the last; long-horizon infrastructure management, data pipeline construction, high-stakes automation acting on production systems.

### Agent Standard

* **For fast tool calls and orchestration glue.** The lightweight layer in an agent system that parses responses, decides which tool to call next, and formats inputs (these calls fire hundreds of times per agent run).
* **Use for:** single-purpose agents that do one thing repeatedly, high-throughput orchestration where hundreds or thousands of agent instances run in parallel, routing and triage agents that decide whether a request handles directly or escalates.

## How to choose

* **The biggest savings come from routing across tiers, not from picking one.**
* **Common production setup:** Standard for triage and orchestration glue, Prime for the calls that actually need to reason, Max only when the context is huge or the cost of an error is high.
* **Mix task types freely.** An agent on `agent-prime` for planning, `code-prime` for implementation, `text-standard` for the run summary.

For more details on how we revise our specifications and qualifying criteria for suppliers, see [How instrument specifications evolve](https://thegrid.ai/docs/instrument-specifications/how-specifications-evolve.md).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://thegrid.ai/docs/instrument-specifications/current-instruments.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
