# Instruments as commodities

Every commodity market starts with the same definition problem: before anything can trade, the product has to be specified precisely enough that a buyer in one place can transact with a seller in another without inspecting the goods. Wheat solved it with grades, natural gas with pipeline specifications, crude oil with WTI and Brent. The specification abstracts the producer away. A bushel of Corn Grade #2 Yellow is a bushel of Corn Grade #2 Yellow regardless of which farm grew it.

We do the same thing for inference. An instrument is the specification, the contract that says "this is what you're buying," and units are the deliverable. A unit is one million tokens of inference output that meets the instrument's specification. Any model from any supplier that clears every threshold is eligible to fill an order. Which model actually serves your request is an implementation detail.

## Standardization unlocks price competition

A market needs fungibility before it can have prices. If every supplier sells something slightly different, every transaction is a one-off negotiation: rate cards, sales reps, lock-in. Once the specification is written and audited, qualifying suppliers are interchangeable from your perspective. The only thing left to compete on is price (and reliability, which the specification also enforces).

The number of suppliers serving any given frontier-class model is already in the dozens. Quality and price differ across suppliers running the same weights because serving stack, hardware, batching, and geography all matter. Without a shared specification that variation is your problem; with one, it becomes the market's problem and you just buy.

## Model identity, parameter count, and vendor each fail as the unit of standardization

The interesting question is which dimension to standardize on. The choices that come to mind first turn out to be the wrong ones.

* **Model identity.** A separate market for Claude, another for GPT, another for Gemini. The same model served by different suppliers behaves differently (quantization, GPU architecture, batching, geography), and every model endpoint ships with a deprecation date, so an instrument that retires every few months can never accumulate the liquidity a market needs. Commodity markets figured this out long ago: the CME corn contract isn't "corn from Farm X."
* **Parameter count.** 7B, 70B, 400B. Once a useful proxy for quality, now broken by distillation and mixture-of-experts. Size is a production input, not an output guarantee.
* **Vendor name.** Same problem as model identity, plus a coordination problem: a market organized around vendors gives those vendors veto power over what the market looks like.
* **Industry vertical.** Healthcare, legal, finance. Strong candidates for future expansion, but the supply base isn't differentiated enough yet. The models on domain-specific leaderboards are still overwhelmingly the same general-purpose frontier models.

What works is benchmark-defined performance. The instrument specifies measurable thresholds across a small set of output parameters. Any model that clears every threshold qualifies, regardless of who built it, how big it is, or how it was trained. The producer is invisible by design.

## Tiers split each task type by the quality-speed-cost tradeoff

Each task type (text, code, agent) splits into three tiers. The same governing parameters apply across every tier; the floors differ. The full list of live instruments is on [Current instruments](/docs/instrument-specifications/current-instruments.md).

* **Max** is the frontier slot. Highest [Quality Score](https://artificialanalysis.ai/) floor, longest context, broadest task-specific evaluations. Use it when getting the wrong answer is expensive: deep reasoning, complex agent loops, long-context analysis.
* **Prime** is the production default. Strong reasoning, balanced throughput and latency. Most workloads belong here.
* **Standard** is the high-throughput tier. Lower Quality Score floor, faster TTFT, costs that hold up at volume.

A single tier per task type would force you to pay for frontier capacity even on workloads that don't need it. Three tiers per task type cover the actual range of how inference gets used.

## Routing across instruments instead of routing across models

Take a code-review pipeline with three steps: a fast classifier that decides whether a PR is risky, a deeper analysis pass on the risky ones, a final summary. In a single-supplier setup you'd pick one model and run everything through it, paying frontier rates for the classifier even though it's wildly overspec.

With us, you target three instruments. The classifier hits `text-standard`. The analysis pass hits `text-prime` or `code-prime`. The summary hits `text-prime`. Each step is fulfilled by whichever qualifying supplier currently offers the best price on that instrument, possibly different suppliers for different steps in the same pipeline, with no special routing on your side.

## The specification is a floor, not a ceiling

The specification defines a minimum, not identical behavior. Two suppliers serving Text Prime both clear the same thresholds, but they have somewhat different output styles, sampling defaults, and edge-case handling. The benchmarks measure generic capability; if your workload stresses something beyond what they capture, application-level evals on top are worth running. The specification also evolves: as the frontier moves, thresholds get raised so each tier keeps its meaning. The instrument string is stable; the qualifying models and suppliers underneath rotate.

## What changes for you

You stop tracking model release dates. The instrument string you target is stable, and qualifying models are added or removed as the spec evolves. You stop maintaining supplier-specific routing logic. Transparent failover handles outages on the supplier side, and Auto Mode handles purchasing. You think in tiers, not models.

What stays the same: the API surface. OpenAI-compatible endpoints (or the Anthropic Messages beta), the SDK you already use, the request and response formats. The conceptual shift is real; the integration shift is one parameter and a base URL.

## Where this connects

* [How we define instruments](broken://pages/sfxozGnwuVs8kBcTbFhO), the governing parameters, the [Quality Score](https://artificialanalysis.ai/) methodology, and how qualifying suppliers are admitted and audited.
* [Current instruments](/docs/instrument-specifications/current-instruments.md), full thresholds and qualifying models for every live instrument.
* [Order book and matching](/docs/concepts/order-book-and-matching.md), how the market mechanism turns the specification into a price.
* [Benefits](/docs/benefits/benefits.md), what this means for buyers compared to the existing alternatives.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://thegrid.ai/docs/concepts/instruments-as-commodities.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
