# Benefits of consuming market priced inference

Your team pays rate-card on a single vendor for inference that other qualifying suppliers sell cheaper. We run a live market where multiple suppliers compete to serve every request, and you buy a quality tier instead of a specific model. Same code, lower cost, no quality compromise.

Three things to know upfront:

1. Prices are lower because they're set by live supply and demand, not a fixed rate card.
2. A real market sits underneath every request, with multiple suppliers competing to fill any qualifying offer.
3. Most teams are overpaying right now without realizing it. We surface and close that gap automatically.

***

## 1. Lower token costs, set by live supply and demand

You want lower token costs. We deliver them by replacing fixed rate cards with a live market. Suppliers post prices, the cheapest qualifying offer wins, and you pay the clearing price at that moment. When supply grows or a new supplier enters, the price falls and you capture the drop without changing a line of code.

The benefit shows up on the first request. Nothing to provision, no annual commitment, no minimum spend. Live prices for every instrument are published at [thegrid.ai/pricing](https://thegrid.ai/pricing) and update continuously. Every buyer sees the same book. No hidden quote layer, no separate enterprise rate card.

***

## 2. A real market sits beneath every request

This isn't a router with a single price feed and a clever fallback. We run a continuous limit order book where model labs, infrastructure suppliers, capacity aggregators, and reseller networks all post offers on the same instrument. We match every request to the best qualifying offer at the moment you call.

When a cheaper supplier enters, the price drops for everyone. When one drifts below specification, we remove them from the eligible set. When a new model qualifies, it joins the pool automatically. Your code does nothing. The market does the work. Pricing power lives with the market, not with any single vendor.

***

## 3. Most teams overpay on inference without realizing it

Your team pays sticker price on one vendor for a workload that two or three qualifying suppliers would serve cheaper. Your team also runs a frontier-tier model on traffic that a smaller qualifying model would handle just as well. Both gaps are invisible on the invoice. The line item just says "API usage."

We close both at once. Every request routes to the cheapest qualifying offer at the tier you chose, and the overpayment goes away the moment you switch your `base_url`. For a number on your own workload first, run the [savings analysis prompt](/docs/start-here/quickstart.md#want-a-quick-savings-estimate) against your last 30 days of usage. Most teams are surprised by the size of the gap, not by its existence.

***

## 4. Total cost drops across unit price, tier selection, and operations

Three forces compound to lower your TCO:

1. Tier-pick the right quality. Run `text-prime` for everyday production, `text-max` for hard tasks where correctness or context size matters, `text-standard` for high-volume classification and pipelines. Honest tier selection alone moves spend by a meaningful margin.
2. Competitive market pricing. Suppliers compete on price for every qualifying offer. Rate cards have no such pressure, so the clearing price trends below what any single vendor lists for the same model.
3. Transparent metering, no provisioning. You pay per token consumed. No reserved-throughput contract, no minimum commitment, and none of the operational cost from evaluating each new model release or maintaining a fallback strategy.

The first cuts unit cost where you don't need a frontier model. The second cuts unit cost where you do. The third cuts the operational cost no one bothers to measure but everyone pays.

***

## 5. Quality enforced against independent benchmarks

Quality on The Grid is a measured threshold on benchmarks the industry already trusts. Each instrument has a Quality Score with a per-task-type threshold: [Intelligence Index](https://artificialanalysis.ai/) for Text instruments, [Coding Index](https://artificialanalysis.ai/) for Code instruments, [Agentic Index](https://artificialanalysis.ai/) for Agent instruments. Latency, time to first token, throughput, context window, and uptime thresholds apply to every instrument.

Models qualify by clearing every threshold, and they qualify per provider, because the same model can perform differently depending on how it's served. We continuously audit live traffic against the specification, with financial penalties for suppliers who drift. The eligible model list per instrument is curated, the audit runs continuously, and the thresholds tighten as frontier capability advances. "Cheapest qualifying offer" is a guarantee, not a marketing phrase.

For full thresholds and qualifying model lists, see [Benchmarks and quality](broken://pages/vkIUz0oAoO4bQ5EqsE1G), [how instruments are defined](https://github.com/Spectral-Finance/GitbookDocs/blob/main/instrument-specifications/how-instruments-are-defined.md), and the [current instruments](/docs/instrument-specifications/current-instruments.md).

***

## 6. Zero vendor lock-in

Lock-in usually comes from vendor SDKs, vendor-specific model names in application code, and deprecation cycles on the vendor's timeline. None of that exists here. We use an OpenAI-compatible API and a standardized instrument abstraction. `text-prime` is `text-prime` whether the underlying model is GPT, Claude, Gemini, GLM, or whatever qualifies next quarter. The response shape, SDK, auth, streaming, tool-calling, and structured-output semantics all stay the same.

When a new model qualifies, it joins the eligible pool automatically. When an old one drifts or gets deprecated upstream, we remove it. Your code never touches a model name, so there's nothing to migrate. The practical test: can you switch the underlying model serving production traffic, without touching application code or running a deploy? Here, yes.

If you ever leave, the integration is just an OpenAI-compatible base URL and key. Point your client at any other compatible service and your code keeps working. Lock-in goes both directions, which is the only credible test of its absence.

***

## 7. One API, one bill, one key

One Consumption API key works across every instrument and every supplier on both surfaces: the OpenAI-compatible endpoint at `https://api.thegrid.ai/v1` (Bearer auth) and the Anthropic-compatible Messages endpoint at `https://messages-beta.api.thegrid.ai/v1` in beta (`x-api-key` auth). One Stripe-backed bill covers all of it.

You don't manage separate accounts with OpenAI, Anthropic, Google, Together, Fireworks, and Groq. You don't reconcile six invoices, juggle six rate-limit schemes, or onboard a new vendor every time a better price shows up. For finance, that's one line item to forecast and one counterparty to invoice. For engineering, one integration that scales as the supplier set grows.

Spend caps, alerts, and balance thresholds are configurable per account. The same key handles both inference and trading. Auto-Reload keeps your balance topped up so traffic never gets interrupted. See [Auto-Reload](broken://pages/pdlcsdUZJSSSfhsYQcIu) for the configuration.

***

## 8. Every request is metered at the token level and tied to a specific trade

Every request is metered at the token level and attributed to a specific instrument, supplier, and trade. Your dashboard shows what you spent, on which instrument, served by which supplier, at what price per million tokens, on what date. No opaque "usage" line items, no surprise overage, no per-feature add-ons.

Every response includes the instrument, the model that served it, the supplier, the latency, and the token counts, so you can reconcile any line on the invoice back to the exact trade. The dashboard breaks out spend by instrument and supplier, average price per million tokens, latency distributions, and a trade log with one row per request. The same data is available through the Consumption API for teams pushing it into chargeback or analytics.

Pricing is published, not negotiated. Every buyer sees the same number. No opaque enterprise rates, no separate quote process for high-volume accounts.

***

## Where to go next

* [Quickstart](/docs/start-here/quickstart.md): first call in five minutes.
* [Choose an instrument](/docs/start-here/choose-an-instrument.md): pick the right tier for your workload.
* [Concepts](/docs/concepts/order-book-and-matching.md): order book mechanics, matching, and clearing.
* [Migrating from OpenAI](https://github.com/Spectral-Finance/GitbookDocs/blob/main/migration-and-best-practices/migrating-from-openai.md): three lines of code, a few minutes of testing.
* [Instrument specifications](https://github.com/Spectral-Finance/GitbookDocs/blob/main/instrument-specifications/how-instruments-are-defined.md): full benchmark thresholds and qualifying model lists.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://thegrid.ai/docs/benefits/benefits.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
