Current Instruments: Chat Prime & Chat Fast

The initial Text-to-Text Instruments available on The Grid are:

Chat Fast: Optimized for speed and throughput.
- Very low time to first token.
- High streaming tokens per second.
- Designed for short to medium outputs.
- An intelligence floor high enough for many production workloads.
Chat Prime: Optimized for quality and long form coherence.
- Higher minimum intelligence benchmark score.
- Larger maximum output size.
- Accepts a somewhat slower time to first token.
- Designed for deep reasoning, long context, and complex tool use.

Choose Chat Fast when:

- You have tight latency budgets and need an instant-feeling UX.
- You run many parallel calls where throughput and cost per token matter more than the last bit of reasoning quality.
- Your prompts are short and you expect brief to moderate outputs.
- You are doing routing, summarization, classification, or other transforms rather than complex planning.
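As a rough illustration, the criteria above can be encoded as a client-side routing heuristic. This is a minimal sketch, not part of The Grid: the `Request` fields, the `"chat-fast"` and `"chat-prime"` identifiers, and the 500 ms and 800-token thresholds are hypothetical placeholders to calibrate against your own traffic.

```python
from dataclasses import dataclass
from typing import Literal

# Illustrative task labels drawn from the "Choose Chat Fast when" list.
LIGHT_TASKS = {"routing", "summarization", "classification", "transform"}

# Hypothetical thresholds; calibrate them against your own latency and output data.
TIGHT_LATENCY_MS = 500
SHORT_OUTPUT_TOKENS = 800

Instrument = Literal["chat-fast", "chat-prime"]  # hypothetical identifiers


@dataclass
class Request:
    task: str                    # e.g. "classification" or "planning"
    latency_budget_ms: int       # how long the user can wait for the first token
    expected_output_tokens: int  # rough estimate of response length
    complex_tool_use: bool = False


def choose_instrument(req: Request) -> Instrument:
    # Complex tool use is what Chat Prime is designed for.
    if req.complex_tool_use:
        return "chat-prime"
    # A tight latency budget favors Chat Fast's very low time to first token.
    if req.latency_budget_ms <= TIGHT_LATENCY_MS:
        return "chat-fast"
    # Light transforms with brief-to-moderate outputs fit Chat Fast.
    if req.task in LIGHT_TASKS and req.expected_output_tokens <= SHORT_OUTPUT_TOKENS:
        return "chat-fast"
    # Everything else (planning, long-form output) goes to Chat Prime.
    return "chat-prime"


if __name__ == "__main__":
    print(choose_instrument(Request("classification", 300, 50)))  # chat-fast
    print(choose_instrument(Request("planning", 5000, 2000)))     # chat-prime
```

Rule order is a design choice: here a tight latency budget wins over task type and output length, so reorder the checks if long outputs or reasoning depth matter more than time to first token in your product.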

Typical use cases:

- Support chat assistants, help center bots, and Slack or Discord helpers.
- Summaries of emails, meetings, threads, and documents, plus code diff explanations.
- RAG answers where the context is already tight and responses should be concise.
- Bulk text transforms such as classification, redaction, field extraction, paraphrasing, or translation of short snippets.
- Product surfaces that need instant feedback, like autocomplete, inline suggestions, or hinting.
