# LiteLLM Integration with The Grid | SDK and Proxy Setup

LiteLLM normalizes 100+ LLM providers to the OpenAI format. The Grid plugs in as an OpenAI-compatible backend. You can call The Grid directly from the `litellm` Python SDK, or run the LiteLLM proxy in front of The Grid so any downstream OpenAI-compatible tool routes through a single endpoint.

* **Quick version.** Prefix Grid models with `openai/` so LiteLLM uses its OpenAI-compatible client, and point `api_base` at `https://api.thegrid.ai/v1`. In the SDK, pass `api_key=os.environ["THEGRID_API_KEY"]`; in the proxy's `config.yaml`, write `api_key: os.environ/THEGRID_API_KEY`. Start the proxy with `uv run litellm --config config.yaml`.

For the full instrument catalog, see [Current instruments](/docs/instrument-specifications/current-instruments.md).

## Two patterns

* **SDK.** Your Python app calls `litellm.completion(...)` with The Grid as a backend. LiteLLM handles retries, fallbacks, and response normalization.
* **Proxy.** Run `uv run litellm --config config.yaml` as a standalone server. All your agents, IDEs, and scripts point at the proxy. The proxy fans out to The Grid (and optionally OpenAI, Anthropic, local models) from one `model_list`.

In both patterns, The Grid is the upstream backend. The `openai/` prefix on `model` selects LiteLLM's OpenAI-compatible client; `api_base` does the rest.

## Prerequisites

* Python 3.8+ and a uv project (`uv init`) with LiteLLM installed via `uv add "litellm[proxy]"`. The `[proxy]` extra includes the SDK, so one install covers both patterns.
* A Grid account at [app.thegrid.ai](https://app.thegrid.ai) with funded credits.
* A consumption API key from your dashboard, exported as `THEGRID_API_KEY`.

## Pattern 1: LiteLLM SDK

{% tabs %}
{% tab title="Python" %}

```python
import os
import litellm

response = litellm.completion(
    model="openai/agent-prime",
    api_base="https://api.thegrid.ai/v1",
    api_key=os.environ["THEGRID_API_KEY"],
    messages=[{"role": "user", "content": "Plan a three-step research task."}],
)

print(response.choices[0].message.content)
```

{% endtab %}
{% endtabs %}

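The `openai/` prefix is required: it selects LiteLLM's OpenAI-compatible client, and LiteLLM strips it before sending `agent-prime` to The Grid. Streaming, tool calling, and structured outputs work as they do for any other OpenAI-compatible call (`stream=True`, `tools=[...]`, `response_format=...`).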
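For example, a streaming call only adds `stream=True` and iterates over the chunks. A minimal sketch, assuming the same `agent-prime` instrument and `THEGRID_API_KEY` variable as above; `join_stream` and `stream_from_grid` are hypothetical helper names, and the chunks are read in the OpenAI streaming shape, where each text piece arrives in `choices[0].delta.content`.

```python
import os


def join_stream(chunks):
    """Concatenate the text deltas from an OpenAI-format streaming response."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some streams end with a chunk that has no choices
            continue
        text = getattr(chunk.choices[0].delta, "content", None)
        if text:  # role-only and final chunks carry no content
            parts.append(text)
    return "".join(parts)


def stream_from_grid(prompt, model="openai/agent-prime"):
    # Imported here so join_stream can be reused without litellm installed.
    import litellm

    response = litellm.completion(
        model=model,
        api_base="https://api.thegrid.ai/v1",
        api_key=os.environ["THEGRID_API_KEY"],
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # yields chunks instead of one final response
    )
    return join_stream(response)


if __name__ == "__main__":
    print(stream_from_grid("Plan a three-step research task."))
```

The lazy import is only there to keep the chunk-joining helper independent of a live call; a top-level `import litellm` works just as well in an app.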

## Pattern 2: LiteLLM Proxy

Run LiteLLM as a gateway so that any OpenAI-compatible client can point at `http://localhost:4000`.

### 1. Install and set keys

```bash
uv add "litellm[proxy]"
export THEGRID_API_KEY=<your-grid-consumption-key>
export LITELLM_MASTER_KEY=sk-1234   # any string starting with sk-; downstream clients send it as a Bearer token
```

### 2. Write `config.yaml`

```yaml
model_list:
  - model_name: grid/text-prime
    litellm_params:
      model: openai/text-prime
      api_base: https://api.thegrid.ai/v1
      api_key: os.environ/THEGRID_API_KEY
    model_info:
      max_input_tokens: 120000
      max_output_tokens: 32000
  - model_name: grid/code-prime
    litellm_params:
      model: openai/code-prime
      api_base: https://api.thegrid.ai/v1
      api_key: os.environ/THEGRID_API_KEY
    model_info:
      max_input_tokens: 120000
      max_output_tokens: 32000
  - model_name: grid/agent-prime
    litellm_params:
      model: openai/agent-prime
      api_base: https://api.thegrid.ai/v1
      api_key: os.environ/THEGRID_API_KEY
    model_info:
      max_input_tokens: 120000
      max_output_tokens: 32000
  # Add the other six instruments the same way; see Current instruments.
  # On the Max tiers (text-max, code-max, agent-max), set max_input_tokens: 922000.

litellm_settings:
  num_retries: 3
  request_timeout: 300
  drop_params: True

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```

Field-name gotchas: use `api_base` (not `base_url`), and `os.environ/NAME` (not `${NAME}`) for env-var references.

The `model_info` block sets the token limits that LiteLLM reports through `/model/info` and its cost map. Downstream tools that read those values (Aider, SWE-agent, OpenHands, CrewAI) inherit them, so an accurate `max_input_tokens` keeps them from packing a request that overruns the context window. The Standard and Prime instruments carry a 128K context window and the Max instruments carry 1M, so set `max_input_tokens` to 120,000 on Standard and Prime and to 922,000 on Max, and keep `max_output_tokens` below the instrument's context window. For current per-instrument limits, see [Current instruments](/docs/instrument-specifications/current-instruments.md).
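To confirm the limits downstream tools will see, you can read them back once the proxy is running (step 3). A minimal stdlib sketch, assuming the proxy listens on `http://localhost:4000` and that `/model/info` returns a `data` list whose entries carry `model_name` and `model_info` (verify this shape against your LiteLLM version); `token_limits` and `fetch_model_info` are hypothetical helper names.

```python
import json
import os
import urllib.request

PROXY = "http://localhost:4000"  # assumption: local proxy on the default port


def token_limits(payload):
    """Map each model_name to (max_input_tokens, max_output_tokens) from a /model/info payload."""
    limits = {}
    for entry in payload.get("data", []):
        info = entry.get("model_info") or {}  # missing model_info -> (None, None)
        limits[entry["model_name"]] = (
            info.get("max_input_tokens"),
            info.get("max_output_tokens"),
        )
    return limits


def fetch_model_info(base_url=PROXY):
    """GET /model/info, authenticating with the proxy master key."""
    req = urllib.request.Request(
        f"{base_url}/model/info",
        headers={"Authorization": f"Bearer {os.environ['LITELLM_MASTER_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for name, (max_in, max_out) in sorted(token_limits(fetch_model_info()).items()):
        print(f"{name}: input={max_in} output={max_out}")
```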

### 3. Start the proxy

```bash
uv run litellm --config config.yaml
# proxy runs on http://0.0.0.0:4000 by default
```

### 4. Verify with curl

```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "grid/agent-prime", "messages": [{"role": "user", "content": "hello"}]}'
```

A 200 means LiteLLM accepted your master key, resolved `grid/agent-prime` to `openai/agent-prime` at `api.thegrid.ai/v1`, and returned a completion.
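The same check can run from Python with only the standard library. A minimal sketch, assuming the proxy is on `http://localhost:4000` and `LITELLM_MASTER_KEY` is exported; `build_check_request` is a hypothetical helper name.

```python
import json
import os
import urllib.request


def build_check_request(model="grid/agent-prime", base_url="http://localhost:4000"):
    """Build the same request as the curl check: POST /v1/chat/completions with the master key."""
    body = json.dumps({
        "model": model,  # a model_name from config.yaml
        "messages": [{"role": "user", "content": "hello"}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('LITELLM_MASTER_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # urlopen raises HTTPError on 4xx/5xx, so reaching print means a 2xx response.
    with urllib.request.urlopen(build_check_request(), timeout=60) as resp:
        print(resp.status)
        print(json.load(resp)["choices"][0]["message"]["content"])
```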

## Point downstream tools at the proxy

Once the proxy is up, any OpenAI-compatible tool (Cline, Continue, your Python scripts) can use it as the inference backend:

* **Base URL:** `http://localhost:4000` (or `http://localhost:4000/v1` depending on the client)
* **API key:** your `LITELLM_MASTER_KEY`
* **Model name:** any `model_name` from your `model_list` (e.g., `grid/code-prime`)

Add or swap a backend once in `config.yaml` and every downstream tool picks up the change on proxy restart.
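For your own Python scripts, the official `openai` package can talk to the proxy directly. A minimal sketch, assuming `openai` is installed (`uv add openai`) and the proxy is running locally; `proxy_settings`, `ask`, and the `LITELLM_PROXY_URL` variable are hypothetical names introduced for illustration.

```python
import os


def proxy_settings(env=os.environ):
    """Return (base_url, api_key) for an OpenAI-compatible client talking to the LiteLLM proxy."""
    base_url = env.get("LITELLM_PROXY_URL", "http://localhost:4000")  # hypothetical variable
    return base_url.rstrip("/"), env["LITELLM_MASTER_KEY"]


def ask(prompt, model="grid/code-prime"):
    from openai import OpenAI  # assumes the official openai package is installed

    base_url, api_key = proxy_settings()
    client = OpenAI(base_url=base_url, api_key=api_key)
    response = client.chat.completions.create(
        model=model,  # the proxy's model_name, not the upstream openai/... name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(ask("Write a Python function that reverses a string."))
```

The client only ever sees `grid/code-prime` and the master key; the Grid key and `api_base` stay in `config.yaml`.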

## Stable aliases for multi-provider setups

If you want stable route names regardless of which backend fulfills them, alias them in `model_list`:

```yaml
model_list:
  - model_name: default
    litellm_params:
      model: openai/text-prime
      api_base: https://api.thegrid.ai/v1
      api_key: os.environ/THEGRID_API_KEY
  - model_name: openai-fallback
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  fallbacks:
    - default: ["openai-fallback"]
```

Your app calls `model="default"` and never touches a vendor name. Swap the backend in one place.
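If your app calls the SDK instead of the proxy, LiteLLM's `Router` class takes the same `model_list` entries and a `fallbacks` list in Python, with `os.environ` lookups replacing the `os.environ/NAME` syntax. A minimal sketch, assuming both API keys are exported; `build_model_list` and `make_router` are hypothetical helper names.

```python
import os


def build_model_list():
    """Mirror the proxy's model_list: a stable 'default' alias on The Grid plus an OpenAI fallback."""
    return [
        {
            "model_name": "default",
            "litellm_params": {
                "model": "openai/text-prime",
                "api_base": "https://api.thegrid.ai/v1",
                "api_key": os.environ.get("THEGRID_API_KEY"),
            },
        },
        {
            "model_name": "openai-fallback",
            "litellm_params": {
                "model": "openai/gpt-4o-mini",
                "api_key": os.environ.get("OPENAI_API_KEY"),
            },
        },
    ]


def make_router():
    # Imported lazily so build_model_list can be inspected without litellm installed.
    from litellm import Router

    return Router(
        model_list=build_model_list(),
        fallbacks=[{"default": ["openai-fallback"]}],  # same shape as router_settings.fallbacks
        num_retries=3,
    )


if __name__ == "__main__":
    router = make_router()
    response = router.completion(
        model="default",
        messages=[{"role": "user", "content": "Summarize LiteLLM in one sentence."}],
    )
    print(response.choices[0].message.content)
```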

## Troubleshooting

* **401 Unauthorized.** Verify the key is a consumption key (not a trading key), `THEGRID_API_KEY` is exported in the shell that started LiteLLM, and `api_base` is exactly `https://api.thegrid.ai/v1` with no trailing slash. Test the key with `curl https://api.thegrid.ai/v1/chat/completions -H "Authorization: Bearer $THEGRID_API_KEY" -d '...'`.
* **402 Payment Required.** Credits are empty. Top up at [app.thegrid.ai](https://app.thegrid.ai). LiteLLM retries up to `num_retries` then falls back if configured, but a 402 won't clear until you add funds.
* **"model not found" downstream.** The `model` your client sends must match a `model_name` in `model_list` exactly. `grid/code-prime` is not the same as `grid-code-prime`.
* **Cost tracking is approximate.** LiteLLM tracks cost from a static pricing table. The Grid is a market with live pricing; LiteLLM's numbers are a rough guide. Source of truth is your spend in the [app dashboard](https://app.thegrid.ai).
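To separate key and credit problems from proxy configuration, you can call The Grid directly with LiteLLM out of the loop. A minimal stdlib sketch, assuming the upstream expects the bare instrument name (`agent-prime`, without the `openai/` prefix that only LiteLLM uses) and reports failures as ordinary HTTP status codes; `check_grid_key` and `hint_for` are hypothetical helper names, and the hints only restate the list above.

```python
import json
import os
import urllib.error
import urllib.request

GRID_URL = "https://api.thegrid.ai/v1/chat/completions"

# Hints condensed from the troubleshooting list above.
HINTS = {
    401: "Check that THEGRID_API_KEY is a consumption key and is exported in this shell.",
    402: "Credits are empty; top up at app.thegrid.ai.",
}


def hint_for(status):
    """Translate an HTTP status from The Grid into a troubleshooting hint."""
    if 200 <= status < 300:
        return "Key works; the problem is likely in the LiteLLM config."
    return HINTS.get(status, f"Unexpected HTTP {status}; inspect the response body.")


def check_grid_key(model="agent-prime"):
    """POST a tiny completion straight to The Grid and return the HTTP status code."""
    body = json.dumps({"model": model, "messages": [{"role": "user", "content": "ping"}]}).encode()
    req = urllib.request.Request(
        GRID_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['THEGRID_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx arrive as exceptions
        return err.code


if __name__ == "__main__":
    status = check_grid_key()
    print(status, hint_for(status))
```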

