Streaming Responses

To receive tokens as the model generates them, set stream: true in your request. The response is then delivered as a stream of Server-Sent Events (SSE).

Request:

{
  "model": "text-prime",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": true
}
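As a sketch, the request above can be issued from Python using only the standard library. The endpoint URL and API key below are placeholders (this page does not specify them); substitute your actual base URL and credentials.

```python
import json
import urllib.request

# Hypothetical values -- replace with your real endpoint and key.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

# The request body shown above, with stream enabled.
payload = {
    "model": "text-prime",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

def stream_lines():
    """Yield raw SSE lines from the streaming endpoint as they arrive."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    # The response object is file-like; iterating it returns one line
    # at a time, so each SSE "data:" line can be handled immediately.
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            yield raw.decode("utf-8").rstrip("\n")
```

Each yielded line can then be parsed as described in the response format below.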

Response Format:

Each SSE event contains a data: field with a JSON chunk:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677858242,"model":"text-prime","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677858242,"model":"text-prime","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677858242,"model":"text-prime","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1677858242,"model":"text-prime","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Parsing tips:

  • Each chunk is a line prefixed with data:

  • The stream ends with the sentinel data: [DONE]

  • Concatenate the delta.content values from successive chunks to rebuild the full response

  • Check finish_reason on the final chunk to see why generation stopped (stop, length, or tool_calls)
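The tips above can be sketched as a small parser. The sample lines reuse the chunk shape shown in the response format (trimmed to the fields the parser reads); field names match the document, everything else is illustrative.

```python
import json

def accumulate(sse_lines):
    """Assemble the assistant message from SSE chunk lines.

    sse_lines: iterable of lines like 'data: {...}' or 'data: [DONE]'.
    Returns (content, finish_reason).
    """
    content_parts = []
    finish_reason = None
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # sentinel: the stream is complete
        chunk = json.loads(data)
        choice = chunk["choices"][0]
        delta = choice["delta"]
        if "content" in delta:
            content_parts.append(delta["content"])
        if choice["finish_reason"] is not None:
            finish_reason = choice["finish_reason"]
    return "".join(content_parts), finish_reason

# Chunks shaped like the example stream above:
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
text, reason = accumulate(sample)
# text == "Hello!", reason == "stop"
```

Note that the first chunk carries only delta.role and the final chunk carries an empty delta with finish_reason set, so the parser must tolerate deltas with no content field.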
