Streaming Responses with Claude API in Python (2026)

Originally published at kalyna.pro

Streaming sends Claude's response token by token as it's generated, instead of waiting for the full completion before showing anything. For a chat UI this is the difference between a user staring at a spinner for several seconds and seeing the first words appear within a few hundred milliseconds. The Claude API Tutorial introduces the basic stream.text_stream helper — this guide covers the full picture: the raw event stream, async streaming, error handling, and a complete FastAPI endpoint that streams Claude's output to a browser.

Prerequisites

pip install anthropic
# for the API endpoint example later:
pip install fastapi uvicorn

The Simple Way: text_stream

from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about debugging."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    final_message = stream.get_final_message()

print(f"\n\nstop_reason: {final_message.stop_reason}")
print(f"output tokens: {final_message.usage.output_tokens}")

stream.get_final_message() returns the same Message object you'd get from a non-streaming call — complete content, stop_reason, and usage — without manually reassembling it from chunks.

The Raw Event Stream

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about debugging."}],
) as stream:
    for event in stream:
        print(event.type)

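Core event types, in order (the content_block_* events repeat for each content block, and the SDK's stream helper may interleave its own convenience events, such as text, alongside these):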

message_start — initial Message shell with usage.input_tokens
content_block_start — a new content block begins (text, tool_use, etc.)
content_block_delta — incremental content: text_delta (.text), input_json_delta (.partial_json, for tool inputs), or thinking_delta
content_block_stop — the block is complete
message_delta — stop_reason and updated usage.output_tokens
message_stop — stream finished

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about debugging."}],
) as stream:
    for event in stream:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)
        elif event.type == "message_delta":
            print(f"\n[tokens so far: {event.usage.output_tokens}]", end="")
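To make the event order concrete, here is a minimal sketch that joins text_delta fragments into one string per content block. It runs offline against hand-built stand-in objects that only mimic the shape of the SDK events (type, index, delta); collect_text_blocks and fake_events are hypothetical names for illustration, not part of the SDK.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def collect_text_blocks(events: Iterable) -> Dict[int, str]:
    """Join text_delta fragments into one string per content block index."""
    blocks: Dict[int, str] = {}
    for event in events:
        if event.type == "content_block_start":
            # A new block begins: start with an empty buffer for its index.
            blocks[event.index] = ""
        elif event.type == "content_block_delta" and event.delta.type == "text_delta":
            # Append the incremental text to the buffer of the block it belongs to.
            blocks[event.index] = blocks.get(event.index, "") + event.delta.text
    return blocks


# Hand-built stand-ins for SDK events (illustrative data, not real API output).
fake_events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_start", index=0),
    SimpleNamespace(type="content_block_delta", index=0,
                    delta=SimpleNamespace(type="text_delta", text="Null ")),
    SimpleNamespace(type="content_block_delta", index=0,
                    delta=SimpleNamespace(type="text_delta", text="pointer")),
    SimpleNamespace(type="content_block_stop", index=0),
    SimpleNamespace(type="message_stop"),
]

print(collect_text_blocks(fake_events))
```

With a real stream you would pass the stream object itself as events. In practice get_final_message() already does this bookkeeping, so the sketch is mainly useful for seeing how the events fit together.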

Async Streaming

import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()


async def main():
    async with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Write a haiku about debugging."}],
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)


asyncio.run(main())

Building a Streaming API Endpoint (FastAPI + SSE)

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from anthropic import AsyncAnthropic

app = FastAPI()
client = AsyncAnthropic()


@app.get("/chat")
async def chat(message: str):
    async def event_stream():
        async with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": message}],
        ) as stream:
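            async for text in stream.text_stream:
                # A chunk may contain newlines, and SSE is line-based:
                # emit one "data:" line per text line, then a blank line.
                payload = "".join(f"data: {line}\n" for line in text.split("\n"))
                yield payload + "\n"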

        yield "event: done\ndata: {}\n\n"

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
    )

X-Accel-Buffering: no stops nginx from buffering the whole response — without it, "streaming" arrives in one burst at the end. On the frontend, read with fetch + a ReadableStream reader, or EventSource for GET endpoints.
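SSE framing is line-based: every line of a payload needs its own data: prefix, and a blank line ends the event. The sketch below shows one way to build frames that survive text containing newlines, plus a tiny reader that mirrors how a client joins data lines back together. Both sse_frame and sse_data are hypothetical helper names, not FastAPI or SDK functions.

```python
from typing import Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Format one SSE event, emitting a separate data: line per line of text."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.extend(f"data: {line}" for line in data.split("\n"))
    # The trailing blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def sse_data(frame: str) -> str:
    """Recover the payload roughly as a client would: join data lines with newlines."""
    prefix = "data: "
    return "\n".join(
        line[len(prefix):] for line in frame.split("\n") if line.startswith(prefix)
    )


frame = sse_frame("Null pointer\nat line three")
print(repr(frame))
print(repr(sse_data(frame)))
```

Inside the endpoint's generator, each text chunk could be passed through sse_frame before being yielded, and the final event becomes sse_frame("{}", event="done").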

Handling Errors and Interruptions

import anthropic

try:
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Write a haiku about debugging."}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
except anthropic.APIConnectionError:
    print("\n[connection lost — showing partial response]")
# RateLimitError is a subclass of APIStatusError, so it has to be caught first.
except anthropic.RateLimitError:
    print("\n[rate limited — retry shortly]")
except anthropic.APIStatusError as e:
    print(f"\n[API error {e.status_code}]")
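To actually have a partial response on hand when the connection drops, the chunks need to be kept as they arrive. A minimal sketch, assuming any iterable of text chunks: collect_partial and flaky_chunks are hypothetical names, and the built-in ConnectionError stands in for the SDK exception so the example runs offline.

```python
from typing import Iterable, Iterator, Tuple, Type


def collect_partial(
    chunks: Iterable[str],
    recoverable: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> Tuple[str, bool]:
    """Return (text, completed); text holds whatever arrived before a failure."""
    parts = []
    try:
        for chunk in chunks:
            parts.append(chunk)
    except recoverable:
        # The stream broke mid-way: keep what we have and report it as incomplete.
        return "".join(parts), False
    return "".join(parts), True


def flaky_chunks() -> Iterator[str]:
    """Stand-in for a text stream that drops after two chunks (illustrative only)."""
    yield "Stack "
    yield "trace "
    raise ConnectionError("simulated drop")


print(collect_partial(flaky_chunks()))
print(collect_partial(["all ", "good"]))
```

With the SDK, the same idea would presumably be applied by passing stream.text_stream as chunks and (anthropic.APIConnectionError,) as recoverable, inside the with block.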

If the client disconnects mid-response, exit the generator early so the SDK closes the stream; this stops billing for output tokens generated into the void. In the FastAPI endpoint, that means adding a request: Request parameter to the handler and, for long generations, checking await request.is_disconnected() periodically and breaking out of the loop if it returns True.

Streaming with Tool Use

Text still arrives via text_delta, tool arguments arrive incrementally via input_json_delta, and stream.get_final_message() gives fully-parsed tool_use blocks once the stream ends. See Claude API Function Calling for the complete tool-use loop — it works unchanged whether calls are streamed or not.
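For readers who want to see what the incremental tool arguments look like, the sketch below concatenates partial_json fragments per content block and parses them once the block stops. It uses hand-built stand-ins shaped like the SDK events; collect_tool_inputs, fake_tool_events, and the city argument are hypothetical and only for illustration.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def collect_tool_inputs(events: Iterable) -> Dict[int, Any]:
    """Accumulate input_json_delta fragments per block and parse them at block stop."""
    fragments: Dict[int, str] = {}
    parsed: Dict[int, Any] = {}
    for event in events:
        if event.type == "content_block_delta" and event.delta.type == "input_json_delta":
            fragments[event.index] = fragments.get(event.index, "") + event.delta.partial_json
        elif event.type == "content_block_stop" and event.index in fragments:
            # Fragments are only valid JSON once the whole block has arrived.
            parsed[event.index] = json.loads(fragments[event.index] or "{}")
    return parsed


# Stand-in events: a text block at index 0, then a tool_use block at index 1.
fake_tool_events = [
    SimpleNamespace(type="content_block_delta", index=0,
                    delta=SimpleNamespace(type="text_delta", text="Checking...")),
    SimpleNamespace(type="content_block_stop", index=0),
    SimpleNamespace(type="content_block_delta", index=1,
                    delta=SimpleNamespace(type="input_json_delta", partial_json='{"city": "Ky')),
    SimpleNamespace(type="content_block_delta", index=1,
                    delta=SimpleNamespace(type="input_json_delta", partial_json='iv"}')),
    SimpleNamespace(type="content_block_stop", index=1),
]

print(collect_tool_inputs(fake_tool_events))
```

Since get_final_message() already returns parsed tool_use blocks, manual accumulation like this is only worth it when the UI needs to show tool arguments while they are still arriving.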

Best Practices

Use get_final_message() for stop_reason/usage instead of accumulating message_delta manually
Use AsyncAnthropic in web backends — a sync stream blocks the event loop
Set Cache-Control: no-cache and X-Accel-Buffering: no for SSE behind a proxy
Detect client disconnects and stop generation early
Streaming doesn't change pricing — tokens are billed the same either way
Handle APIConnectionError, RateLimitError, and APIStatusError explicitly

Summary

stream.text_stream yields plain text chunks for display
Raw events: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop
get_final_message() returns the complete Message after streaming
AsyncAnthropic + async with/async for for non-blocking backends
FastAPI StreamingResponse + async generator → SSE to the browser
Tool use streams the same way; input_json_delta carries tool arguments