Server-sent events
The HTTP streaming protocol most LLM APIs use to push partial completions to clients token-by-token over a single long-lived connection.
What is Server-sent events?
Server-sent events (SSE) is an HTTP streaming pattern that lets a server push incremental updates to a client over one long-lived connection. In LLM apps, it is commonly used to stream partial completions token by token as they are generated. (developer.mozilla.org)
Understanding Server-sent events
SSE is built around the browser's EventSource interface and a text/event-stream response format. The server sends framed messages over HTTP, and the client receives them as events without polling for new data. The HTML standard defines reconnect behavior and the Last-Event-ID mechanism, which helps clients resume streams after interruptions. (html.spec.whatwg.org)
In practice, SSE is a one-way channel from server to client. That makes it a natural fit for chat UIs, live progress updates, and streaming model output, especially when the application only needs to receive updates rather than send bidirectional messages. Because the protocol is simple and HTTP-native, teams often find it easier to deploy through standard web infrastructure than lower-level socket-based approaches. (developer.mozilla.org)
Key aspects of Server-sent events include:
- One-way delivery: the server pushes updates to the client, which is ideal for streaming generated text and status updates.
- HTTP-native transport: SSE rides over standard HTTP and uses a long-lived response instead of repeated polling.
- Event framing: messages are formatted as named fields such as data and an optional event type.
- Automatic reconnection: clients can reconnect when the stream drops, and the protocol supports resuming with the last seen event ID.
- Text-based payloads: SSE streams are UTF-8 encoded and work well for incremental text output.
Advantages of Server-sent events
- Simple implementation: SSE is straightforward to add to existing HTTP backends.
- Good fit for LLM streaming: partial completions can be displayed as they arrive.
- Efficient updates: the client does not need to repeatedly poll for new data.
- Built-in reconnect behavior: streams can recover from transient network issues.
- Easy to observe: event boundaries make it easier to log and inspect incremental output.
Challenges in Server-sent events
- One-way only: SSE is not designed for client-to-server messaging.
- Browser and proxy behavior: long-lived connections can be affected by intermediaries and timeouts.
- Text-oriented format: SSE is best for UTF-8 data, not arbitrary binary payloads.
- Connection management: teams still need to handle retries, cancellations, and disconnects carefully.
- State handling: resuming streams cleanly may require tracking event IDs and partial output state.
Example of Server-sent events in action
Scenario: a user opens a chat interface and asks an LLM a question.
The backend starts generating a response and sends each token as an SSE message. The browser listens with EventSource, appends each chunk to the transcript, and updates the UI in real time without waiting for the full answer.
If the connection briefly drops, the client reconnects and can continue from the last received event ID. That makes SSE a practical default for streaming assistants, live copilots, and any interface where progressive rendering improves the experience. (html.spec.whatwg.org)
How PromptLayer helps with Server-sent events
PromptLayer helps teams track, inspect, and evaluate the prompts that drive streamed LLM responses. When your app uses SSE to deliver tokens incrementally, we make it easier to trace request metadata, compare prompt versions, and review how partial or final outputs behave in production.
Ready to try it yourself? Sign up for PromptLayer and start managing your prompts in minutes.