Building a production streaming chat UI: SSE, cancellation, and error recovery
A complete guide to Server-Sent Events for AI chat — buffer management, AbortController, reconnection, and the [DONE] contract.
Streaming is not a performance optimization; it is a UX requirement. Users perceive a response as 'fast' when the first token appears in under 500 ms, even if the full response takes 10 seconds. A non-streaming 4-second response feels slower than a streaming 6-second response that shows the first word immediately.
The SSE contract
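Every chunk arrives as a data: line carrying a JSON payload, and the stream ends with data: [DONE]. Your parser must handle lines split across network reads, empty keepalive lines, and the termination signal. Never assume that one read from the socket delivers exactly one complete data: line; a line can be cut off in the middle of its JSON object and finished by the next read.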
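To make the split-line case concrete, here is a small sketch that feeds the same bytes through a line buffer twice: once as a single chunk and once cut at arbitrary offsets, including mid-JSON. The payload shape ({"text": ...}) and the helper name collectLines are assumptions made for illustration; the real payload depends on your provider.

```typescript
// Hypothetical wire data. The {"text": ...} payload shape is an assumption.
const wire =
  'data: {"text":"Hel"}\n\n' +
  'data: {"text":"lo"}\n\n' +
  "data: [DONE]\n\n";

// Collect complete, non-empty lines from a sequence of chunks,
// carrying any trailing partial line forward into the next chunk.
function collectLines(chunks: string[]): string[] {
  let buffer = "";
  const out: string[] = [];
  for (const chunk of chunks) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // last element is an unfinished line (or "")
    for (const line of lines) {
      if (line !== "") out.push(line); // skip blank separators and keepalives
    }
  }
  return out;
}

// Same bytes, two chunkings: one clean, one cut in the middle of a JSON object.
const whole = collectLines([wire]);
const cut = collectLines([wire.slice(0, 12), wire.slice(12, 30), wire.slice(30)]);
```

Both calls should produce the same three lines, which is the property the parser has to guarantee no matter how the network slices the response.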
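// Shape of the events the parser yields. `data` is whatever JSON the server sends.
type SSEEvent = { type: "chunk"; data: unknown };

class SSEParser {
  private buffer = "";
  private decoder = new TextDecoder();

  async *parse(response: Response): AsyncGenerator<SSEEvent> {
    if (!response.body) throw new Error("Response has no body to stream");
    const reader = response.body.getReader();
    try {
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        // stream: true keeps multi-byte characters intact across reads.
        this.buffer += this.decoder.decode(value, { stream: true });
        const lines = this.buffer.split("\n");
        // The last element is an unfinished line (or ""); keep it for the next read.
        this.buffer = lines.pop() ?? "";
        for (const rawLine of lines) {
          const line = rawLine.replace(/\r$/, ""); // tolerate CRLF line endings
          if (!line) continue; // blank separator or keepalive
          if (line === "data: [DONE]") return;
          if (line.startsWith("data: ")) {
            yield { type: "chunk", data: JSON.parse(line.slice(6)) };
          }
        }
      }
    } finally {
      // Runs on normal completion, early return at [DONE], and thrown errors.
      reader.releaseLock();
    }
  }
}
Stop button with AbortController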
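Every streaming UI needs a Stop button. Without it, users pay for tokens they do not want. Create an AbortController before calling fetch, pass its signal in the request options, and call abort() when the user presses Stop. Aborting cancels the request on the client; token generation stops only if the server notices the disconnect and cancels its upstream work, so verify that your backend does. Also handle three failure modes: the connection is never established (show an error and offer a retry), the connection drops mid-stream (show the partial response with an error), and the server errors after sending some chunks (show what you have and mark it incomplete).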
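One way to keep the Stop path and the failure modes apart is to reduce every stream to a single outcome value that the UI renders. The sketch below is an illustration under assumptions, not a definitive implementation: StreamOutcome, runStream, start, and onToken are hypothetical names, and start stands in for whatever opens the connection (for example fetch with the signal, followed by the SSE parser).

```typescript
// Hypothetical outcome labels; the names are illustrative, not from any library.
type StreamOutcome =
  | { status: "complete"; text: string }
  | { status: "stopped"; text: string } // user pressed Stop
  | { status: "failed"; error: string } // connection never established
  | { status: "incomplete"; text: string; error: string }; // dropped or errored mid-stream

// `start` is an assumed callback that opens the stream and resolves to an async
// iterable of text tokens. It must honor the AbortSignal it is given.
type StartStream = (signal: AbortSignal) => Promise<AsyncIterable<string>>;

async function runStream(
  start: StartStream,
  controller: AbortController,
  onToken: (token: string) => void,
): Promise<StreamOutcome> {
  let text = "";
  let connected = false;
  try {
    const tokens = await start(controller.signal);
    connected = true; // past this point, any failure is a mid-stream failure
    for await (const token of tokens) {
      if (controller.signal.aborted) break; // Stop pressed between tokens
      text += token;
      onToken(token);
    }
  } catch (err) {
    // An error raised because of abort() is the Stop path, not a failure.
    if (!controller.signal.aborted) {
      const error = err instanceof Error ? err.message : String(err);
      return connected
        ? { status: "incomplete", text, error }
        : { status: "failed", error };
    }
  }
  return controller.signal.aborted
    ? { status: "stopped", text }
    : { status: "complete", text };
}
```

The Stop button's click handler then only needs to call controller.abort(). The UI can show a retry action for "failed", keep the partial text with an error notice for "incomplete", and keep the partial text without an error for "stopped".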