WebSockets — A Working Reference

Last reviewed on 4 May 2026.

WebSockets give you a persistent, bidirectional channel between client and server — a single TCP connection that stays open, with messages flowing in both directions. Used well, they replace whole categories of polling and webhooks. Used badly, they create connection-pool fires and silent message loss.

What WebSockets actually are

A WebSocket is a TCP connection that starts as an HTTP request and gets upgraded — via the HTTP Upgrade mechanism — into a persistent, bidirectional message stream. The protocol is defined in RFC 6455. Both sides can send messages at any time, in either text or binary frames, until one side closes the connection.

That bidirectional, low-latency property is the whole point. HTTP request/response is great when the client knows when to ask. WebSockets are for cases where the server has something to push — a chat message, a price update, a game state change — and the client should know immediately rather than five seconds later when it next polls.

The handshake

A WebSocket connection starts with an HTTP GET request that includes specific headers:

GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

The server responds with HTTP 101 Switching Protocols and the handshake completes. From that point, the same TCP connection carries WebSocket frames in both directions.

This matters for two reasons. First, the connection is opened over standard ports 80/443, so it traverses corporate proxies and firewalls that would block other protocols. Second, the initial request is a regular HTTP request, which means you can authenticate using the same mechanisms as the rest of your API — typically a token in the Authorization header or in a query parameter (more on which to use below).
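One more handshake detail worth knowing: the server proves it actually speaks WebSocket by echoing a transform of the client's key. It appends a fixed GUID defined in RFC 6455, SHA-1 hashes the result, and base64-encodes the digest into the Sec-WebSocket-Accept response header. A minimal sketch:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; every server appends it to the client's key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for the 101 response."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# The sample key from the handshake above yields the value RFC 6455 itself shows:
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client checks this value in the 101 response; a mismatch means something other than a real WebSocket endpoint answered the request.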

Authentication on the upgrade

Three approaches, in decreasing order of preference:

  • Token in the Authorization header. The cleanest approach for non-browser clients. The handshake is a normal HTTP request, so it carries headers. The downside: browser WebSocket APIs don't let JavaScript set arbitrary headers.
  • Token in the URL query string. The browser-friendly fallback: wss://example.com/chat?token=.... Works everywhere but the token ends up in server access logs and proxy logs. Use a short-lived token (issued by your normal HTTP API for one-time use) rather than a long-lived API key.
  • Authenticate after the connection opens. Accept the connection, then require the first message to be an auth message; close the connection if it doesn't authenticate within a few seconds. Works around browser header limitations but leaves connections open briefly with unknown identity, which complicates rate-limiting and abuse protection.

Whichever you choose, validate the token at handshake time. If it's invalid, reject the upgrade with HTTP 401 before switching protocols (or, on an already-open connection, send a close frame with code 4401, since RFC 6455's registered close codes include no "unauthorized" value and the 4000–4999 range is reserved for applications). Don't accept connections you'd reject — every accepted connection consumes a file descriptor.

Message format

WebSocket frames are either text (UTF-8) or binary. The protocol has no notion of "messages" beyond that — no JSON, no envelopes, no message types. You design that on top.

The convention that has settled out for application protocols over WebSocket: each message is a JSON object with a type field that the receiver dispatches on, and a payload field with the type-specific data:

{ "type": "chat.message", "payload": { "room": "general", "text": "hello" } }
{ "type": "chat.typing", "payload": { "room": "general", "user": "alice" } }
{ "type": "presence.join", "payload": { "user": "bob" } }

Add a message ID for messages that need acknowledgement, and a sequence number (or event timestamp) if order matters. Resist the urge to make the format too clever — JSON-RPC and similar standards add structure that's rarely worth its cost on a single application's WebSocket protocol.
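A receiver for that envelope reduces to a lookup table keyed on the type field. A minimal sketch, with an illustrative handler table (the handlers here just format strings so the dispatch is visible):

```python
import json

# Illustrative handler table: one function per "type" value.
HANDLERS = {
    "chat.message":  lambda p: f"[{p['room']}] {p['text']}",
    "chat.typing":   lambda p: f"{p['user']} is typing in {p['room']}",
    "presence.join": lambda p: f"{p['user']} joined",
}

def dispatch(raw: str) -> str:
    """Parse one incoming frame and route it on its type field."""
    msg = json.loads(raw)
    handler = HANDLERS.get(msg.get("type"))
    if handler is None:
        # In production, log and ignore unknown types rather than
        # crashing the connection; raising here keeps the sketch visible.
        raise ValueError(f"unknown message type: {msg.get('type')!r}")
    return handler(msg.get("payload", {}))
```

The useful property of the flat dispatch table is that adding a message type never touches the transport code — it's one new entry.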

Heartbeats and dead connections

The single most underrated WebSocket fact: a TCP connection can be silently dead for minutes before either side notices. A laptop goes to sleep, a NAT entry expires, a load balancer drops state — none of these necessarily produce a TCP RST that the application sees. Without explicit heartbeats, your server thinks the connection is fine while the client gave up ten minutes ago.

The fix is application-level pings. Both sides should send a ping frame (or a custom ping message) every 20–30 seconds and close the connection if they don't receive a pong within a few seconds. RFC 6455 has PING and PONG control frames built in for exactly this — use them. The convention:

  • Server sends a ping every 30 seconds.
  • Client responds with pong (most WebSocket libraries do this automatically).
  • If the server doesn't receive a pong within 10 seconds, it closes the connection.
  • The client also sends pings if it hasn't sent or received a frame in 30 seconds, for the same reason from its side.

Without this, your connection-count metric is meaningless — half the "connected" sessions are zombies.
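The server-side bookkeeping for this is small enough to write as a plain state machine. This sketch takes the clock as an argument so the timings from the convention above (30-second ping interval, 10-second pong timeout) are explicit and testable; the class name and `tick`/`on_pong` interface are illustrative:

```python
class Heartbeat:
    """Server-side view of one connection's liveness. Times are in seconds;
    the intervals match the convention above (30s ping, 10s pong timeout)."""
    PING_INTERVAL = 30.0
    PONG_TIMEOUT = 10.0

    def __init__(self, now: float):
        self.last_ping_sent = now
        self.awaiting_pong = False

    def tick(self, now: float) -> str:
        """Return the action to take at time `now`: 'ping', 'close', or 'wait'."""
        if self.awaiting_pong:
            if now - self.last_ping_sent > self.PONG_TIMEOUT:
                return "close"  # zombie: no pong within the timeout
            return "wait"
        if now - self.last_ping_sent >= self.PING_INTERVAL:
            self.last_ping_sent = now
            self.awaiting_pong = True
            return "ping"
        return "wait"

    def on_pong(self) -> None:
        self.awaiting_pong = False
```

Injecting the clock rather than calling `time.monotonic()` inside is what makes zombie detection testable without waiting real seconds.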

Reconnection

Connections will drop. Networks change, servers restart, mobile clients switch from Wi-Fi to cellular. The client must handle reconnection gracefully or the user experience falls apart at the first hiccup.

The pattern:

  • Exponential backoff with jitter. First reconnect after 1 second, then 2, 4, 8, up to a cap of about a minute. Random jitter prevents thundering herds when a server restart drops thousands of clients at once.
  • Clear backoff on successful connection. The next reconnect attempt should start at 1 second again, not at the previous high-water mark.
  • Resume vs reconnect. If your protocol can survive losing the connection (idempotent messages, server-side state that persists), reconnecting is enough. If the client needs to resume an in-progress session, you need a session ID and server-side state that survives the disconnect for some grace period.
  • Catch-up on reconnect. If the client missed messages while disconnected, the server needs to send them on reconnect. The standard pattern is for each message to carry a sequence number; on reconnect, the client tells the server the last sequence it saw and the server sends everything since.
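The backoff schedule above is nearly a one-liner. This sketch uses the "full jitter" variant (a uniformly random delay up to the exponential cap), which is one common choice, not the only one:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before reconnect attempt `attempt` (0-based):
    exponential growth capped at `cap`, fully jittered so a fleet of
    clients dropped by one server restart doesn't reconnect in lockstep."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

Reset `attempt` to 0 on a successful connection, per the clear-backoff rule above — the counter, not just the delay, starts over.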

Backpressure

What happens when the server wants to send messages faster than the client can consume them? Without explicit handling, messages pile up in the server's send buffer until the process runs out of memory or the library gives up and closes the connection. Worse, messages can sit in that buffer for seconds, so by the time the client receives them they're stale.

Two patterns:

  • Bounded per-client queues. Cap the number of in-flight messages per client. When the queue fills, drop the oldest (for tickers) or close the connection (for chat, where dropping messages is unacceptable). The choice depends on the application.
  • Server-side rate limiting per client. Some application protocols are inherently bursty — a price feed sending an update on every market tick. Rate-limit at the application level (one message per 100ms per symbol) before the network ever sees the burst. See API rate limiting strategies for the algorithms.
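The drop-oldest policy can be sketched with a bounded deque, which evicts from the opposite end on append. The class name is illustrative; counting drops matters because silent message loss should at least show up in metrics:

```python
from collections import deque

class DropOldestQueue:
    """Bounded per-client send queue that discards the oldest message when
    full — the ticker-style policy; a chat server would close instead."""
    def __init__(self, maxsize: int):
        self._q = deque(maxlen=maxsize)  # a full deque evicts the oldest on append
        self.dropped = 0

    def push(self, msg) -> None:
        if len(self._q) == self._q.maxlen:
            self.dropped += 1  # surface drops in metrics, never silently
        self._q.append(msg)

    def pop(self):
        return self._q.popleft() if self._q else None
```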

Scaling

WebSockets are stateful — the connection is the state. This breaks the normal "stateless servers behind a load balancer" pattern in two ways:

  • Each server only sees the clients connected to it. A "broadcast to room X" operation has to fan out to every server that holds a client in that room. The standard solution is a pub/sub backbone (Redis pub/sub, NATS, Kafka) that all WebSocket servers subscribe to.
  • Connection count, not request rate, is the limit. A WebSocket server can hold tens or hundreds of thousands of idle connections, but each one consumes a file descriptor and some memory. The right metric to monitor is open connections per server, not requests per second.

The architectural decision worth making early: do all your WebSocket servers serve all clients, or do you shard clients to specific servers? Sharding (by user ID, by room ID) reduces fan-out cost but introduces a routing problem. Fully connected servers are simpler but pay the fan-out cost on every event.
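The fully connected variant can be sketched in memory. Here `Broker` stands in for the real backbone (Redis pub/sub, NATS, Kafka); the class and method names are illustrative, and a real system would do this over the network, not via Python method calls:

```python
from collections import defaultdict

class Broker:
    """Stand-in for the pub/sub backbone: every WebSocket server
    subscribes to it, and room events are published through it."""
    def __init__(self):
        self.servers = []

    def publish(self, room: str, msg: str) -> None:
        for server in self.servers:  # fan out to every subscribed server
            server.deliver(room, msg)

class WsServer:
    """One WebSocket server: it knows only its locally connected clients."""
    def __init__(self, broker: Broker):
        self.local_rooms = defaultdict(list)  # room → local client inboxes
        broker.servers.append(self)

    def join(self, room: str, inbox: list) -> None:
        self.local_rooms[room].append(inbox)

    def deliver(self, room: str, msg: str) -> None:
        for inbox in self.local_rooms.get(room, []):
            inbox.append(msg)
```

The fan-out cost is visible in `publish`: every server receives every event, and servers with no clients in the room do the (cheap) lookup and discard. Sharding trades that waste for a routing layer.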

WebSockets vs alternatives

WebSockets vs Server-Sent Events (SSE)

SSE is one-directional (server-to-client only) over a regular HTTP response that never closes. It's much simpler than WebSockets — no upgrade, no framing, no PING/PONG — and works through most HTTP infrastructure (buffering proxies are the main exception). Use SSE when you only need server-push and the client occasionally sends commands via separate HTTP requests. Use WebSockets when the client sends data frequently enough that opening a separate HTTP connection per message is wasteful.

WebSockets vs polling

If updates are rare or batched (every minute or longer), polling is fine and saves you all of the operational complexity. If updates are frequent (every few seconds) or latency matters (chat, collaboration), WebSockets pay back the complexity. The middle ground — long-polling — is mostly a relic; if you can run WebSockets, run them.

WebSockets vs webhooks

Webhooks are server-to-server push: your server makes an outbound HTTP request to someone else's URL when something happens. WebSockets are client-to-server: a long-lived connection that the client initiates and maintains. Use webhooks for server integrations where the receiver has a public URL. Use WebSockets for browser or mobile clients that don't have an inbound endpoint. The two solve different problems and often coexist. For webhook design — signing, retries, replay protection — see Webhook Design and Delivery.

Common mistakes

  • No heartbeats. The connection-count metric becomes meaningless; clients see hung sessions; servers leak file descriptors.
  • Authenticating only at handshake, never re-checking. A connection that was authorized an hour ago might no longer have the right permissions. Validate again on operations that matter.
  • Treating the connection as reliable in-order delivery within an application session. WebSocket itself is in-order over a single connection — but reconnects break that. Sequence numbers and resume tokens are how you regain order across disconnects.
  • Sending huge messages. WebSockets are framed but most implementations buffer the whole message before delivering it; sending megabytes blocks the connection for everyone behind it. Chunk large payloads or send a reference to a separately downloadable resource.
  • Forgetting to close cleanly. Send a close frame with a meaningful code before tearing down. Abrupt closes look like network errors and trigger reconnect storms.
  • No per-client rate limiting. One bug in a client can flood the server with messages. Limit message rate per connection at the application layer.
  • Caching state in connection memory. The connection can drop at any moment. Anything that needs to survive a disconnect lives in a database or cache, not in the connection's local variables.
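The sequence-number pattern appears twice above — catch-up on reconnect, and regaining order across disconnects — and the server-side piece behind both is a replay buffer. A sketch, with illustrative names; the important design point is that the buffer is bounded, so a client that has been away too long gets told to do a full resync rather than an impossible catch-up:

```python
from collections import deque

class ReplayBuffer:
    """Keeps the last `capacity` messages with monotonically increasing
    sequence numbers so a reconnecting client can catch up."""
    def __init__(self, capacity: int = 1000):
        self._buf = deque(maxlen=capacity)  # holds (seq, msg) pairs
        self._seq = 0

    def append(self, msg) -> int:
        self._seq += 1
        self._buf.append((self._seq, msg))
        return self._seq

    def since(self, last_seen: int):
        """Messages the client missed. None means the gap exceeds the
        buffer, so the client must do a full resync instead."""
        if self._buf and last_seen < self._buf[0][0] - 1:
            return None
        return [(s, m) for s, m in self._buf if s > last_seen]
```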

Where to go next

For request/response patterns, see REST API Design. For schema-driven APIs that include subscription support, see GraphQL. For server-to-server push, see Webhook Design and Delivery. For the authentication patterns the upgrade handshake should use, see the authentication reference.