WebSockets: Real-Time Bidirectional Communication

Systems Design
| # | Post | What it covers |
|---|---|---|
| 00 | APIs & Communication: How Services Talk to Each Other | How services talk to each other shapes everything about a system. Nine concepts covering REST, WebSockets, async patterns, and API gateways. |
| 01 | API Design: Building Contracts That Last | A great API is a contract that outlasts your code. Here are the principles that make APIs intuitive to consume, safe to evolve, and cheap to maintain. |
| 02 | REST APIs: Constraints That Create Benefits | REST isn't just HTTP with JSON. It's an architectural style with specific constraints, and understanding them explains why REST APIs are designed the way they are. |
| 03 | Authentication vs Authorisation: Two Questions, Two Checks | Authentication is who you are. Authorisation is what you're allowed to do. Confusing them is one of the most common security mistakes in system design. |
| 04 | Session vs Token Authentication: Stateful vs Stateless Identity | Session auth stores identity on the server. Token auth encodes it in the token. Here's how each works, where each breaks, and how to choose. |
| 05 | OAuth 2.0 & OpenID Connect: Delegated Access and Federated Identity | OAuth 2.0 lets users grant apps access without sharing passwords. OpenID Connect adds identity on top. Here's how both actually work. |
| 06 | JWT: What's Actually Inside the Token | JWTs are everywhere in modern auth, and frequently misused. Here's exactly what a JWT contains, how the signature works, and what it doesn't protect. |
| 07 | WebSockets: Real-Time Bidirectional Communication ← you are here | HTTP is request-response. WebSockets are a persistent two-way channel. Here's how they work, when to use them, and what to watch out for at scale. |
| 08 | Long Polling, SSE & Webhooks: The Server-Push Spectrum | Three patterns for server-push communication: long polling, server-sent events, and webhooks. Here's how each works and when to reach for each. |
| 09 | Sync vs Async Communication: The Architectural Fork | Synchronous services couple tightly. Asynchronous services decouple, but add complexity. Here's how to reason about which your system needs. |
| 10 | API Gateways: One Entry Point, Every Cross-Cutting Concern | An API gateway centralises auth, rate limiting, routing, and observability for all your services. Here's what it does, how it works, and when you need one. |
| 11 | APIs & Communication: Wrap-Up | A complete recap of all ten API and communication concepts (REST, auth, JWT, WebSockets, webhooks, async patterns, and API gateways) and how they connect. |
The problem
Your URL shortener's admin dashboard needs to show live redirect counts — a counter that updates every second as users click links. With a standard REST API, the dashboard polls GET /links/{id}/clicks every second. At 100 admin users watching 10 links each, that's 1,000 requests per second to your analytics API — for data that changes maybe once every few seconds per link.
You reduce polling to every 5 seconds. Counts now lag by up to 5 seconds. You add aggressive caching. Counts now lag by up to the cache TTL as well. You add cache invalidation on write. You've rebuilt push notifications, badly, out of polling and cache invalidation, when what you needed was a direct channel for the server to push updates to the dashboard as they happen.
HTTP's request-response model doesn't fit this use case: the server needs to initiate communication, not just respond to it. WebSockets are one solution, and understanding exactly how they work, and what they cost at scale, is what lets you use them where they genuinely help and avoid them where they're overkill.
The core idea
A WebSocket is a persistent, bidirectional communication channel between a client and a server over a single TCP connection. Unlike HTTP, which requires the client to initiate every exchange, a WebSocket connection allows either party to send messages at any time, in either direction, without the overhead of a new request for each message.
The connection starts as an HTTP request, upgrades to the WebSocket protocol, and remains open until explicitly closed by either party.
The analogy: walkie-talkie vs phone call
HTTP is a walkie-talkie. You press the button, say your piece, release. You wait for the other party to press their button and respond. The exchange is structured — one party transmits, the other listens, they swap. Starting a new exchange requires picking up the walkie-talkie and pressing the button again.
WebSockets are a phone call. Once the call is connected, both parties can speak at any moment. There's no button to press, no request to initiate. If something happens on one end, they tell the other immediately. The connection is maintained until someone hangs up.
The walkie-talkie (HTTP) is perfectly adequate for most structured interactions. The phone call (WebSockets) is essential when either party needs to initiate communication spontaneously and immediately.
How WebSockets work
The handshake
WebSocket connections start as HTTP requests, using the Upgrade mechanism to switch protocols:
Client → Server:
```http
GET /ws HTTP/1.1
Host: api.sho.rt
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
```
Server → Client:
```http
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```
After the 101 Switching Protocols response, the HTTP connection is repurposed as a WebSocket connection. The TCP connection stays open; the protocol running over it is now WebSocket, not HTTP.
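The Sec-WebSocket-Accept value in the response is not arbitrary. RFC 6455 defines it as the base64-encoded SHA-1 hash of the client's Sec-WebSocket-Key concatenated with a fixed GUID, which proves to the client that the server actually understood the WebSocket handshake rather than replaying a cached HTTP response. A minimal sketch of that derivation, plus the 101 response built from it (it assumes the rest of the upgrade request has already been validated):

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value the server must send back."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(sec_websocket_key: str) -> str:
    """Build the 101 response for an upgrade request that was already validated."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"  # blank line ends the HTTP headers; WebSocket frames follow
    )


print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Running it on the key from the handshake above reproduces the Accept value the server returned.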
Why start as HTTP? It allows WebSocket connections to pass through existing HTTP infrastructure (load balancers, proxies, firewalls) without requiring new port configurations. Ports 80 and 443 remain the entry points.
The WebSocket protocol
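The WebSocket protocol frames messages in a compact binary format. Each frame has a small header (2–14 bytes) and a payload: 2 bytes for a short unmasked frame, plus 2 or 8 bytes of extended length for larger payloads, plus a 4-byte masking key on every client-to-server frame. This is significantly more efficient than HTTP for high-frequency small messages: an HTTP request commonly carries 200–800 bytes of headers per message, while a small server-to-client WebSocket message carries just 2 bytes of framing overhead.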
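The header size follows directly from the frame layout in RFC 6455, section 5.2: a 7-bit length field covers payloads up to 125 bytes, the values 126 and 127 switch to 16-bit and 64-bit extended lengths, and client-to-server frames always carry a 4-byte masking key. A small sketch that computes the overhead for a given payload:

```python
def frame_header_size(payload_len: int, masked: bool) -> int:
    """Bytes of framing overhead for one WebSocket frame (RFC 6455, section 5.2)."""
    size = 2                  # FIN/RSV/opcode byte + MASK bit/7-bit length byte
    if payload_len > 0xFFFF:
        size += 8             # 64-bit extended payload length
    elif payload_len >= 126:
        size += 2             # 16-bit extended payload length
    if masked:
        size += 4             # masking key, required on client-to-server frames
    return size


# A ~70-byte click update pushed by the server (unmasked):
print(frame_header_size(70, masked=False))  # 2
# A message of the same size sent by the browser (masked):
print(frame_header_size(70, masked=True))   # 6
```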
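Frames come in several types:
- Text frames: UTF-8 encoded text (typically JSON)
- Binary frames: raw bytes for binary data, images, or custom protocols
- Ping/Pong frames: control frames used as a heartbeat; an endpoint that receives a ping must answer with a pong, so a missing pong reveals a dead connection
- Close frames: signal graceful connection termination, optionally carrying a status code and reason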
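To make the frame types concrete, here is a minimal encoder and decoder for single, complete frames, using the opcodes from RFC 6455. It is a sketch for illustration only: it does not handle fragmented messages, partial socket reads, extensions, or the rule that control frames (ping, pong, close) must carry at most 125 bytes. A close frame's payload, when present, starts with a 2-byte status code.

```python
import os
import struct

# Opcodes defined in RFC 6455, section 5.2.
OP_TEXT, OP_BINARY = 0x1, 0x2
OP_CLOSE, OP_PING, OP_PONG = 0x8, 0x9, 0xA


def encode_frame(opcode: int, payload: bytes, mask: bool = False) -> bytes:
    """Build one unfragmented frame (FIN=1). Clients must mask; servers must not."""
    header = bytearray([0x80 | opcode])
    mask_bit = 0x80 if mask else 0x00
    n = len(payload)
    if n < 126:
        header.append(mask_bit | n)
    elif n <= 0xFFFF:
        header.append(mask_bit | 126)
        header += struct.pack("!H", n)   # 16-bit extended length
    else:
        header.append(mask_bit | 127)
        header += struct.pack("!Q", n)   # 64-bit extended length
    if not mask:
        return bytes(header) + payload
    key = os.urandom(4)                  # fresh random masking key per frame
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + key + masked


def decode_frame(data: bytes) -> tuple[bool, int, bytes]:
    """Parse one complete frame held in `data`; returns (fin, opcode, payload)."""
    fin = bool(data[0] & 0x80)
    opcode = data[0] & 0x0F
    masked = bool(data[1] & 0x80)
    n = data[1] & 0x7F
    pos = 2
    if n == 126:
        (n,) = struct.unpack("!H", data[pos:pos + 2])
        pos += 2
    elif n == 127:
        (n,) = struct.unpack("!Q", data[pos:pos + 8])
        pos += 8
    if masked:
        key = data[pos:pos + 4]
        body = data[pos + 4:pos + 4 + n]
        return fin, opcode, bytes(b ^ key[i % 4] for i, b in enumerate(body))
    return fin, opcode, data[pos:pos + n]


msg = b'{"type": "subscribe", "links": ["x7Kp2", "a3bZ9"]}'
frame = encode_frame(OP_TEXT, msg, mask=True)    # how a client would send it
print(decode_frame(frame) == (True, OP_TEXT, msg))  # True
```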
Connection lifecycle
```
1. Client opens WebSocket connection (HTTP upgrade)
2. Server accepts: the TCP connection becomes a WebSocket channel
3. Either party sends messages at any time:
     Server → Client: {"type": "click_update", "link": "x7Kp2", "count": 1247}
     Client → Server: {"type": "subscribe", "links": ["x7Kp2", "a3bZ9"]}
4. Heartbeat pings detect dead connections (client or server)
5. Either party sends a close frame; the connection terminates
```
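Step 4 is usually driven by the server, since the browser WebSocket API does not let JavaScript send ping frames. The sketch below shows the bookkeeping for one connection: send a ping on an interval, expect a pong within a timeout, otherwise treat the connection as dead. The 30-second interval and 10-second timeout are assumptions, and time is passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Heartbeat:
    """Liveness tracking for one connection. Callers pass the current time in seconds."""
    ping_interval: float = 30.0   # assumed: send a ping every 30 s
    pong_timeout: float = 10.0    # assumed: allow 10 s for the pong to arrive
    last_ping_at: Optional[float] = None
    awaiting_pong: bool = False

    def on_pong(self) -> None:
        """Call when a pong frame arrives from the peer."""
        self.awaiting_pong = False

    def tick(self, now: float) -> str:
        """Return the action to take now: "ping", "close", or "wait"."""
        if self.awaiting_pong:
            if now - self.last_ping_at >= self.pong_timeout:
                return "close"    # no pong in time: treat the connection as dead
            return "wait"
        if self.last_ping_at is None or now - self.last_ping_at >= self.ping_interval:
            self.last_ping_at = now
            self.awaiting_pong = True
            return "ping"
        return "wait"


hb = Heartbeat()
print(hb.tick(0.0))    # ping
print(hb.tick(5.0))    # wait  (pong not overdue yet)
hb.on_pong()
print(hb.tick(30.0))   # ping  (interval elapsed)
print(hb.tick(41.0))   # close (no pong within 10 s)
```

A real server would call `tick` from a timer for each connection and act on the result by writing a ping frame or closing the socket.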
Authentication with WebSockets
WebSocket handshakes support standard HTTP headers — but only during the initial HTTP upgrade request. Once the connection upgrades to WebSocket, HTTP headers are no longer available.
Common patterns:
```
// Option 1: Token in upgrade request header (cleanest)
GET /ws HTTP/1.1
Authorization: Bearer eyJ...

// Option 2: Token as query parameter (visible in logs; avoid for sensitive tokens)
GET /ws?token=eyJ... HTTP/1.1

// Option 3: First message authentication
// Connect unauthenticated, send auth message immediately
Client → Server: {"type": "auth", "token": "eyJ..."}
// Server validates and either accepts or closes the connection
```
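Option 1 is preferred where the client can set headers. However, the browser WebSocket API does not let JavaScript add custom headers such as Authorization to the upgrade request, so browser clients typically rely on cookies, option 2 with a short-lived token, or option 3. Option 3 (authenticate via first message) also works when the WebSocket server can't access HTTP headers, but it requires the server to hold the connection open briefly in an unauthenticated state and to close it if a valid auth message doesn't arrive promptly.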
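A minimal sketch of the server-side checks for options 1 and 3. The 5-second deadline and the `validate` callback (for example, a function that verifies a JWT and returns a user id) are hypothetical; close code 1008 is the "policy violation" status defined in RFC 6455. A real server also needs a timer that closes the socket when no message arrives at all, because this function only runs once a message does arrive.

```python
import json
from typing import Callable, Optional, Union

AUTH_DEADLINE_S = 5.0          # assumed limit for the unauthenticated window
CLOSE_POLICY_VIOLATION = 1008  # close status code from RFC 6455, section 7.4.1


def token_from_upgrade(headers: dict[str, str]) -> Optional[str]:
    """Option 1: read a bearer token from the upgrade request (header names lower-cased)."""
    scheme, _, token = headers.get("authorization", "").partition(" ")
    return token if scheme.lower() == "bearer" and token else None


def first_message_auth(
    raw: str,
    opened_at: float,
    now: float,
    validate: Callable[[str], Optional[str]],
) -> tuple[str, Union[str, int]]:
    """Option 3: returns ("accept", user_id) or ("close", close_code)."""
    if now - opened_at > AUTH_DEADLINE_S:
        return "close", CLOSE_POLICY_VIOLATION   # auth arrived too late
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return "close", CLOSE_POLICY_VIOLATION
    if not isinstance(msg, dict) or msg.get("type") != "auth":
        return "close", CLOSE_POLICY_VIOLATION   # anything before auth is rejected
    user_id = validate(str(msg.get("token", "")))
    if user_id is None:
        return "close", CLOSE_POLICY_VIOLATION
    return "accept", user_id


def check_token(token: str) -> Optional[str]:
    """Hypothetical validator; a real one would verify the JWT signature and expiry."""
    return "admin-42" if token == "eyJ..." else None


print(first_message_auth('{"type": "auth", "token": "eyJ..."}', 0.0, 1.2, check_token))
# ('accept', 'admin-42')
```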
WebSockets at scale
A single server can maintain many open TCP connections, but "many" has limits: each connection consumes a file descriptor, memory for connection state, and CPU for heartbeat processing. Depending on message frequency and hardware, a single Node.js server can often hold somewhere in the range of 10,000–100,000 concurrent WebSocket connections.
For a large-scale real-time system, this creates challenges:
Horizontal scaling and sticky sessions. A WebSocket connection is pinned to one server for its lifetime. When you add a second server, new connections might go to server B, but existing connections stay on server A. If server A crashes, its connections are lost and those clients must reconnect to whichever server the load balancer picks. An open TCP connection can't be moved between servers, so either the load balancer supports sticky sessions (routing a client's reconnect back to the same server while it's healthy) or per-connection state lives outside the server so any node can accept the reconnect.
Fan-out. When a server-side event occurs (a link gets 1,000 clicks), you might need to push a notification to 500 connected dashboard users. If those users are distributed across 10 servers, the event must fan out to all 10. A pub/sub layer (Redis Pub/Sub, Kafka) coordinates this:
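Load balancers usually implement stickiness with cookies or source-IP hashing. If you route on a client identifier yourself, rendezvous (highest-random-weight) hashing is one technique that gives each client a stable home server and, when a server leaves the pool, moves only the clients that were on it. This is a sketch under that assumption; the client ids and server names are hypothetical, and it does not save connections on a crashed server, it only makes the reconnect target deterministic.

```python
import hashlib


def pick_server(client_id: str, servers: list[str]) -> str:
    """Rendezvous hashing: score every server for this client, pick the highest score."""
    def score(server: str) -> int:
        digest = hashlib.sha256(f"{server}|{client_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return max(servers, key=score)


servers = ["ws-1", "ws-2", "ws-3"]
home = pick_server("admin-42", servers)          # stable while the pool is unchanged
print(home == pick_server("admin-42", servers))  # True
```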
```
  Analytics event → Redis Pub/Sub channel "link:x7Kp2:clicks"
                                │
          ┌─────────────────────┼─────────────────────┐
          ▼                     ▼                     ▼
     WS Server 1           WS Server 2           WS Server 3
    (subscribers          (subscribers          (subscribers
   on this server)       on this server)       on this server)
```
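Each WebSocket server subscribes to Redis. When an event is published, every subscribed server receives it and pushes it to its own connected clients. The pub/sub layer bridges the gap between stateless event processing and stateful WebSocket connections. Note that Redis Pub/Sub is fire-and-forget: a server that is disconnected from Redis when an event is published never receives it, so anything that must be replayed needs a durable source such as a Kafka topic or a database.
Connection recovery. Network interruptions drop WebSocket connections. Clients should reconnect with exponential backoff (ideally with random jitter, so thousands of clients don't reconnect in lockstep after an outage) and, if the application can't tolerate gaps, resume from the last message they received. The browser WebSocket API does not reconnect on its own; many WebSocket frameworks include reconnection, while applications built directly on the WebSocket API need to implement it manually.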
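The fan-out pattern can be simulated in a single process. In this sketch `InMemoryBroker` stands in for Redis Pub/Sub and `WsServer` for one WebSocket node; both class names are hypothetical, and "sending" just appends to a list instead of writing a frame to a socket. The design point it shows: each node subscribes to a channel once, however many of its local connections watch that link, and keeps only thin state (which local connection ids watch which channel).

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, dict], None]


class InMemoryBroker:
    """Stand-in for Redis Pub/Sub: every subscriber of a channel gets every message on it."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: dict) -> None:
        for handler in self._subscribers[channel]:
            handler(channel, message)


class WsServer:
    """One WebSocket node. It only knows about its own connections."""

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.broker = broker
        self.local_watchers: dict[str, set[str]] = defaultdict(set)  # channel -> conn ids
        self.sent: list[tuple[str, dict]] = []  # stands in for frames written to sockets

    def watch(self, conn_id: str, link: str) -> None:
        channel = f"link:{link}:clicks"
        if channel not in self.local_watchers:
            # One broker subscription per node per channel, not one per connection.
            self.broker.subscribe(channel, self._deliver)
        self.local_watchers[channel].add(conn_id)

    def _deliver(self, channel: str, message: dict) -> None:
        for conn_id in sorted(self.local_watchers[channel]):
            self.sent.append((conn_id, message))


broker = InMemoryBroker()
ws1, ws2 = WsServer("ws-1", broker), WsServer("ws-2", broker)
ws1.watch("conn-a", "x7Kp2")
ws2.watch("conn-b", "x7Kp2")
ws2.watch("conn-c", "a3bZ9")
broker.publish("link:x7Kp2:clicks", {"type": "click_update", "link": "x7Kp2", "count": 1247})
print([c for c, _ in ws1.sent], [c for c, _ in ws2.sent])  # ['conn-a'] ['conn-b']
```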
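A sketch of the client side of recovery, assuming the server numbers its events and can replay the ones a client missed. The "full jitter" backoff picks each delay uniformly between zero and an exponentially growing cap; the 0.5-second base and 30-second ceiling are assumptions, and the `after_seq` field is a hypothetical extension of the subscribe message, not something the protocol defines.

```python
import random
from typing import Optional


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Exponential backoff with full jitter: delay n is uniform in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


def resume_message(links: list[str], last_seq: Optional[int]) -> dict:
    """Hypothetical resume request: resubscribe and ask for events after the last one seen."""
    msg = {"type": "subscribe", "links": links}
    if last_seq is not None:
        msg["after_seq"] = last_seq   # assumes the server can replay from this point
    return msg


# Delays vary per run; the n-th one never exceeds min(30, 0.5 * 2**n) seconds.
print([round(d, 2) for d in backoff_delays(6)])
print(resume_message(["x7Kp2", "a3bZ9"], 1247))
```

Passing a seeded `random.Random` makes the delays reproducible in tests.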
When WebSockets are the right choice
Use WebSockets when:
- Communication is genuinely bidirectional — both client and server need to initiate messages
- Low latency for server-initiated messages is required
- High-frequency small messages would create unacceptable HTTP overhead (gaming, live trading, collaborative editing)
- The connection must persist for an extended session with spontaneous message flow
Consider alternatives when:
- The server only needs to push data to the client (no client-to-server messages needed) → SSE is simpler
- Push events are infrequent (a few per minute) → long polling or SSE is adequate
- You need to push to clients that may be offline → webhooks or push notifications
- The interaction is fundamentally request-response with occasional server events → SSE + REST
In the URL shortener: the real-time dashboard showing live click counts is a good WebSocket use case — the server pushes updates as clicks happen, with low latency, to multiple connected admin users simultaneously. A simpler approach using SSE would also work here since the communication is one-directional (server pushes, client doesn't need to send messages back). Both are valid; WebSockets are slightly over-engineered for pure server-push but appropriate if the dashboard will later need bidirectional features.
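For comparison, the same click update over SSE is just lines of text on a long-lived HTTP response with content type `text/event-stream`. The sketch below formats one event; the `id` line lets the browser's `EventSource` send a `Last-Event-ID` header when it reconnects, which it does automatically. Using the click count as the event id is an illustrative assumption.

```python
import json
from typing import Optional


def sse_event(event: str, data: dict, event_id: Optional[int] = None) -> str:
    """Format one Server-Sent Events message for a text/event-stream response."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")          # echoed back as Last-Event-ID on reconnect
    lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")    # compact JSON stays on one line
    return "\n".join(lines) + "\n\n"             # a blank line terminates the event


print(sse_event("click_update", {"link": "x7Kp2", "count": 1247}, event_id=1247), end="")
# id: 1247
# event: click_update
# data: {"link": "x7Kp2", "count": 1247}
```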
The tradeoffs
Statefulness. WebSocket connections are inherently stateful — the server must track which clients are connected, what they're subscribed to, and manage connection lifecycle. This is fundamentally at odds with stateless horizontal scaling. Designing the system so that WebSocket connection state is as thin as possible (just connection identity and subscriptions) and externalising all real state (into Redis or a database) is the path to scalable WebSocket infrastructure.
Operational complexity. HTTP request/response is stateless and easy to reason about. WebSocket connections are long-lived, stateful, and distributed across servers. Debugging why a specific client isn't receiving messages requires knowing which server they're connected to, what their subscription state is, and whether the fan-out path is working. This is genuinely more complex to operate than REST endpoints.
Firewall and proxy compatibility. Most HTTP proxies and firewalls handle WebSockets correctly today, but WebSocket connections can be cut by aggressive idle-timeout settings on intermediate proxies, and corporate firewalls sometimes block WebSocket connections or time them out. Heartbeat messages keep the connection alive through most proxies, provided the heartbeat interval is shorter than the shortest idle timeout along the path.
The one thing to remember
WebSockets are the right tool when both sides need to initiate communication and latency matters. They're the wrong tool when only the server needs to push — SSE is simpler for that. They're definitely the wrong tool if the communication is fundamentally request-response with occasional updates — REST plus SSE handles that. The cost of WebSockets is statefulness and operational complexity; pay that cost only when the use case genuinely requires bidirectional, persistent, low-latency communication.
← Previous: JWT. OAuth issues JWTs as access tokens and OIDC issues them as ID tokens; the previous post went inside the token format: what a JWT actually contains, how the signature works, and the ways it's commonly misused.
→ Next: Long Polling, SSE & Webhooks — WebSockets enable bidirectional real-time communication; the next post covers the spectrum of server-push patterns for cases where full bidirectional communication isn't needed.




