
Intermediate · asked at Slack, Discord, Stripe

Real-Time Communication Patterns

Short polling, long polling, Server-Sent Events, WebSockets, WebTransport, and HTTP/2 push — when each wins, how each fails, and how to pick for your use case.

16 min read · 2026-02-28 · Ironclad Academy

#networking #real-time #websockets #sse #apis #quic


"How does the server tell the client about a new event?" is a question that's been answered at least six different ways over twenty years. Each answer is right for some workload and wrong for others. This article walks the spectrum — short polling, long polling, SSE, WebSockets, WebTransport — and tells you which to pick when.

The fundamental problem

HTTP was designed for request-response. The client asks; the server answers. To push events from server to client, you have to bend HTTP in some direction.

flowchart LR
    C[Client] -->|request| S[Server]
    S -->|response| C
    style C fill:#0e7490,color:#fff
    style S fill:#ff6b1a,color:#0a0a0f

The patterns in this article are different ways of bending that arrow backward. It helps to see them laid out by how much they each bend it — from "client just keeps asking" all the way to "the two endpoints share a fully bidirectional pipe":

flowchart LR
    SP["Short polling<br/>Client asks repeatedly"] --> LP["Long polling<br/>Client asks, server waits"]
    LP --> SSE["SSE<br/>Server streams forever"]
    SSE --> WS["WebSocket<br/>Full duplex"]
    WS --> WT["WebTransport<br/>Multiplexed + datagrams"]
    style SP fill:#ffaa00,color:#0a0a0f
    style LP fill:#0e7490,color:#fff
    style SSE fill:#15803d,color:#fff
    style WS fill:#ff6b1a,color:#0a0a0f
    style WT fill:#a855f7,color:#fff

The further right you go, the lower the latency and the higher the operational cost. The rest of this article explains what each step means in practice.

Comparison at a glance

Pattern	Direction	Latency	Connections	Best for
Short polling	C → S	High (poll interval)	New per poll	Simple, infrequent updates
Long polling	C → S, S keeps it	Low	One held open	Legacy compat, "almost real-time"
SSE	S → C only	Low	One per client	Notifications, feeds, ticker
WebSocket	Bi-directional	Lowest	One per client	Chat, games, collaboration
WebTransport	Bi-directional	Lowest	Multiplexed	Games, mixed reliable and unreliable traffic
HTTP/2 push	S → C	Low	Tied to a request	Mostly defunct in browsers
Webhooks	S → S (server-to-server)	Async	Per event	Service-to-service callbacks

Short polling

sequenceDiagram
    actor C as Client
    participant S as Server
    loop every 5s
        C->>S: GET /messages?since=t
        S-->>C: [] (nothing yet)
    end
    Note over S: New message arrives
    C->>S: GET /messages?since=t
    S-->>C: [{msg}]

The client asks the server every N seconds. Cheap to build, wasteful at scale, and latency is bounded by the poll interval: an event that lands 1 second after a poll waits N-1 seconds for the next one, and the worst case approaches a full N seconds.

What it has going for it: it's trivial to implement, works through every proxy and firewall, and keeps the server completely stateless. What it costs: at 1M users polling every 5 seconds, you're absorbing 200k QPS with mostly empty responses. That's not nothing.
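
The 200k figure is simply users divided by interval. A back-of-envelope sketch, with hypothetical helper names, shows how the load and the latency ceiling move together as the interval changes:

```python
def polling_qps(concurrent_users: int, poll_interval_s: float) -> float:
    """Average requests per second when every user polls once per interval."""
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    return concurrent_users / poll_interval_s


def worst_case_latency_s(poll_interval_s: float) -> float:
    """An event that lands just after a poll waits almost one full interval."""
    return poll_interval_s


if __name__ == "__main__":
    # Shorter intervals cut latency but multiply request volume.
    for interval in (1, 5, 30):
        print(interval, polling_qps(1_000_000, interval), worst_case_latency_s(interval))
```

Halving the interval halves the latency ceiling and doubles the request rate, which is the whole trade-off of short polling in one line.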

Real use: status pages, low-frequency dashboards, jobs that finish in minutes. Anywhere you can tolerate a known latency ceiling and your users aren't waiting on the edge of their seat.

Long polling

sequenceDiagram
    actor C as Client
    participant S as Server
    C->>S: GET /messages?since=t
    Note over S: Hold open up to 30s
    Note over S: New message arrives at t+12s
    S-->>C: [{msg}]
    C->>S: GET /messages?since=t+12s
    Note over S: Hold open again...

The client sends a request, and the server holds it open — not responding until either an event arrives or a timeout (typically 30 seconds). The moment the server responds, the client immediately sends another request. From the outside it looks like a persistent stream; under the hood it's a series of held HTTP requests.

Near-instant delivery is the key win here. It works through any HTTP-friendly proxy and falls back gracefully to short polling if the timeout fires with no events. The cost is that each client holds one server connection open at all times, which consumes file descriptors and memory even when there's nothing to say. Every delivery also costs a fresh HTTP request with full headers, plus a new TCP/TLS handshake whenever the underlying connection isn't kept alive. SSE and WebSocket amortize that overhead across the lifetime of the connection.
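
The server side of the hold can be sketched with asyncio. This is a minimal illustration, not a full HTTP handler: the `long_poll` function, the per-client queue, and the timings are assumptions made for the example.

```python
import asyncio
from typing import List, Optional


async def long_poll(events: asyncio.Queue, hold_s: float = 30.0) -> Optional[str]:
    """Hold the request open until an event arrives or the hold timer fires.

    Returns the event, or None on timeout (the client then simply re-requests).
    """
    try:
        return await asyncio.wait_for(events.get(), timeout=hold_s)
    except asyncio.TimeoutError:
        return None


async def demo() -> List[Optional[str]]:
    events: asyncio.Queue = asyncio.Queue()
    # Simulate an event arriving 50 ms into the hold.
    asyncio.get_running_loop().call_later(0.05, events.put_nowait, "msg-1")
    first = await long_poll(events, hold_s=1.0)    # returns as soon as the event lands
    second = await long_poll(events, hold_s=0.05)  # nothing pending, so the hold times out
    return [first, second]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The client loop is the mirror image: send a request, handle whatever comes back (an event or an empty timeout response), and immediately send the next one.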

Facebook Messenger's web client ran on long polling before migrating to WebSockets — as late as 2012, Messenger for Firefox was described as "the first Facebook product to use WebSockets at scale," implying the broader web client was still on long polling at that point. It's also the traditional fallback for browsers that block WebSocket upgrades. Think of it as "almost real-time for free, with extra connection churn."

Server-Sent Events (SSE)

A standard HTTP response that just never ends. The server writes lines in the format data: {...}\n\n and the browser parses each one as an event. Nothing clever — it's just a response body that keeps streaming.

sequenceDiagram
    actor C as Client
    participant S as Server
    C->>S: GET /stream (Accept: text/event-stream)
    S-->>C: event: stream open
    Note over S: New event
    S-->>C: data: {"kind":"message","id":1}
    Note over S: Another event
    S-->>C: data: {"kind":"presence","id":2}

The browser's built-in EventSource API handles this natively — no library needed. If the connection drops, EventSource reconnects automatically and sends Last-Event-ID so the server knows where to resume. It survives most corporate proxies because it looks like a slow HTTP download. One server-to-client connection per client; no client-to-server traffic over it.
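
To make the wire format concrete, here is a small sketch of how a server might serialize events and replay missed ones on reconnect. The function names are hypothetical, and it assumes event IDs are increasing integers kept in an in-memory log; a real server would read the `Last-Event-ID` request header and a durable store.

```python
from typing import Iterable, List, Optional, Tuple


def format_sse(data: str, event_id: Optional[str] = None, event: Optional[str] = None) -> str:
    """Serialize one event in text/event-stream framing.

    Multi-line payloads need one `data:` line per line; a blank line ends the event.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"


def replay_after(log: List[Tuple[int, str]], last_event_id: Optional[str]) -> Iterable[str]:
    """Yield frames newer than the Last-Event-ID the client sent on reconnect."""
    cursor = int(last_event_id) if last_event_id else 0
    for event_id, payload in log:
        if event_id > cursor:
            yield format_sse(payload, event_id=str(event_id))


if __name__ == "__main__":
    log = [(1, '{"kind":"message"}'), (2, '{"kind":"presence"}')]
    for frame in replay_after(log, last_event_id="1"):
        print(frame, end="")
```

Sending an `id:` field with every event is what lets the browser report where it left off; without it, reconnects start from scratch.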

The one-way constraint is the whole story with SSE. The client uses regular HTTP requests for any upstream messages; SSE is strictly a server-to-client pipe. There's also a subtle browser limit worth knowing: over HTTP/1.1, browsers cap connections at 6 per origin, and that budget is shared across all open tabs, not per tab. Open 7 tabs to the same origin and the 7th SSE stream blocks waiting for one of the first six to close. Serving the SSE endpoint over HTTP/2 effectively removes the problem, since HTTP/2 multiplexes streams over a single TCP connection and the cap becomes the negotiated maximum number of concurrent streams (commonly 100). EventSource works on HTTP/2 transparently.

Real use: stock tickers, notification feeds, live activity streams. Stripe Dashboard uses SSE. GitHub Actions log streaming is SSE. Any place where the server has data and the client just needs to receive it.

WebSockets

A real bidirectional connection. The exchange starts as HTTP — the client sends an Upgrade: websocket header — and if the server accepts, both sides switch to a persistent TCP channel where either side can send frames at any time.

sequenceDiagram
    actor C as Client
    participant S as Server
    C->>S: HTTP Upgrade: websocket
    S-->>C: 101 Switching Protocols
    Note over C,S: Connection is now persistent
    C->>S: {"type":"typing"}
    S-->>C: {"type":"message", "id":42}
    S-->>C: {"type":"presence"}
    C->>S: {"type":"ack", "id":42}

Full duplex, low per-message overhead (no HTTP headers on every frame), and real-time in both directions. The tradeoffs are the flip side of those wins: the connection is stateful, which means every server holds N connections in memory. Load balancers must be configured to pass WebSocket traffic (most do now). Auth and reconnect logic is more complex than HTTP. Some corporate proxies still block the Upgrade header.

Real use: Slack, Discord, multiplayer games, collaborative editors like Figma and Google Docs, trading platforms.

Scaling WebSockets

The hard problem isn't the protocol — it's horizontal scale. When a user is connected to server A, but the event meant for them arrived at server B, how does B reach A?

flowchart LR
    U1[User 1] --> S1[Server A]
    U2[User 2] --> S2[Server B]
    PUB[Event published] --> RD[(Redis pub/sub)]
    RD --> S1
    RD --> S2
    style PUB fill:#ff6b1a,color:#0a0a0f
    style RD fill:#dc2626,color:#fff

The standard answer is a pub/sub backbone: Redis, Kafka, NATS. In the targeted variant, each WebSocket server subscribes to channels for the users currently connected to it, so an event published for user X reaches only the server holding X's socket. In the broadcast variant, every server receives every event, checks "do I hold user X's socket?", and pushes if so. Discord and Slack run thousands of WebSocket gateways with Redis or proprietary backbones routing between them. The chat system design covers this in detail.
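
The per-user channel routing can be sketched in memory. Everything here is illustrative: `Broker` stands in for the real backbone, `Gateway` for a WebSocket server, and the `user:<id>` channel naming is an assumption, not any particular product's convention.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[str, str], None]


class Broker:
    """In-memory stand-in for the pub/sub backbone (Redis, Kafka, NATS)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def unsubscribe(self, channel: str, handler: Handler) -> None:
        if handler in self._subscribers.get(channel, []):
            self._subscribers[channel].remove(handler)

    def publish(self, channel: str, message: str) -> int:
        """Deliver to every subscriber of the channel; return how many received it."""
        handlers = list(self._subscribers.get(channel, []))
        for handler in handlers:
            handler(channel, message)
        return len(handlers)


class Gateway:
    """One WebSocket server. A list per user stands in for the real socket."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.broker = broker
        self.sockets: Dict[str, List[str]] = {}

    def connect(self, user_id: str) -> None:
        self.sockets[user_id] = []
        self.broker.subscribe(f"user:{user_id}", self._on_message)

    def disconnect(self, user_id: str) -> None:
        self.sockets.pop(user_id, None)
        self.broker.unsubscribe(f"user:{user_id}", self._on_message)

    def _on_message(self, channel: str, message: str) -> None:
        user_id = channel.split(":", 1)[1]
        socket = self.sockets.get(user_id)
        if socket is not None:
            socket.append(message)  # real code would write a frame to the socket
```

Whichever server receives the event only has to publish to `user:<id>`; it never needs to know which gateway holds the connection.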

WebTransport (the future)

HTTP/3-based, multiplexed, runs over QUIC. Supports both reliable streams and unreliable datagrams (UDP-like).

The unreliable datagram support is the key thing WebSocket can't do. In a game, you want player position updates to arrive as fast as possible — if a packet is lost, you'd rather skip it and use the next one than wait for TCP to retransmit the old one. WebSocket is TCP-only, so you get head-of-line blocking: a single lost packet stalls the entire connection until it's retransmitted. WebTransport lets you send position updates as unreliable datagrams and chat messages as reliable streams on the same connection, independently. A lost datagram doesn't stall the chat stream.

Multiplexing is the other win: one QUIC connection carries many parallel streams, and a lost packet only stalls the stream it belongs to.
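
A toy simulation makes the head-of-line difference visible. It models only delivery order, not real networking: each input list is the order in which sequence-numbered packets reach the receiver, and each output entry is what the application sees at that moment. The function names are made up for the example.

```python
from typing import List


def ordered_delivery(arrival_order: List[int]) -> List[List[int]]:
    """Reliable in-order stream (TCP-like): release packets only in sequence."""
    buffered = set()
    next_seq = 0
    delivered = []
    for seq in arrival_order:
        buffered.add(seq)
        released = []
        while next_seq in buffered:
            released.append(next_seq)
            next_seq += 1
        delivered.append(released)  # empty list means the app is stalled
    return delivered


def datagram_delivery(arrival_order: List[int]) -> List[List[int]]:
    """Unreliable datagrams: hand over anything newer, discard stale updates."""
    newest = -1
    delivered = []
    for seq in arrival_order:
        if seq > newest:
            newest = seq
            delivered.append([seq])
        else:
            delivered.append([])  # an old position update is no longer useful
    return delivered


if __name__ == "__main__":
    arrivals = [0, 2, 3, 1]  # packet 1 was lost and shows up late
    print(ordered_delivery(arrivals))
    print(datagram_delivery(arrivals))
```

With arrivals `[0, 2, 3, 1]`, the ordered stream shows the application nothing while it waits for packet 1, then releases three packets at once. The datagram path delivers 2 and 3 as they arrive and drops the stale packet 1.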

Browser support arrived in stages — Chrome since Chrome 97 (January 2022), Firefox since v114 (June 2023) — with Safari support expected in Safari 26.4 (March 2026), which will complete Baseline status across all major browsers. For new latency-sensitive or multiplexed workloads in the browser, WebTransport deserves a serious look. Most existing deployments will stay on WebSocket for ecosystem and operational familiarity.

HTTP/2 Push (mostly dead)

The server pushes resources alongside a response. Browsers shipped it, then removed it — Chrome in 2022 — because cache management was a nightmare. Skip unless you're targeting non-browser HTTP/2 clients.

Webhooks (the server-to-server pattern)

Webhooks are how two servers talk asynchronously — a fundamentally different use case from the patterns above, which are all about browser clients.

sequenceDiagram
    participant A as Service A
    participant B as Service B
    B->>A: register webhook URL
    Note over A: Event occurs
    A->>B: POST /your-webhook {event}
    B-->>A: 200 OK

GitHub posts webhooks when a PR opens. Stripe posts when a payment succeeds. The receiver is a plain HTTP endpoint. Simple to debug, works at any scale, no persistent connections needed.

The catches: the receiver might be down when the event fires, so the sender needs a retry strategy. Parallel HTTP POSTs can arrive out of order. And receivers behind NAT need polling or a relay instead.

Webhooks look trivial until they fail at scale. Production hardening has a few essential pieces. Delivery is at-least-once by design (senders retry until they get a 2xx), so receivers must deduplicate by event ID to stay idempotent. Signature verification protects receivers from spoofed payloads: compute HMAC-SHA256(secret, body) on both sides and compare in constant time. Stripe sends the signature in Stripe-Signature; GitHub in X-Hub-Signature-256. For retries, use exponential backoff up to a total retry window (72 hours is common), then move undeliverable events to a dead-letter queue. Alert on DLQ growth before it becomes a silent backlog.
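
The signature check and the dedup step can be sketched with the standard library. This shows only the generic HMAC pattern; real providers add their own framing around it (a timestamp, a scheme prefix), so treat the function names and the raw hex format here as assumptions and follow each provider's documentation for the exact header layout.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8"))


class Deduplicator:
    """Remembers event IDs so a retried delivery is processed only once.

    In-memory for illustration; a real receiver would use a shared store with expiry.
    """

    def __init__(self) -> None:
        self._seen = set()

    def first_time(self, event_id: str) -> bool:
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True
```

Verify against the raw bytes of the body, before any JSON parsing, since re-serializing the payload can change whitespace or key order and break the comparison.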

One important receiver rule: return a 2xx as soon as the event is safely queued, and process it asynchronously. A receiver that blocks for 30 seconds will outlast the sender's request timeout and trigger a retry, causing duplicate processing even when the first attempt eventually succeeds.

flowchart TD
    EVENT[Event fires] --> SEND[Sender POST to receiver]
    SEND -->|"2xx response"| DONE[Delivered]
    SEND -->|"timeout or 5xx"| RETRY{Retry budget<br/>remaining?}
    RETRY -->|Yes| BACKOFF[Exponential backoff<br/>wait, then retry]
    BACKOFF --> SEND
    RETRY -->|No| DLQ[Dead-letter queue]
    DLQ --> ALERT[Alert on DLQ growth]
    style DONE fill:#15803d,color:#fff
    style DLQ fill:#ff2e88,color:#fff
    style ALERT fill:#ff6b1a,color:#0a0a0f
    style RETRY fill:#ffaa00,color:#0a0a0f
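
The retry loop in the diagram above can be sketched as a schedule plus a delivery function. The 72-hour budget comes from the text; the base delay, growth factor, per-attempt ceiling, and function names are assumptions chosen for illustration.

```python
from typing import Callable, List


def retry_schedule(
    base_s: float = 1.0,
    factor: float = 2.0,
    max_delay_s: float = 3600.0,
    budget_s: float = 72 * 3600.0,
) -> List[float]:
    """Delays between attempts: grow by `factor`, level off at `max_delay_s`,
    and stop once the total wait would exceed `budget_s`."""
    delays = []
    elapsed = 0.0
    delay = base_s
    while elapsed + delay <= budget_s:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, max_delay_s)
    return delays


def deliver(
    send: Callable[[], bool],
    delays: List[float],
    sleep: Callable[[float], None] = lambda seconds: None,
) -> str:
    """Try once, then once after each delay. `send` returns True on a 2xx."""
    if send():
        return "delivered"
    for delay in delays:
        sleep(delay)  # pass time.sleep here to actually wait
        if send():
            return "delivered"
    return "dead-letter"  # retry budget exhausted
```

A production sender would persist the attempt count and next-attempt time rather than sleeping in-process, so a restart doesn't lose in-flight retries.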

When you need to deliver the same event to N subscribers, put the event on an internal queue (Kafka, SQS) and let a fan-out worker POST to each endpoint concurrently. This decouples your hot path from slow receivers. For internal service-to-service eventing at high volume, Kafka itself is usually the better answer than webhooks.

How to pick

flowchart TD
    Q1{Do clients need to<br/>send too?}
    Q1 -->|Yes, often| WS[WebSocket]
    Q1 -->|Mostly receive| Q2{Browser?}

    Q2 -->|Yes| SSE[Server-Sent Events]
    Q2 -->|No, server-to-server| WH[Webhooks]

    WS --> Q3{Game / unreliable<br/>datagrams useful?}
    Q3 -->|Yes| WT[WebTransport]
    Q3 -->|No| Q4{WebSocket upgrade<br/>blocked by proxy?}
    Q4 -->|No| WS2[Stay on WebSocket]
    Q4 -->|Yes| LP[Long polling]

    Q1 -->|Updates are rare<br/>and latency-tolerant| SP[Short polling]

    style WS fill:#ff6b1a,color:#0a0a0f
    style SSE fill:#15803d,color:#fff
    style WH fill:#0e7490,color:#fff
    style SP fill:#ffaa00,color:#0a0a0f
    style WT fill:#a855f7,color:#fff
    style LP fill:#0e7490,color:#fff

The quick mental model: if you need both sides sending frequently, WebSocket. If the data flows mostly server-to-browser, SSE is simpler and cheaper to operate. Server-to-server async callbacks are webhooks. If you only need updates occasionally and latency tolerance exists, short polling is genuinely fine — there's no shame in picking the boring option when the boring option fits. Long polling is the fallback for when proxies or corporate firewalls block the upgrade.

What goes wrong at scale

Connection count

WebSocket and SSE both hold one connection per client. 1M concurrent users means 1M sockets — each consuming file descriptors, memory, and kernel TCP state. Linux's default soft limit for file descriptors (ulimit -n) is typically 1,024; you raise it toward ~1M via /etc/security/limits.conf or systemd's LimitNOFILE. The kernel maximum is fs.nr_open (default ~1M). The "65k" limit people often cite is the ephemeral port range for outbound connections — not relevant to a server accepting inbound WebSocket connections.
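
A process can also inspect and raise its own soft limit at startup. The sketch below uses Python's `resource` module, which is available on Unix only; the helper names are hypothetical, and the hard limit still has to be raised by the operator through the mechanisms mentioned above.

```python
import resource
from typing import Tuple


def fd_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target: int) -> int:
    """Raise the soft limit toward `target`, never above the hard limit.

    Returns the soft limit in effect afterwards. Never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    ceiling = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and ceiling > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (ceiling, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("before:", fd_limits())
    print("soft limit now:", raise_soft_limit(1_000_000))
```

Logging these values at startup is cheap insurance: a gateway that silently inherits a 1,024 limit will start refusing connections long before memory or CPU become a problem.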

On a tuned box (raised fs.file-max, net.core.somaxconn, socket buffer sizes), a single process can comfortably hold hundreds of thousands of long-lived connections. Benchmarks show 100k–300k per process as a practical ceiling once application overhead is factored in, with idle-only tests reaching 500k+. Multiply across processes and boxes and you reach millions of connections across the fleet.

Load balancer behavior

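Some load balancers reset long-held connections after a fixed idle timeout. Configure idle timeouts and keep-alives (WebSocket ping/pong, SSE comment lines) to match your connection lifecycle. AWS NLB and ALB both support WebSockets. An established WebSocket stays pinned to one target for its lifetime; ALB sticky sessions matter only when a client makes several HTTP requests that must land on the same server, such as a long-polling fallback.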

Reconnects and thundering herd

Networks drop connections. Clients reconnect — but if a server or partition recovers and thousands of clients all reconnect simultaneously, the server faces a connection storm that can knock it over just as it's coming back up.

sequenceDiagram
    participant C1 as Client 1
    participant C2 as Client 2
    participant CN as Client N
    participant S as Server
    Note over S: Server restart / partition recovery
    C1->>S: reconnect immediately
    C2->>S: reconnect immediately
    CN->>S: reconnect immediately
    Note over S: Overloaded — drops connections
    Note over C1,CN: Exponential backoff + jitter
    C1->>S: reconnect after 1.2s
    C2->>S: reconnect after 2.7s
    CN->>S: reconnect after 4.1s
    Note over S: Load spreads out — recovers cleanly

The fix is exponential backoff with jitter on the client side, combined with a rate limiter on new connections on the server side. Without jitter, every client using 2^attempt * base_delay will hit the same retry window at the same time — jitter breaks the synchrony. The server needs a way to resume from a known event ID, idempotent event delivery (or client-side deduplication), and backpressure on new connections to prevent the herd.
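
One common way to add jitter is "full jitter": pick a random delay between zero and the exponential ceiling. The sketch below assumes that variant; the base delay, the 30-second cap, and the function name are illustrative choices, not values from any specific client library.

```python
import random


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0, rng=random) -> float:
    """Full jitter: uniform in [0, min(cap_s, base_s * 2**attempt)].

    `rng` only needs a `uniform(a, b)` method, so tests can pass a seeded
    random.Random instance.
    """
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


if __name__ == "__main__":
    # Each client draws its own delays, so reconnects spread across the window.
    for client in range(3):
        print(client, [round(backoff_delay(n), 2) for n in range(5)])
```

The cap matters as much as the jitter: without it, a client that has failed twenty times would wait days before its next attempt.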

Memory

Each socket holds buffers — TCP send/recv buffers plus app-level state. Bad app-level buffering leads to OOM under bursty traffic. Use bounded queues per connection so a slow client can't cause the server to buffer unbounded data.
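
A bounded per-connection outbox might look like the sketch below. The policy shown, disconnecting a client whose queue fills up, is one assumption among several reasonable ones (dropping the oldest message or coalescing updates are others); the class and its limit are hypothetical.

```python
from collections import deque


class Connection:
    """Per-client outbox with a hard bound on pending messages."""

    def __init__(self, max_pending: int = 100) -> None:
        self.max_pending = max_pending
        self.outbox = deque()
        self.closed = False

    def enqueue(self, message: str) -> bool:
        """Queue a message for sending. Returns False if it was not accepted."""
        if self.closed:
            return False
        if len(self.outbox) >= self.max_pending:
            # Slow consumer: free the memory and drop the client instead of
            # buffering without bound. It can reconnect and resume by event ID.
            self.outbox.clear()
            self.closed = True
            return False
        self.outbox.append(message)
        return True
```

The bound turns a slow client from a server-wide memory risk into a local, recoverable event.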

Auth & rotation

A long-lived connection won't naturally pick up new auth tokens. There are four main approaches, and the tradeoffs are real.

The lightest option is ticket-based auth at connect time: issue a short-lived, single-use token via a normal REST endpoint, and the client passes it in the WebSocket URL or first message. The server validates once and then the connection lives independent of the original OAuth token. Slack does this — apps.connections.open issues a ticket embedded in the WSS URL. Discord's Gateway goes the other direction — the client sends a long-lived token in an OP 2 Identify payload after the connection opens.
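
A generic version of the ticket pattern can be sketched as follows. This is not Slack's or Discord's implementation: the `TicketStore` class, the 30-second lifetime, and the in-memory dictionary are assumptions for illustration, and a multi-server deployment would keep tickets in a shared store.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class TicketStore:
    """Issues short-lived, single-use connection tickets."""

    def __init__(self, ttl_s: float = 30.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._tickets: Dict[str, Tuple[str, float]] = {}  # ticket -> (user_id, expires_at)

    def issue(self, user_id: str) -> str:
        """Called from an authenticated REST endpoint before the socket opens."""
        ticket = secrets.token_urlsafe(32)
        self._tickets[ticket] = (user_id, self._clock() + self._ttl_s)
        return ticket

    def redeem(self, ticket: str) -> Optional[str]:
        """Called once at connect time. Returns the user ID, or None if the
        ticket is unknown, already used, or expired."""
        entry = self._tickets.pop(ticket, None)  # pop makes the ticket single-use
        if entry is None:
            return None
        user_id, expires_at = entry
        return user_id if self._clock() <= expires_at else None
```

Because the ticket is useless after one redemption and expires within seconds, it is much safer to place in a URL than a long-lived token would be.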

The second option is in-connection refresh: an application-level message to exchange new credentials mid-session. It keeps the connection alive through token rotation but requires both client and server to implement a refresh protocol — more moving parts.

The simplest option is accepting that the token lifetime equals the connection lifetime. This works when connections rarely exceed an hour and you don't need to revoke users mid-session. It breaks when you need to immediately cut off a suspended account, since the connection stays open until the token expires naturally.

For high-security contexts where you do need instant revocation: maintain a revoked-token set in Redis and check it on every message or on a periodic heartbeat. This is expensive at scale and is reserved for cases where the security requirement genuinely demands it.

Specific tech

Tech	What it gives
Socket.IO	WebSocket with polyfill fallbacks (long poll). Heavy but compatible.
SignalR	.NET equivalent.
Phoenix Channels (Elixir)	WS + presence + cluster routing built in.
Centrifugo	Open-source pub/sub for WS/SSE clients.
AWS API Gateway WebSocket	Managed, expensive at scale.
Pusher / Ably / PubNub	Hosted real-time. Fast to ship; pricey.
Cloudflare Durable Objects	Stateful workers — natural fit for WS connections.

Worked example: a notifications feed

Requirements: browser clients receive notifications when something happens, mostly server-to-client, ~500k concurrent users, p99 latency under 500ms from event to client.

Pick: SSE. The workload is one-way, which is exactly what SSE was designed for. Built-in browser support means no library. It's HTTP-shaped and works through everything. Built-in reconnect sends Last-Event-ID, so the server can replay missed events as long as it retains a recent window of them. And since there's no client-to-server traffic, it's cheaper than WebSocket, with fewer moving parts to operate.

flowchart LR
    PUB[Producer service] --> KAFKA[Kafka topic:<br/>user.notifications]
    KAFKA --> SSE[SSE servers]
    SSE -->|stream| U1[User 1 browser]
    SSE -->|stream| U2[User 2 browser]
    SSE -->|stream| U3[User N browser]
    style SSE fill:#15803d,color:#fff

Each SSE server subscribes to Kafka and routes events to whichever connected users it's serving, sharded by user_id.
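
One simple way to shard by user_id is a stable hash modulo the shard count, sketched below. The function name is hypothetical, and plain modulo is an assumption: it remaps most users whenever the shard count changes, which is why larger deployments often use consistent hashing instead.

```python
import hashlib


def shard_for(user_id: str, shard_count: int) -> int:
    """Map a user to a shard with a hash that is stable across processes.

    Python's built-in hash() is salted per process for strings, so it is
    not suitable here.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, shard_for(user, 8))
```

The same function can drive both sides: the load balancer (or client) uses it to pick an SSE server, and the producer uses it to pick the Kafka partition that server consumes.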

Worked example: a chat application

Requirements: bidirectional (typing indicators, send, receive), low latency, presence, typing state, read receipts.

Pick: WebSocket. You need bidirectional and low latency. SSE can't send client messages over the same channel — you'd end up with SSE for server-to-client and HTTP for client-to-server, which is essentially half a WebSocket with extra complexity. When both sides need to talk, use the thing designed for that. Architecture is in the chat system design.

Things you should now be able to answer

Why is long polling a transitional pattern between polling and WebSockets?
When does SSE beat WebSocket?
What's the hard scaling problem with WebSockets and how is it usually solved?
What does WebTransport buy you over WebSocket?
For server-to-server async, when do you pick webhooks vs Kafka?

Frequently asked questions

▸What is the difference between SSE and WebSockets, and when should you choose SSE?

SSE is a one-way server-to-client stream over a plain HTTP response; WebSocket is a full-duplex TCP channel where either side can send frames at any time. Choose SSE when the data flows mostly server-to-browser — notifications, feeds, live logs — because it is simpler to operate, requires no library (the browser EventSource API handles reconnect and Last-Event-ID replay natively), and works through corporate proxies that block WebSocket upgrades.

▸What is the HTTP/1.1 browser connection limit for SSE, and how do you work around it?

Over HTTP/1.1, browsers cap connections at 6 per origin across all open tabs, not per tab, so opening a seventh tab to the same origin blocks its SSE stream until one of the first six closes. Serving the SSE endpoint over HTTP/2 effectively removes the problem, because HTTP/2 multiplexes streams over a single TCP connection (the cap becomes the negotiated maximum number of concurrent streams, commonly 100) and EventSource works on HTTP/2 transparently.

▸What is the hard scaling problem with WebSockets and how is it solved?

When a user is connected to server A but the event destined for them arrives at server B, server B has no direct path to that user's socket. The standard solution is a pub/sub backbone (Redis, Kafka, or NATS) where each WebSocket server subscribes to channels for the users it is currently holding; a published event is then delivered to the server that owns that user's socket, which pushes it down the connection. Discord and Slack both route across thousands of WebSocket gateway servers this way.

▸What does WebTransport add over WebSocket, and which browsers support it?

WebTransport runs over QUIC (HTTP/3) and adds two things over WebSocket. Unreliable datagrams let you send data that is never retransmitted, and independent reliable streams mean a lost packet only stalls the stream it belongs to rather than blocking the entire connection, as TCP head-of-line blocking does with WebSocket. Browser support spans Chrome since version 97 (January 2022) and Firefox since v114 (June 2023); Safari support is expected in Safari 26.4 (March 2026), which will complete Baseline status across all major browsers.

▸When should you use long polling instead of SSE or WebSockets?

Long polling is the right fallback when corporate firewalls or proxies block the WebSocket Upgrade header and you need near-instant delivery. It works through any HTTP-friendly proxy and naturally degrades to short polling if the 30-second hold timeout fires with no event. The cost is per-delivery churn: the client issues a new request immediately after every response, paying full HTTP headers per event and a new TCP/TLS handshake whenever the connection isn't kept alive. A persistent SSE or WebSocket channel amortizes that overhead across all events on the connection.
