MODULE 03 / 12 · crash course
Beginner

Networking and HTTP

TCP vs UDP, DNS, TLS, HTTP/1.1 vs HTTP/2 vs HTTP/3, anycast routing, and what actually happens between a browser and a server.

17 min read · 2026-01-17 · Ironclad Academy

Almost every system design problem reduces to "move bytes between machines, fast and reliably." This module covers what actually happens at each layer — enough to design correctly, not enough to write a kernel driver.

The next module (APIs and Protocols) covers what you put into those bytes (REST, gRPC, GraphQL, WebSockets). This module covers the layers underneath that everything builds on.

The OSI model (or, why we say "Layer 7")

You'll hear engineers say "it's a Layer 7 problem" or "we balance at Layer 4." Here's what those numbers mean:

flowchart BT
    L1[Layer 1: Physical<br/>cables, radio, fiber] --> L2
    L2[Layer 2: Data Link<br/>Ethernet, Wi-Fi, MAC] --> L3
    L3[Layer 3: Network<br/>IP, routing, BGP] --> L4
    L4[Layer 4: Transport<br/>TCP, UDP, QUIC] --> L5
    L5[Layer 5-6: Session/Presentation<br/>TLS, encoding] --> L7
    L7[Layer 7: Application<br/>HTTP, gRPC, DNS]
    style L4 fill:#ff6b1a,color:#0a0a0f
    style L7 fill:#15803d,color:#fff

In practice, most of your work happens at Layer 4 (TCP/UDP) and Layer 7 (HTTP). Everything else, you can mostly ignore — until you can't (BGP outages, MTU mismatches, MAC flooding) and then you'll wish you'd read this section.

IP addresses (the briefest tour)

Every machine on the internet has at least one IP address. Two flavors:

  • IPv4: 32 bits, written as 192.0.2.42. About 4.3 billion possible addresses, all allocated.
  • IPv6: 128 bits, written as 2001:db8::1. Effectively infinite.

Most servers today are dual-stack (both). Most home networks are still IPv4 with NAT (Network Address Translation) — your router has one public IP and rewrites the source IP of outgoing packets so all your devices share it.

A few network ranges that are private (won't appear on the public internet):

10.0.0.0/8         the big private range
172.16.0.0/12      the awkward middle one
192.168.0.0/16     what your home router uses
127.0.0.0/8        loopback ('localhost')

When designing infrastructure, services usually live on private IPs, fronted by load balancers on public IPs.
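
To check which of these ranges an address falls into, Python's standard ipaddress module is enough. The sketch below is illustrative only: the function name classify and the labels it returns are made up for this example.

```python
import ipaddress

# The special ranges from the list above, each paired with a readable label.
SPECIAL_RANGES = [
    (ipaddress.ip_network("10.0.0.0/8"), "private (10.0.0.0/8)"),
    (ipaddress.ip_network("172.16.0.0/12"), "private (172.16.0.0/12)"),
    (ipaddress.ip_network("192.168.0.0/16"), "private (192.168.0.0/16)"),
    (ipaddress.ip_network("127.0.0.0/8"), "loopback"),
]

def classify(address: str) -> str:
    """Return the label of the first matching range, or 'other' if none match."""
    ip = ipaddress.ip_address(address)
    for network, label in SPECIAL_RANGES:
        if ip in network:
            return label
    return "other"

print(classify("172.31.255.255"))  # last address inside 172.16.0.0/12
print(classify("172.32.0.1"))      # just outside it
```

The /12 range is the one people most often get wrong: it ends at 172.31.255.255, so 172.32.0.1 is an ordinary public address.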

TCP vs UDP

| Feature | TCP | UDP |
|---|---|---|
| Connection-oriented | Yes (3-way handshake) | No |
| Reliable delivery | Yes (retransmits) | No |
| Ordered delivery | Yes | No |
| Congestion control | Yes | No |
| Header overhead | 20–60 bytes | 8 bytes |
| Use cases | HTTP, SSH, databases | DNS, video, gaming, QUIC |

TCP three-way handshake

This is the cost you pay every time you open a new TCP connection:

sequenceDiagram
    participant C as Client
    participant S as Server
    C->>S: SYN
    S->>C: SYN-ACK
    C->>S: ACK
    Note over C,S: connection established<br/>~1 RTT spent before any data flows
    C->>S: Data
    S->>C: Data

A round-trip across a continent is ~80ms. Across an ocean, ~150ms. So opening a fresh TCP connection costs you 80–150ms before you've sent a single byte. Then TLS adds another 1–2 RTTs. This is why HTTP keep-alive, connection pooling, and HTTP/2/3 exist.
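
The arithmetic can be made explicit. This is a rough sketch, assuming every handshake step costs exactly one round trip and ignoring server processing time; the function name setup_cost_ms is hypothetical.

```python
def setup_cost_ms(rtt_ms: float, tls_rtts: int) -> float:
    """Time spent on handshakes before the first request byte can be sent.

    One RTT for the TCP three-way handshake, plus tls_rtts round trips
    for the TLS handshake (the text above puts this at 1 to 2).
    """
    tcp_rtts = 1
    return rtt_ms * (tcp_rtts + tls_rtts)

for label, rtt in [("cross-continent", 80), ("cross-ocean", 150)]:
    print(label, setup_cost_ms(rtt, 1), "to", setup_cost_ms(rtt, 2), "ms")
```

With a 150ms RTT and two TLS round trips, that is 450ms of pure setup, which is why reusing a connection pays off so quickly.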

TCP slow start (and why your first request is slow)

When a TCP connection opens, it doesn't immediately transmit at the line rate. It starts conservative — typically with a "congestion window" of ~10 packets — and doubles the window each RTT until it either hits the slow-start threshold (ssthresh) and switches to linear growth, or until it sees an actual loss. This is slow start.

flowchart LR
    A[cwnd: 10 packets] -->|RTT 1| B[20]
    B -->|RTT 2| C[40]
    C -->|RTT 3| D[80]
    D --> E[...until loss]
    E --> F[Halve, then linear growth]
    style A fill:#ff6b1a,color:#0a0a0f
    style F fill:#15803d,color:#fff

Practical implication: tiny responses are dominated by handshake, large transfers by slow start ramp-up. Connection reuse (keep-alive) skips both.
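
To see how the ramp-up plays out, here is an idealized model of slow start. It assumes no loss, no ssthresh cap, and a segment size of 1460 bytes (a common value on Ethernet paths, chosen here as an assumption); the function name is hypothetical.

```python
def rtts_to_deliver(size_bytes: int, init_cwnd: int = 10, mss: int = 1460) -> int:
    """Count the round trips needed to deliver size_bytes under ideal slow start."""
    segments_left = -(-size_bytes // mss)  # ceiling division, integers only
    cwnd = init_cwnd
    rtts = 0
    while segments_left > 0:
        segments_left -= cwnd  # send one full window this round trip
        cwnd *= 2              # slow start: the window doubles each RTT
        rtts += 1
    return rtts

print(rtts_to_deliver(14_600))     # fits in the initial window
print(rtts_to_deliver(1_000_000))  # needs several doublings
```

Under these assumptions, anything up to 14,600 bytes goes out in the first round trip, while a 1 MB transfer needs 7 round trips before the last byte is sent.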

TCP backpressure and Nagle

TCP has a send buffer and a receive window. The receiver advertises how much it can buffer, and the sender never keeps more unacknowledged data in flight than that window allows. That's built-in backpressure: a slow receiver shrinks the window, and the sender has to slow down with it.

Nagle's algorithm coalesces small writes into larger segments to avoid wasting bandwidth on headers. It does this by holding back a small segment while earlier data is still unacknowledged. On its own that costs at most about one RTT, but it interacts badly with delayed ACKs, where the receiver waits (typically tens of milliseconds, up to 200ms depending on the OS) in the hope of piggybacking the ACK on outgoing data. When the two interact, each side waits on the other and you get hiccups of up to 200ms on small request/response patterns. Real systems that care about latency disable Nagle with TCP_NODELAY.

sequenceDiagram
    participant App
    participant Sender as TCP Sender
    participant Receiver as TCP Receiver
    App->>Sender: write(small chunk 1)
    Sender->>Receiver: chunk 1 sent immediately
    App->>Sender: write(small chunk 2)
    Note over Sender: Nagle ON: hold chunk 2 until chunk 1 is ACKed
    Note over Receiver: delayed ACK: wait for more data first
    Receiver->>Sender: delayed ACK (up to ~200ms)
    Sender->>Receiver: chunk 2 finally sent
    Note over App,Receiver: up to 200ms wasted, set TCP_NODELAY to disable
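
Setting the option is a one-liner in most languages. A minimal Python sketch using the standard socket module (the helper name is made up for illustration):

```python
import socket

def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 asks the kernel to send small writes immediately
    # instead of holding them until earlier data has been acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = make_low_latency_socket()
# A non-zero value read back means the option is set.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```

The option applies per socket, so it has to be set on every connection a client or server opens. Many HTTP and RPC libraries already do this by default, but it is worth checking rather than assuming.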

Why UDP for video

If a TCP segment is dropped, TCP retransmits it, and everything that arrived after the gap sits in the receive buffer until the missing segment is recovered. That's head-of-line blocking, and for video or audio it means a visible freeze rather than a tiny glitch. UDP doesn't retransmit; the application decides what to do with gaps, which for live media is usually "ignore it and keep playing." WebRTC media runs over UDP for this reason, and QUIC builds on UDP so that it can implement its own per-stream loss recovery instead of inheriting TCP's.
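
What "the application decides" looks like in practice can be sketched as follows. The example assumes each packet carries an application-level sequence number (as RTP does); the function name and the frame payloads are hypothetical.

```python
from typing import Dict, List, Optional

def play_out(received: Dict[int, bytes], first_seq: int, last_seq: int) -> List[Optional[bytes]]:
    """Return frames in sequence order, with None marking a lost packet.

    A real player would conceal the gap (repeat the last frame, interpolate
    audio); here None stands in for that concealment step.
    """
    return [received.get(seq) for seq in range(first_seq, last_seq + 1)]

# Packet 3 was dropped in transit, and packet 4 arrived before packet 2.
arrived = {1: b"frame-1", 4: b"frame-4", 2: b"frame-2", 5: b"frame-5"}
timeline = play_out(arrived, first_seq=1, last_seq=5)
print(timeline)
```

Frames 4 and 5 are played on time even though frame 3 never arrived. Over TCP they would have waited for the retransmission.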

DNS: how a name becomes an IP

When your browser hits twitter.com, it doesn't know where Twitter lives. DNS resolves it.

sequenceDiagram
    participant B as Browser
    participant R as Recursive Resolver
    participant Root as Root NS
    participant TLD as .com TLD NS
    participant A as twitter.com NS
    B->>R: where is twitter.com?
    R->>Root: ?
    Root->>R: ask .com TLD
    R->>TLD: where is twitter.com?
    TLD->>R: ask twitter's auth NS
    R->>A: where is twitter.com?
    A->>R: 104.244.42.1
    R->>B: 104.244.42.1
    Note over B,R: cached for TTL seconds<br/>typical: 300s - 86400s

DNS records you'll meet

| Record | Purpose | Example |
|---|---|---|
| A | Hostname → IPv4 | api.example.com → 192.0.2.42 |
| AAAA | Hostname → IPv6 | api.example.com → 2001:db8::1 |
| CNAME | Alias | www → apex.example.com |
| MX | Mail server | example.com → mail.example.com |
| TXT | Arbitrary text | SPF, DKIM, domain verification |
| NS | Name servers for a zone | example.com → ns1.dns-provider.com |
| CAA | Which CAs may issue certs | 0 issue "letsencrypt.org" |

Implications for system design

DNS lookups cost at least one RTT on a cache miss, so if you call getaddrinfo("api.x.com") per request with no local cache, you can pay that price per request. The TTL is a dial between two competing needs: a short TTL (say, 60 seconds) lets you shift traffic away from a failing region in about a minute, but every resolver that serves your users has to re-query your nameservers every minute. A long TTL (86400 seconds, one day) means far fewer lookups but much slower recovery if you need to reroute.
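
An in-process cache with a TTL avoids paying for a lookup on every request. The sketch below uses a fixed TTL for all entries (real DNS records carry their own TTL) and takes the resolver and clock as parameters so it can be tested without a network; the class name and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

class CachingResolver:
    """Caches lookups for ttl_seconds so the resolver is not called per request."""

    def __init__(self, resolve: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # host -> (ip, expires_at)

    def lookup(self, host: str) -> str:
        now = self._clock()
        entry = self._cache.get(host)
        if entry is not None and now < entry[1]:
            return entry[0]                    # cache hit: no network round trip
        ip = self._resolve(host)               # cache miss: pay the lookup cost
        self._cache[host] = (ip, now + self._ttl)
        return ip
```

The same trade-off applies here as on the nameserver side: a longer TTL means fewer lookups, but the process keeps using a stale address for longer after a failover.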

DNS round-robin is the cheapest, dumbest load balancer there is: return multiple A records, clients pick one semi-randomly. It doesn't account for health, weight, or server load, but it's zero-infrastructure and good enough for internal services.

Anycast DNS takes this further — multiple machines around the world all announce the same IP prefix, and BGP routes each resolver to the topologically closest copy. This is how Cloudflare's 1.1.1.1 achieves single-digit millisecond response times in well-connected regions and averages around 11ms globally. GeoDNS is a softer version: the nameserver returns different A records based on where the resolver is located, steering users to the nearest region.

DNS over HTTPS / DNS over TLS

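Plain DNS is unencrypted, so your ISP can see (and modify) your queries. DNS over HTTPS (DoH) and DNS over TLS (DoT) encrypt the query between the client and the resolver. DoH is built into the major browsers, DoT is supported at the OS level on platforms such as Android, and most public resolvers offer both.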

TLS: the cost of HTTPS

TLS encrypts and authenticates the channel. The handshake establishes a shared secret using public-key crypto, then everything afterwards uses faster symmetric crypto.

sequenceDiagram
    participant C as Client
    participant S as Server
    Note over C,S: TCP handshake (1 RTT)
    C->>S: ClientHello (cipher suites, SNI, key share)
    S->>C: ServerHello + cert + key share + Finished
    C->>S: Finished + GET / (same flight in TLS 1.3)
    Note over C,S: TLS 1.3: 1 RTT, total 2 RTT before app data
    S->>C: 200 OK

TLS versions, in 30 seconds

TLS 1.0 and 1.1 are both deprecated and broken — reject anything that negotiates them. TLS 1.2 is still common; a fresh handshake costs 2 RTTs. TLS 1.3 cuts that to 1 RTT, and if the client has connected before, session resumption lets it send application data immediately (0-RTT) with a few caveats around replay safety.
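
Rejecting the old versions is usually a one-line setting. As a sketch with Python's standard ssl module (the helper name is made up):

```python
import ssl

def modern_client_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses TLS 1.0 and 1.1."""
    ctx = ssl.create_default_context()            # verifies certificates and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse to negotiate anything older
    return ctx

ctx = modern_client_context()
print(ctx.minimum_version)
```

The context still negotiates TLS 1.3 whenever both sides support it; the setting only puts a floor under the negotiation.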

Server Name Indication (SNI)

Multiple HTTPS sites can share one IP because the client announces which hostname it wants in the SNI field of the ClientHello — that tells the server which certificate to present. The catch is that SNI is unencrypted in most TLS 1.2 and 1.3 deployments, so middleboxes can see which site you're connecting to even if they can't read the content. ECH (Encrypted Client Hello) fixes this and is rolling out gradually.

Certificates

A TLS certificate binds a public key to a hostname, signed by a Certificate Authority (CA) that's trusted because its own certificate is in your OS or browser's root store. Domain Validation (DV) certs — the kind Let's Encrypt issues — prove you control the domain and are fully automatic. Organization Validation and Extended Validation certs add identity checks but are increasingly meaningless to end users. Wildcard certs (*.example.com) cover all subdomains; SAN certs cover an explicit list of names.

Don't roll your own CA unless you have a good reason. Use Let's Encrypt for public services, and a managed PKI (AWS PCA, HashiCorp Vault, cert-manager) for internal mTLS.

TLS termination

Where you decrypt matters:

flowchart LR
    A[User] -->|HTTPS| EDGE[Edge / CDN]
    EDGE -->|HTTP or<br/>internal mTLS| ORIGIN[Origin]
    style EDGE fill:#ff6b1a,color:#0a0a0f
    style ORIGIN fill:#0e7490,color:#fff

Terminating TLS at the edge means the user does the expensive handshake with a server 5ms away rather than an origin 150ms away. The edge can also cache responses — impossible with end-to-end TLS where the cache can't read the payload. Cert renewal is centralized rather than spread across every origin box.

Inside your private network, mTLS (mutual TLS, both sides present certificates) is increasingly standard. It gives you encrypted, authenticated service-to-service calls without bolting custom auth logic onto each protocol.

HTTP: the web's request/response protocol

Every HTTP request has the same shape:

GET /users/42 HTTP/1.1
Host: api.example.com
Authorization: Bearer eyJhbG...
Accept: application/json
Accept-Encoding: gzip, br
User-Agent: Mozilla/5.0 (...)

And every response:

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: public, max-age=60
Content-Length: 42
ETag: "a8c7..."

{"id": 42, "name": "ada", "role": "admin"}
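
On the wire, these messages are plain text: a start line, headers separated by CRLF, a blank line, then the body. A minimal sketch of serializing and parsing a response by hand follows. The helper names are hypothetical, and the parser assumes a complete message with a Content-Length header; real code should use an HTTP library.

```python
from typing import Dict, Tuple

def build_response(status: int, reason: str, headers: Dict[str, str], body: bytes) -> bytes:
    """Serialize an HTTP/1.1 response; Content-Length is derived from the body."""
    lines = [f"HTTP/1.1 {status} {reason}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # a blank line ends the headers
    return head.encode("ascii") + body

def parse_response(raw: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Split a complete response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return status, headers, body[: int(headers["content-length"])]

raw = build_response(200, "OK", {"Content-Type": "application/json"}, b'{"id": 42}')
status, headers, body = parse_response(raw)
print(status, headers, body)
```

Deriving Content-Length from the body instead of typing it by hand avoids a classic bug: a wrong length makes the client either truncate the body or hang waiting for bytes that never arrive.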

HTTP methods

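| Method | Idempotent? | Safe? | Use for |
|---|---|---|---|
| GET | yes | yes | reads |
| HEAD | yes | yes | metadata only |
| PUT | yes | no | replace |
| DELETE | yes | no | delete |
| PATCH | no | no | partial update |
| POST | no | no | everything else |
| OPTIONS | yes | yes | CORS preflight |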

Idempotent means "doing it twice = doing it once." This matters because retries are everywhere in distributed systems — load balancers retry, clients retry, queues retry. You want your important operations to be idempotent.

Safe means "no side effects." GET and HEAD should never change server state. Caches and proxies rely on this guarantee to decide when they can safely answer without forwarding the request.
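
The difference shows up as soon as a request is retried. A toy in-memory sketch, with hypothetical handler names and a dict standing in for the database:

```python
import itertools
from typing import Dict

users: Dict[int, dict] = {}
_next_id = itertools.count(1)

def put_user(user_id: int, data: dict) -> None:
    """PUT-style: replace the resource at a client-chosen ID. Idempotent."""
    users[user_id] = data

def post_user(data: dict) -> int:
    """POST-style: create a resource under a server-chosen ID. Not idempotent."""
    user_id = next(_next_id)
    users[user_id] = data
    return user_id

# A retried PUT leaves the store unchanged...
put_user(42, {"name": "ada"})
put_user(42, {"name": "ada"})
# ...while a retried POST creates a duplicate.
first = post_user({"name": "bob"})
second = post_user({"name": "bob"})
print(len(users), first != second)
```

After the two PUTs there is one record for user 42; after the two POSTs there are two separate records for "bob". That duplicate is exactly what a blind retry of a non-idempotent call produces.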

Status codes (the ones you'll actually use)

2xx  Success
  200  OK
  201  Created             ← include Location header
  202  Accepted             ← async work queued
  204  No Content
3xx  Redirect
  301  Moved Permanently
  302  Found (temporary)
  304  Not Modified         ← cache hit
4xx  Client error
  400  Bad Request
  401  Unauthorized         ← (really "unauthenticated")
  403  Forbidden
  404  Not Found
  409  Conflict
  410  Gone                 ← like 404 but permanent
  422  Unprocessable Entity ← validation failed
  429  Too Many Requests    ← rate limited
5xx  Server error
  500  Internal Server Error
  502  Bad Gateway          ← upstream is broken
  503  Service Unavailable
  504  Gateway Timeout      ← upstream too slow

Important HTTP headers

A category guide:

Caching

Cache-Control: public, max-age=300, s-maxage=3600
ETag: "a8c7..."
Last-Modified: Tue, 21 Oct 2025 07:28:00 GMT
Vary: Accept-Encoding, Authorization

Vary is critical for shared caches — it tells the cache which request headers affect the response. Forget Vary: Accept-Encoding and your CDN serves a gzip-encoded body to a non-gzip client.

Conditional

If-None-Match: "a8c7..."     → server may return 304
If-Modified-Since: ...        → server may return 304
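
Server-side, the ETag check is a small comparison. The sketch below derives the tag from a hash of the body and handles only a single exact-match tag; real servers also accept lists of tags, weak tags, and `*`. The function names are hypothetical.

```python
import hashlib
from typing import Dict, Optional, Tuple

def make_etag(body: bytes) -> str:
    """Derive an ETag from the body; the double quotes are part of the value."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client copy is current."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""  # nothing changed: send no body
    return 200, {"ETag": etag}, body

status, headers, _ = respond(b"hello", None)                      # first visit
revalidated, _, payload = respond(b"hello", headers["ETag"])      # unchanged content
print(status, revalidated, payload)
```

The 304 path still costs a round trip, but it skips the response body, which is where most of the bytes are.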

Compression

Accept-Encoding: gzip, br, zstd
Content-Encoding: br

Brotli (br) produces output roughly 20% smaller than gzip for typical web content. Zstandard (zstd) decompresses even faster, and support for it is gradually rolling out.
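
The negotiation itself is simple: the server picks the best coding it supports from the client's Accept-Encoding list. A simplified sketch follows. It ignores the `*` wildcard and treats q-values only as "zero means refuse"; the function name is hypothetical. The compression step uses gzip because Brotli is not in Python's standard library.

```python
import gzip
from typing import List, Optional

def pick_encoding(accept_encoding: str, server_preference: List[str]) -> Optional[str]:
    """Pick the first server-preferred coding that the client also accepts."""
    accepted = set()
    for item in accept_encoding.split(","):
        token, _, params = item.partition(";")
        params = params.strip()
        q = float(params[2:]) if params.startswith("q=") else 1.0
        if token.strip() and q > 0:  # q=0 means "do not send this coding"
            accepted.add(token.strip().lower())
    for coding in server_preference:
        if coding in accepted:
            return coding
    return None  # fall back to an uncompressed response

choice = pick_encoding("gzip, br, zstd", ["br", "gzip"])
data = b"hello world " * 1000
compressed = gzip.compress(data)
print(choice, len(data), len(compressed))
```

Whatever coding is chosen goes into Content-Encoding on the response, and the choice depends on a request header, which is why Vary: Accept-Encoding matters for shared caches.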

Connection management

Connection: keep-alive
Keep-Alive: timeout=5, max=100

Security

Strict-Transport-Security: max-age=31536000; includeSubDomains
Content-Security-Policy: default-src 'self'
X-Content-Type-Options: nosniff
X-Frame-Options: DENY

HTTP/1.1 vs HTTP/2 vs HTTP/3

| | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Year | 1997 | 2015 | 2022 |
| Transport | TCP | TCP | QUIC (UDP) |
| Multiplexing | No (1 req per conn) | Yes (streams) | Yes |
| Header compression | None | HPACK | QPACK |
| Server push | No | Yes (deprecated) | Yes (defined in RFC 9114 §4.6, disabled in browsers) |
| Connection setup | TCP + TLS = 2-3 RTT | TCP + TLS = 2-3 RTT | 0–1 RTT |
| Head-of-line blocking | Yes | Yes (TCP layer) | No |
| Connection migration | No | No | Yes |

HTTP/1.1: the limits

In HTTP/1.1, requests are serialized over a connection. Pipelining (sending multiple requests before getting responses) was in the spec but never worked in practice: responses must come back in the same order as the requests, so one slow response blocks everything queued behind it. Browsers compensated by opening 6 parallel connections per origin, paying for a separate handshake on each.

HTTP/2: multiplexing

HTTP/2 lets many requests interleave on one connection. Independent streams carry request/response pairs; a slow response doesn't block fast ones — at the application layer.

The catch is that TCP head-of-line blocking survives at the transport layer. If one packet is lost, every stream stalls until the OS retransmits it, regardless of which stream actually needed that data.

HTTP/3: QUIC

QUIC moves multiplexing into the transport itself, so each stream has independent loss recovery. A dropped packet only stalls the stream that owns it. QUIC also folds TLS 1.3 directly into the connection handshake, cutting the setup from 2–3 RTTs (TCP + TLS) to 1 RTT, or 0-RTT on resumption.
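
The difference between the two transports can be captured in a toy model. This is a deliberate simplification with hypothetical names: it tracks only which stream each packet belongs to and ignores timing, windows, and retransmission.

```python
from typing import List, Set

def stalled_streams(packet_streams: List[int], lost_index: int, transport: str) -> Set[int]:
    """Return the stream IDs that must wait for a retransmission.

    packet_streams[i] is the stream that packet i carries data for.
    """
    if transport == "tcp":
        # TCP delivers one ordered byte stream, so everything sent after the
        # lost packet waits for the retransmission, whichever stream it is for.
        return set(packet_streams[lost_index:])
    if transport == "quic":
        # QUIC orders data per stream, so only the owning stream waits.
        return {packet_streams[lost_index]}
    raise ValueError(f"unknown transport: {transport}")

packets = [1, 2, 3, 1, 2, 3]  # three streams interleaved on one connection
print(stalled_streams(packets, 1, "tcp"))
print(stalled_streams(packets, 1, "quic"))
```

Losing the second packet stalls all three streams over TCP, but only stream 2 over QUIC.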

flowchart TD
    A[HTTP/1.1] -->|"1 req/conn,<br/>6 conns/origin"| AOK[OK for 2010 web]
    B[HTTP/2] -->|"multiplex on TCP"| BOK[Better, but TCP HOL blocking]
    C[HTTP/3] -->|"multiplex on QUIC"| COK[No HOL, faster setup,<br/>connection migration]
    style A fill:#ff2e88,color:#fff
    style C fill:#15803d,color:#fff

The most underappreciated HTTP/3 feature is connection migration. A QUIC connection ID is independent of IP and port, so when your phone switches from Wi-Fi to LTE the connection survives without a teardown. HTTP/2 over TCP would tear down and re-establish — typically another 2–3 RTTs of latency.

When to bother

For internet-facing traffic, HTTP/2 is now the default and HTTP/3 is the next default. Your CDN (CloudFront, Cloudflare, Fastly) likely already serves both transparently. For internal services, gRPC is HTTP/2-native, so you get stream multiplexing for free.

Routing the public internet (BGP, anycast, AS)

The internet is a network of Autonomous Systems (AS) — each an organization (ISP, cloud, big company) with its own IP ranges. AS-to-AS routing is governed by BGP (Border Gateway Protocol). Every AS announces "I have routes to these IP ranges" and BGP picks the best path.

flowchart LR
    A[Your laptop<br/>AS 7922 Comcast] --> B[Tier 1 carrier<br/>AS 174 Cogent]
    B --> C[Cloud edge<br/>AS 16509 AWS]
    C --> D[Origin server]
    style A fill:#ff6b1a,color:#0a0a0f
    style D fill:#15803d,color:#fff

Anycast is the trick where multiple machines around the world all announce the same IP prefix. BGP routes each client to the topologically closest one. This is how Cloudflare's 1.1.1.1, AWS Route 53, and Google's 8.8.8.8 keep latency in the low single-digit milliseconds for users near a point of presence, and in the ~10–15ms range globally.

BGP outages are real — a misconfiguration at one AS can blackhole large chunks of the internet (Facebook 2021, Cloudflare 2020). When you read postmortems about those events, BGP is the culprit.

CORS (the bane of every frontend dev)

Cross-Origin Resource Sharing controls whether JavaScript on app.example.com can call api.example.com. By default, browsers stop scripts from reading the responses to cross-origin requests. The server opts in:

Access-Control-Allow-Origin: https://app.example.com
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization
Access-Control-Max-Age: 86400

For non-simple requests (methods other than GET, HEAD, or POST, custom headers such as Authorization, or a Content-Type such as application/json), the browser does a preflight OPTIONS request first. If that returns the right headers, the real request goes through.

One thing worth being clear about: CORS is enforced only by the browser. Tools like curl ignore it entirely. This means CORS is not a security boundary — your API must still authenticate every request. CORS is purely about which browser-side JavaScript is allowed to read the response.
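
On the server, the opt-in usually comes down to an allowlist check on the request's Origin header. A framework-agnostic sketch, with a hypothetical function name and allowlist:

```python
from typing import Dict, FrozenSet

ALLOWED_ORIGINS: FrozenSet[str] = frozenset({"https://app.example.com"})

def cors_headers(request_origin: str) -> Dict[str, str]:
    """Return CORS response headers for an allowed origin, or none at all."""
    if request_origin not in ALLOWED_ORIGINS:
        return {}  # no CORS headers: the browser will not expose the response
    return {
        "Access-Control-Allow-Origin": request_origin,  # echo the exact origin, no wildcard
        "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type, Authorization",
        "Access-Control-Max-Age": "86400",
        "Vary": "Origin",  # the response differs per origin, so caches must key on it
    }

print(cors_headers("https://app.example.com"))
print(cors_headers("https://evil.example"))
```

Returning no headers for an unknown origin does not stop the request from reaching the server; it only stops the browser from handing the response to the calling script. Authentication still has to happen on every request.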

Connection pooling

Opening a new TCP+TLS connection costs 2–3 RTTs, which can add up to hundreds of milliseconds on a long-distance path. Reusing connections is essentially free. Browsers open 6+ connections per origin for HTTP/1.1, or a single multiplexed connection for HTTP/2. Server-side HTTP clients should maintain a pool with a configured max-connections-per-host, and database clients should go through a pooler like PgBouncer or RDS Proxy in front of Postgres.

The classic production mistake: a client with no max-conn limit opens 10,000 sockets to a struggling database. The database falls over, which makes the clients retry, which opens more sockets. The right config is a bounded pool that fails fast when full — returning an error immediately is far better than queuing work that will never complete.
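
The "bounded and fail fast" part can be sketched with a semaphore. This example models only the bound, not the reuse of idle connections that a real pool also provides; the class and exception names are hypothetical, and the yielded object is a placeholder for a real connection.

```python
import threading
from contextlib import contextmanager

class PoolExhausted(Exception):
    """Raised when every connection slot is already in use."""

class BoundedPool:
    def __init__(self, max_connections: int) -> None:
        self._slots = threading.BoundedSemaphore(max_connections)

    @contextmanager
    def connection(self):
        # Non-blocking acquire: fail fast instead of queuing behind a full pool.
        if not self._slots.acquire(blocking=False):
            raise PoolExhausted("pool is full")
        try:
            yield object()  # placeholder for a real socket or DB connection
        finally:
            self._slots.release()  # always give the slot back, even on error

pool = BoundedPool(max_connections=2)
with pool.connection() as conn:
    print("got a connection:", conn is not None)
```

A caller that catches PoolExhausted can shed load or return a 503 straight away, which keeps a struggling backend from being buried under retries.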

A network design checklist

Before you finalize any architecture, run through these items in roughly the order listed — the top items catch the highest-leverage problems first, and each one that's missing typically shows up as a latency regression, a production incident, or a security finding:

  • Are all internal services on private IPs?
  • Is TLS terminated at the edge with a managed cert?
  • Is HTTP/2 (or H3) enabled at the edge?
  • Are connection pools sized and bounded?
  • Is keep-alive on for all internal HTTP clients?
  • Is DNS TTL low enough for failover but high enough to amortize?
  • Are you using anycast / GeoDNS for global low-latency entry?
  • Does CORS allow only the origins you intend, no wildcards?
  • Are deprecated TLS versions disabled?
  • Is Brotli/gzip configured? Is Vary: Accept-Encoding set?

The journey of a single web request

Click "post tweet" on twitter.com. What happens?

flowchart TD
    A[Browser] -->|1. DNS lookup| B[1.1.1.1]
    B -->|2. IP returned| A
    A -->|3. TCP handshake| C[Twitter Edge / CDN]
    A -->|4. TLS handshake| C
    A -->|5. HTTP/2 POST /tweet| C
    C -->|6. Forward to API LB| D[Load Balancer]
    D -->|7. Pick a healthy host| E[API Server]
    E -->|8. AuthN/AuthZ| F[(Identity Cache)]
    E -->|9. Write tweet| G[(Tweets DB)]
    E -->|10. Enqueue fanout| H[Kafka]
    H -->|11. Async| I[Fanout Workers]
    I -->|12. Push to followers| J[(Timeline Cache)]
    E -->|13. 201 Created| A
    style A fill:#ff6b1a,color:#0a0a0f
    style E fill:#0e7490,color:#fff
    style J fill:#15803d,color:#fff

Steps 1–5 are pure networking — DNS resolution, TCP setup, TLS negotiation. They happen before your application server has seen a single byte of the request, and they can easily cost 50–200ms. Optimizing them — keep-alive, HTTP/2, TLS 1.3, edge termination, anycast — is the easiest performance win available in production systems.

Things you should now be able to answer

  • What is the difference between TCP and UDP? When would you use UDP?
  • Why does HTTP/2 multiplex requests over a single connection, and what does HTTP/3 fix on top of that?
  • Why is TLS termination at the edge important?
  • What is the minimum latency for a request from California to a data center in Frankfurt, and why?
  • Which HTTP methods are idempotent and why does that matter for retries?
  • Two services on the same private network — should they use TLS? What does mTLS buy you?
  • Your DNS TTL is 86400 (1 day). What happens during a regional failover?

→ Next: APIs and Communication Protocols

Frequently asked questions

What is the difference between TCP and UDP, and when should you use UDP?

TCP provides reliable, ordered delivery via a three-way handshake and retransmission, at the cost of 20-60 byte headers and head-of-line blocking. UDP has 8-byte headers, no connection setup, and no retransmission, making it the right choice for video streaming and gaming, where a dropped packet is better ignored than retransmitted: a stall visible as a freeze is worse than a tiny glitch. QUIC also runs over UDP, adding its own per-stream loss recovery on top.

How does HTTP/3 fix the head-of-line blocking problem that HTTP/2 left unsolved?

HTTP/2 multiplexes streams over a single TCP connection, but a single dropped TCP packet stalls every stream until the OS retransmits it. HTTP/3 runs over QUIC (UDP), where each stream has independent loss recovery, so a dropped packet only stalls the stream that owns it. QUIC also folds TLS 1.3 into the handshake, cutting connection setup from 2-3 RTTs to 1 RTT, or 0-RTT on resumption.

Why does TLS termination at the edge matter for performance and caching?

Terminating TLS at the edge means the user completes the expensive handshake with a server a few milliseconds away rather than an origin 150ms away. It also allows the CDN to cache responses, which is impossible with end-to-end TLS because a cache cannot read an encrypted payload. Cert renewal is also centralized rather than distributed across every origin box.

What trade-off does DNS TTL create between failover speed and nameserver load?

A short TTL like 60 seconds lets you reroute traffic away from a failing region in about a minute, but every resolver serving your users has to re-query your nameservers every minute. A long TTL like 86400 seconds (one day) dramatically reduces lookup volume but means clients are stuck pointing at the old address for up to a day during a failover. The TTL is a dial you tune between recovery speed and nameserver load.

Which HTTP methods are idempotent, and why does that matter in distributed systems?

GET, HEAD, PUT, DELETE, and OPTIONS are idempotent, meaning executing them twice produces the same outcome as executing them once. PATCH and POST are not. This matters because retries are ubiquitous in distributed systems — load balancers, clients, and queues all retry — and blindly retrying a non-idempotent operation like POST can create duplicate side effects.