Advanced · asked at Cloudflare, Akamai, Amazon, Fastly

Design a Content Delivery Network (CDN)

Serve content from the edge, close to users, at massive scale. Request routing (anycast & DNS), cache hierarchies, invalidation, and origin shielding.

21 min read · 2026-04-05 · Ironclad Academy

#interview #caching #networking #edge

The problem

Cloudflare, Akamai, and AWS CloudFront collectively serve a significant fraction of all internet traffic. They are not doing anything the origin server couldn't do; they do it from tens of kilometres away instead of 10,000 kilometres away. A CDN (Content Delivery Network) is a globally distributed network of caching servers, called Points of Presence (PoPs), positioned in cities around the world. When a user in Tokyo requests your company's homepage, the CDN serves the response from a Tokyo PoP, not from a data center in Virginia. That geographic shortcut cuts pure propagation delay from 150–200 ms down to 5–20 ms, before the first byte has even been read from disk.

The mechanics are straightforward in concept: cache popular content close to users, serve from cache on a hit, fetch from the origin only on a miss. What makes CDN design one of the more interesting interview problems is the surface area of hard decisions hiding underneath that simple idea. How do you route a user to the right PoP out of 250 globally? What happens when a popular object's TTL expires and 100,000 concurrent requests all miss at once — a stampede? How do you invalidate stale content across thousands of servers within seconds when a bug is discovered in a JavaScript bundle? And how do you prevent a traffic spike from forwarding millions of cache misses directly to an origin server that wasn't built to handle them?

The core engineering tension is between cache freshness and cache utility. You want long TTLs to maximize hit rates, but long TTLs mean stale content persists. You want to purge aggressively on content changes, but at 250 PoPs × 20 servers per PoP, a global purge touches 5,000 nodes. Every design decision — cache hierarchy, routing strategy, invalidation protocol, consistent hashing within a PoP — is really a different angle on that same trade-off. Getting this right is what separates a CDN that holds a 95% hit rate from one that quietly hammers the origin at scale.

Designing a CDN is one of the most layered problems in the interview catalog. It touches networking (BGP, anycast, DNS), distributed systems (cache hierarchies, consistent hashing, invalidation), and reliability engineering (PoP failure, stampedes, hot objects) all in one question. The best answers are built from first principles — starting with why a cache close to the user helps, then layering each mechanism to solve the next failure mode.

Functional requirements

Cache and serve static content (images, video segments, JS/CSS, binary downloads) from edge locations near users.
GET /assets/logo.png served with p99 < 20 ms from cache; fetched from origin on a miss.
Cache invalidation / purge: updated content must propagate within seconds.
Origin protection: shield origin from the raw traffic volume of global users.
(Optional) Edge compute: run lightweight logic (auth, A/B headers, redirects) at the edge without hitting origin.
(Optional) DDoS absorption: edge nodes absorb volumetric attacks before traffic reaches origin.

Non-functional requirements

Ultra-low read latency — cache hits < 20 ms p99; acceptable miss latency < 100 ms.
Very high availability — 99.99%+; a CDN outage is a catastrophic event.
Massive egress throughput — serve hundreds of Tbps without per-packet state.
Freshness guarantees — stale content caused by a failed purge can mean serving wrong, broken, or security-vulnerable assets.
Cost efficiency — egress bandwidth is expensive; every percentage point of hit ratio saved is real money.

Capacity estimation

Dimension	Estimate	How we got there
Peak egress bandwidth	100 Tbps	Baseline assumption for a hyperscale CDN
Average object size	200 KB	Mix of small API responses, images, video chunks
Global requests/sec	~62.5 M req/s	`100 Tbps = 12.5 TB/s`; `12.5 TB/s ÷ 200 KB = 62.5M req/s`
Avg requests/sec per PoP	~250,000 req/s	`62.5M ÷ 250 PoPs`
Peak req/s for large PoPs	~1M req/s	Skew factor for major hubs (US-East, Frankfurt)
Objects per edge server	~160 M	`32 TB SSD ÷ 200 KB/object = 160M objects`
Objects per PoP (20 servers)	~3.2 B	With consistent-hashed affinity, each server holds its own shard (160 M objects each, no cross-server replication) — PoP-level unique objects = 20 × 160 M = 3.2 B
Requests hitting origin shield	~3.1 M req/s	`62.5M × 0.05` (95% edge hit ratio)
Requests hitting actual origin	~620,000 req/s	`3.1M × 0.20` (shield hit ratio ~80%, so 20% miss through)
Popular objects TTL	hours–days	Top 5% of objects; cached at edge
Long-tail objects TTL	minutes–hours	Cached at regional parent
Dynamic content TTL	1–5 s or uncached	Not suitable for long-lived caching

Takeaway: A 95% edge hit ratio compresses 62.5M req/s to 3.1M at the shield boundary and ~620K at the actual origin. A 1-percentage-point improvement in edge hit ratio saves ~625K req/s at the shield boundary and ~125K req/s at the actual origin.
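The arithmetic in the table can be reproduced with a few lines of Python. This is only a back-of-the-envelope model: every constant is one of the article's stated assumptions, and the function name `origin_load` is made up for illustration.

```python
# Back-of-the-envelope capacity model; all inputs are the article's assumptions.
PEAK_EGRESS_BITS_PER_S = 100e12   # 100 Tbps
AVG_OBJECT_BYTES = 200e3          # 200 KB
POPS = 250
EDGE_HIT_RATIO = 0.95
SHIELD_HIT_RATIO = 0.80


def origin_load(edge_hit: float, shield_hit: float) -> dict[str, float]:
    """Return requests/sec at each boundary for the given hit ratios."""
    global_rps = PEAK_EGRESS_BITS_PER_S / 8 / AVG_OBJECT_BYTES
    shield_rps = global_rps * (1 - edge_hit)      # edge misses reach the shield
    origin_rps = shield_rps * (1 - shield_hit)    # shield misses reach origin
    return {
        "global": global_rps,
        "per_pop": global_rps / POPS,
        "shield": shield_rps,
        "origin": origin_rps,
    }


base = origin_load(EDGE_HIT_RATIO, SHIELD_HIT_RATIO)
better = origin_load(EDGE_HIT_RATIO + 0.01, SHIELD_HIT_RATIO)

for name, rps in base.items():
    print(f"{name:>8}: {rps:,.0f} req/s")
print(f"1 point of edge hit ratio saves {base['shield'] - better['shield']:,.0f} req/s at the shield")
```

Changing `EDGE_HIT_RATIO` or `SHIELD_HIT_RATIO` shows how sensitive origin load is to each tier, which is a useful thing to reason about aloud in an interview.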

Building up to the design

V1: A single reverse proxy cache

Start with one Nginx or Varnish instance in one data center, cache on SSD. Users close to it see responses in under 5 ms and the origin gets some relief. The problem is obvious the moment you zoom out: a user in Singapore is still 200 ms from a US-East origin even with this cache, because the cache is in the same building as the origin. Geographic proximity is the whole point, and one data center doesn't give you that.

V2: Multiple PoPs — the routing problem

Spin up caching servers in 10 cities and you've solved the distance problem — for users close to those cities. But you've created a new question: when a request comes in, which city do you send it to?

Two mechanisms answer that, and you need both for a senior interview. DNS-based routing has the CDN's authoritative DNS server inspect the source IP of the query and return the A record of the nearest PoP. It's flexible (you can route by latency measurement, country, or ISP), but DNS TTLs mean stale routing can persist for minutes, and the IP the server sees is the resolver's, which often isn't close to the client's (EDNS Client Subnet partially addresses this).

Anycast routing takes a different approach: every PoP announces the same IP prefix via BGP. The internet's routing tables automatically steer each packet to the topologically nearest PoP. There's no DNS lookup overhead and PoP failures heal automatically when BGP withdraws the prefix. The limitation is that BGP proximity isn't the same as latency — the nearest AS-hop PoP isn't always the fastest one. Production CDNs combine both: anycast for the network-layer routing, DNS for fine-grained tuning like canary traffic splits and latency-aware health checks.

V3: Cache hierarchy — the hit ratio problem

One layer of caches can't hold everything. A PoP in São Paulo can't store every object in the CDN's catalog, and it shouldn't need to. Adding two more cache layers — a regional parent per continent, and an origin shield close to the actual origin server — multiplies the effective hit ratio at each boundary.

Without a shield, ten regional parents each independently miss to the origin on the same cold object, creating 10× the origin load. With a shield, all ten parents converge on one node, which fires a single upstream request. The origin sees one fetch, not ten. This is the same insight as request coalescing, applied at the infrastructure level.

V4: Request coalescing — the stampede problem

Even within a single PoP, a popular object's TTL expiring causes a miniature stampede: thousands of concurrent requests all miss simultaneously and all try to fetch from upstream at once. Request coalescing (also called request collapsing) solves this by having the edge hold all subsequent requests for an object being fetched, then serving them all from the single response when it arrives. One upstream request per object per TTL window, regardless of concurrent traffic volume. On a viral piece of content, this is the difference between one origin call and ten thousand.
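A minimal single-flight sketch of request coalescing, using `asyncio`. The class name `CoalescingCache` and the simulated `slow_origin` are illustrative assumptions, not any CDN's real API, and TTL handling is left out to keep the coalescing logic visible.

```python
import asyncio
from typing import Awaitable, Callable


class CoalescingCache:
    """Single-flight cache: at most one upstream fetch per key at a time."""

    def __init__(self, fetch_upstream: Callable[[str], Awaitable[bytes]]):
        self._fetch_upstream = fetch_upstream
        self._cache: dict[str, bytes] = {}
        self._in_flight: dict[str, asyncio.Task] = {}

    async def get(self, key: str) -> bytes:
        if key in self._cache:                  # hit: serve immediately
            return self._cache[key]
        task = self._in_flight.get(key)
        if task is None:                        # first miss becomes the leader
            task = asyncio.ensure_future(self._fill(key))
            self._in_flight[key] = task
        return await task                       # followers wait on the same fetch

    async def _fill(self, key: str) -> bytes:
        try:
            body = await self._fetch_upstream(key)
            self._cache[key] = body
            return body
        finally:
            self._in_flight.pop(key, None)      # allow a later retry or refresh


async def demo() -> int:
    upstream_calls = 0

    async def slow_origin(key: str) -> bytes:   # hypothetical upstream
        nonlocal upstream_calls
        upstream_calls += 1
        await asyncio.sleep(0.05)               # simulated upstream latency
        return f"body of {key}".encode()

    cache = CoalescingCache(slow_origin)
    bodies = await asyncio.gather(*(cache.get("/image.jpg") for _ in range(1000)))
    assert all(b == b"body of /image.jpg" for b in bodies)
    return upstream_calls


if __name__ == "__main__":
    print("upstream fetches:", asyncio.run(demo()))
```

All 1,000 concurrent callers await the same task, so the simulated upstream is called once. A production edge would also need a timeout on the leader's fetch so that one slow upstream response does not stall every queued request.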

V5: Production CDN

Once you have routing, hierarchy, and coalescing, you layer on TLS termination at the edge, consistent hashing within a PoP, invalidation infrastructure, and edge compute. That's the production design.

flowchart LR
    V1["V1: Single proxy<br/>local users only"] --> V2["V2: Multi-PoP<br/>DNS + anycast routing"]
    V2 --> V3["V3: Cache hierarchy<br/>edge → regional → shield"]
    V3 --> V4["V4: Request coalescing<br/>stampede prevention"]
    V4 --> V5["V5: Production CDN<br/>TLS, edge compute, invalidation"]
    style V1 fill:#0e7490,color:#fff
    style V3 fill:#15803d,color:#fff
    style V4 fill:#ff6b1a,color:#0a0a0f
    style V5 fill:#a855f7,color:#fff

High-level architecture

flowchart TD
    U[User] -->|"anycast or DNS"| EDGE["Edge cache server<br/>PoP: city-level"]
    EDGE -->|"TLS terminate"| EDGEHIT{"Cache hit?"}
    EDGEHIT -->|"yes → serve"| U
    EDGEHIT -->|"no → coalesce<br/>and fetch"| REG["Regional parent<br/>cache"]
    REG -->|"hit"| EDGE
    REG -->|"miss"| SHIELD["Origin shield PoP<br/>single upstream point"]
    SHIELD -->|"hit"| REG
    SHIELD -->|"miss"| ORI["Origin server"]
    ORI --> SHIELD

    INV["Purge / invalidation<br/>API"] -->|"flush by key or tag"| EDGE
    INV --> REG
    INV --> SHIELD

    DNS["Anycast + geo DNS<br/>routing control plane"] --> EDGE

    style EDGE fill:#ff6b1a,color:#0a0a0f
    style REG fill:#15803d,color:#fff
    style SHIELD fill:#0e7490,color:#fff
    style ORI fill:#a855f7,color:#fff
    style INV fill:#ffaa00,color:#0a0a0f
    style DNS fill:#ff2e88,color:#fff

Deep dive: request routing

DNS-based geo routing — mechanics

Customer configures assets.example.com CNAME assets.example.com.cdn.net.
User's DNS resolver queries assets.example.com.cdn.net.
CDN's authoritative DNS uses the resolver's IP address (or the client subnet from EDNS Client Subnet, RFC 7871) to look up which PoP is closest.
Returns an A record pointing to that PoP's IP.

The CDN maintains a continuously updated latency map per /24 subnet, built from active probing and passive measurement. This makes routing decisions that reflect real-world latency, not just geographic distance.
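The lookup step can be sketched as a dictionary keyed by /24 subnet. The latency values, PoP names, and `DEFAULT_POP` below are invented for illustration (the addresses come from the reserved documentation ranges); a real latency map would be far larger and continuously refreshed.

```python
import ipaddress
from typing import Optional, Set

# Hypothetical latency map: /24 subnet -> measured RTT in ms to each PoP.
LATENCY_MAP = {
    ipaddress.ip_network("203.0.113.0/24"): {"tokyo": 8.0, "osaka": 14.0, "singapore": 70.0},
    ipaddress.ip_network("198.51.100.0/24"): {"frankfurt": 11.0, "london": 19.0},
}
DEFAULT_POP = "frankfurt"   # assumed fallback when the subnet is unknown


def pick_pop(resolver_ip: str,
             ecs_address: Optional[str] = None,
             healthy: Optional[Set[str]] = None) -> str:
    """Choose the lowest-latency healthy PoP for a DNS query.

    Prefers the EDNS Client Subnet address when the resolver sent one,
    otherwise falls back to the resolver's own IP.
    """
    addr = ecs_address or resolver_ip
    subnet = ipaddress.ip_network(f"{addr}/24", strict=False)
    candidates = LATENCY_MAP.get(subnet, {})
    if healthy is not None:
        candidates = {pop: ms for pop, ms in candidates.items() if pop in healthy}
    if not candidates:
        return DEFAULT_POP
    return min(candidates, key=candidates.get)


print(pick_pop("203.0.113.57"))
print(pick_pop("203.0.113.57", healthy={"osaka", "singapore"}))
```

Passing a `healthy` set models the health-check side of DNS routing: an unhealthy PoP simply drops out of the candidate list for new answers, although clients holding a cached answer keep using it until the DNS TTL expires.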

Trade-offs: DNS routing vs. anycast

Property	DNS Routing	Anycast
Routing granularity	Fine — can split by subnet, country, ISP	Coarse — BGP AS-path level
Failure detection	Health checks + DNS TTL (60–120 s lag)	BGP withdrawal (seconds with BFD; minutes without)
Traffic steering	Easy — change DNS response	Hard — requires BGP prefix manipulation
Canary / gradual rollout	Easy with weighted DNS	Difficult
Cold start / first request	Extra DNS RTT	No extra RTT

Anycast — BGP mechanics

Every PoP's router announces the same IP prefix (e.g. 104.20.0.0/16) to its upstream transit providers. BGP evaluates multiple attributes in priority order (weight, local preference, AS-path length, MED, and others); in a CDN anycast deployment the higher-priority attributes are typically equal across all PoP announcements, so AS-hop count becomes the effective tiebreaker — the announcement with the fewest AS hops wins, selecting the topologically closest PoP.

When a PoP fails, its router stops announcing. Within seconds to a minute, BGP propagates the withdrawal and users are re-routed to the next-closest PoP automatically — no DNS TTL wait, no application-level failover needed.

The limitation is real: "nearest in BGP" diverges from "lowest latency" because BGP path selection does not account for actual link latency or congestion. This is why production networks combine anycast with active latency measurement and traffic engineering.
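A toy model of the selection rule described above, reduced to two attributes. It is a deliberate simplification of real BGP best-path selection; the `Announcement` type and the AS numbers (taken from the range reserved for documentation) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    pop: str
    as_path: tuple[int, ...]    # AS numbers between the ISP and the PoP
    local_pref: int = 100       # equal across PoPs in this sketch


def best_path(announcements: list[Announcement]) -> Announcement:
    """Highest local preference wins; AS-path length breaks ties."""
    return min(announcements, key=lambda a: (-a.local_pref, len(a.as_path)))


routes = [
    Announcement("tokyo", as_path=(64500, 64496)),
    Announcement("osaka", as_path=(64501, 64502, 64496)),
]
print(best_path(routes).pop)    # the shorter AS path is selected

# Tokyo fails and its announcement is withdrawn; the ISP re-runs selection.
routes = [r for r in routes if r.pop != "tokyo"]
print(best_path(routes).pop)
```

Nothing in `best_path` looks at latency or congestion, which is exactly the limitation: a two-hop path over a congested link still beats a three-hop path over an idle one.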

flowchart LR
    USER[User in Tokyo] -->|"packets to 104.20.0.0/16"| ISP[User's ISP]
    ISP -->|"BGP: shortest AS-path"| TOKYO[Tokyo PoP<br/>announces same prefix]
    ISP -.->|"if Tokyo fails,<br/>BGP withdraws"| OSAKA[Osaka PoP<br/>takes over automatically]
    style TOKYO fill:#ff6b1a,color:#0a0a0f
    style OSAKA fill:#15803d,color:#fff
    style ISP fill:#0e7490,color:#fff

Deep dive: cache mechanics

Cache keys

A cache key uniquely identifies an object. The default is the full URL, but the right key must account for content negotiation:

cache_key = normalize(url) + Vary-headers

Example:
  URL: /api/banner.json
  Vary: Accept-Encoding, Accept-Language
  → key: /api/banner.json | gzip | en-US

Poor key design causes two problems: under-keying (serving the wrong content variant to a user) and over-keying (fragmenting the cache into tiny per-variant buckets, killing hit ratio). The CDN must strip irrelevant headers — session cookies, tracking params — from the key.
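One way to sketch key construction in Python. The `TRACKING_PARAMS` list is an assumed example, and the host is omitted only to match the path-only example above; a multi-tenant CDN would include it in the key.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumed list of query parameters that never change the response body.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def cache_key(url: str, request_headers: dict[str, str], vary: list[str]) -> str:
    """Build a cache key from the normalized URL plus the Vary'd header values."""
    parts = urlsplit(url)
    # Drop tracking params and sort the rest so parameter order cannot
    # fragment the cache.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS
    )
    normalized = parts.path or "/"
    if query:
        normalized += "?" + urlencode(query)

    # Only headers named in Vary take part; cookies and everything else are ignored.
    headers = {name.lower(): value for name, value in request_headers.items()}
    variants = [headers.get(name.lower(), "").strip() for name in vary]
    return " | ".join([normalized, *variants])


key = cache_key(
    "https://example.com/api/banner.json?utm_source=newsletter",
    {"Accept-Encoding": "gzip", "Accept-Language": "en-US", "Cookie": "sid=abc"},
    vary=["Accept-Encoding", "Accept-Language"],
)
print(key)
```

Real edges usually go further and normalize header values too (for example collapsing every `Accept-Encoding` string into a handful of buckets), since raw header values are a common source of over-keying.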

TTL and Cache-Control

The CDN respects standard HTTP cache headers from the origin:

Cache-Control: public, max-age=86400, stale-while-revalidate=3600
ETag: "abc123"
Last-Modified: Mon, 30 Mar 2026 12:00:00 GMT

max-age=86400 tells the CDN to serve from cache for up to 24 hours without checking origin. stale-while-revalidate=3600 is particularly useful: for up to an hour after the TTL expires, the edge serves the stale object immediately while triggering a background revalidation, so the user never waits for a blocking upstream fetch on expiry. The revalidation is a conditional request carrying If-None-Match with the stored ETag; when the object hasn't changed, the origin responds with 304 Not Modified and the edge simply extends its TTL without re-transferring the body.

The CDN may also apply its own edge TTL that overrides the origin's Cache-Control, which can serve stale content if used carelessly.
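The freshness decision an edge makes from these headers can be sketched as follows. This is a simplified reading of the directives: it handles only `s-maxage`, `max-age`, and `stale-while-revalidate`, and the state names returned by `classify` are invented labels.

```python
from typing import Dict, Union


def parse_cache_control(value: str) -> Dict[str, Union[int, bool]]:
    """Parse a Cache-Control header into {directive: seconds or True}."""
    directives: Dict[str, Union[int, bool]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = int(arg) if arg.isdigit() else True
    return directives


def classify(age_seconds: int, cache_control: str) -> str:
    """Decide how a shared cache should treat an object of the given age."""
    d = parse_cache_control(cache_control)
    # For shared caches, s-maxage takes precedence over max-age when present.
    max_age = d.get("s-maxage", d.get("max-age", 0))
    swr = d.get("stale-while-revalidate", 0)
    if age_seconds < max_age:
        return "fresh"                        # serve from cache, no upstream traffic
    if age_seconds < max_age + swr:
        return "stale-serve-and-revalidate"   # serve stale, refresh in background
    return "expired-blocking-fetch"           # must revalidate before serving


header = "public, max-age=86400, stale-while-revalidate=3600"
for age in (100, 86_400 + 10, 86_400 + 3_600):
    print(age, classify(age, header))
```

An operator-configured edge TTL would slot in as an override of `max_age` before the comparison, which is where the stale-content risk mentioned above comes from.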

Cache miss sequence

sequenceDiagram
    participant U as User
    participant E as Edge cache
    participant R as Regional parent
    participant O as Origin shield
    participant ORI as Origin server

    U->>E: GET /image.jpg
    E-->>E: Miss (object not in cache)
    Note over E: Coalesce: lock key,<br/>queue subsequent requests

    E->>R: GET /image.jpg
    R-->>R: Miss
    R->>O: GET /image.jpg
    O-->>O: Miss
    O->>ORI: GET /image.jpg
    ORI-->>O: 200 OK + Cache-Control
    O-->>R: 200 OK (stored at shield)
    R-->>E: 200 OK (stored at regional)
    E-->>U: 200 OK (stored at edge)
    Note over E: All queued requests served<br/>from newly cached object

Push CDN vs. pull CDN

	Pull CDN	Push CDN
How content arrives at edge	On first user request (lazy)	Pre-loaded by operator before first request
Best for	General web assets, unpredictable access patterns	Known-large content: software downloads, game patches, live stream pre-seeding
Cold start penalty	First user pays full origin RTT	None — object is warm before launch
Storage efficiency	Only caches what's requested	Must pre-populate; wastes space if content isn't accessed
Invalidation	TTL + purge API	Operator deletes explicitly from all PoPs

Most CDNs support both models. Static web assets use pull; a video platform pre-seeding a new film release across all PoPs uses push.

Deep dive: consistent hashing within a PoP

A single PoP has multiple cache server nodes. When a request arrives at the PoP's load balancer, it must decide which cache server holds — or should hold — the object.

Naive round-robin makes this hopeless: every server gets a random subset of requests for every object, so no server ever accumulates a useful cache for any given URL. Every server becomes a miss machine.

Consistent hashing (see consistent hashing for the full algorithm) places both servers and objects on a hash ring. An object always routes to the same server. When a server is added or removed, only the objects that hashed to that server's range are displaced — everything else stays put.

flowchart LR
    REQ["Incoming request<br/>for /image.jpg"] --> HASH["hash(url) = 0x7f3a"]
    HASH --> RING["Hash ring<br/>walk clockwise"]
    RING --> SRV["Cache server C<br/>(owns this range)"]
    SRV -->|"cache hit"| RESP["Serve response"]
    SRV -->|"cache miss"| UP["Fetch from parent"]
    style RING fill:#ff6b1a,color:#0a0a0f
    style SRV fill:#15803d,color:#fff

Virtual nodes (vnodes) give each server multiple positions on the ring to ensure uniform distribution even with heterogeneous hardware.

The payoff: adding a server to a 20-node PoP remaps roughly 1/21 ≈ 5% of cached objects, far better than the roughly 95% of objects that change owner under naive hash(url) mod N placement.
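A small experiment makes the displacement figure concrete. The ring below is a minimal sketch with an assumed 200 virtual nodes per server and SHA-256 as the hash; server names and URLs are synthetic. It also measures `hash(url) mod N` placement as a naive baseline.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """64-bit hash that is stable across processes (unlike built-in hash())."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes: int = 200):
        self._vnodes = vnodes
        self._points = []                       # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._points, (stable_hash(f"{server}#{i}"), server))

    def lookup(self, url: str) -> str:
        # First ring point at or after hash(url), wrapping past the end.
        idx = bisect.bisect_left(self._points, (stable_hash(url), ""))
        return self._points[idx % len(self._points)][1]


def moved_fraction(n_keys: int = 20_000):
    urls = [f"/obj/{i}" for i in range(n_keys)]
    ring = HashRing([f"cache-{i:02d}" for i in range(20)])

    before = {u: ring.lookup(u) for u in urls}
    ring.add("cache-20")                        # grow the PoP from 20 to 21 servers
    moved = [u for u in urls if ring.lookup(u) != before[u]]
    ring_moved = len(moved) / n_keys
    only_to_new = all(ring.lookup(u) == "cache-20" for u in moved)

    # Baseline: hash(url) mod N placement, growing N from 20 to 21.
    mod_moved = sum(stable_hash(u) % 20 != stable_hash(u) % 21 for u in urls) / n_keys
    return ring_moved, only_to_new, mod_moved


if __name__ == "__main__":
    ring_moved, only_to_new, mod_moved = moved_fraction()
    print(f"ring: {ring_moved:.1%} moved, all to the new server: {only_to_new}")
    print(f"mod N: {mod_moved:.1%} moved")
```

The ring figure should land near 1/21, with some spread because vnode placement is pseudo-random, while the modulo baseline remaps nearly every object. Every object that moves on the ring moves to the new server, so the other 20 caches stay warm.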

Deep dive: cache invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

CDN cache invalidation is a distributed coordination problem at scale: you must flush an object from potentially thousands of cache servers within seconds.

Strategy 1: Versioned URLs (preferred)

Before:  https://assets.example.com/app.js
After:   https://assets.example.com/app.a4f7c2b.js

The file hash or build ID is embedded in the URL. Old and new content have different cache keys, so there is no invalidation step — the old URL simply stops being requested once the HTML that referenced it is updated. You get long TTLs (a year is common for immutable assets) for free.

The catch is that this only works for content you control the URL of. HTML pages, API responses, and anything else with a stable URL can't be versioned by filename, and that's exactly where the purge API comes in.

Strategy 2: Hard purge API

An operator calls the CDN's purge API to immediately remove a specific URL or set of URLs from all edge caches:

POST /v1/purge
{ "urls": ["https://example.com/index.html"] }

The CDN's control plane fans this out to all PoPs and waits for confirmation. Good implementations complete a global purge within 1–5 seconds.

The mechanics reveal two challenges worth naming. Fan-out cost: 250 PoPs × 20 servers = 5,000 RPC calls per purge. For "purge everything" during a security incident, this must be queued and rate-limited. Propagation lag: during the propagation window, some users get old content and some get new — usually acceptable, but not for correctness-critical assets.

Surrogate keys / cache tags are the solution when you need to purge by logical group. Tag objects at cache time (Surrogate-Key: product:42 category:shoes) and one purge call removes every variant of product 42's page without enumerating individual URLs.
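Tag-based purging amounts to keeping a reverse index from tag to URLs on each cache server. The `TaggedCache` class below is a hypothetical in-memory sketch of that index, not any vendor's implementation.

```python
from collections import defaultdict


class TaggedCache:
    """In-memory cache with a reverse index from surrogate key to URLs."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._by_tag: defaultdict[str, set[str]] = defaultdict(set)

    def store(self, url: str, body: bytes, surrogate_key_header: str) -> None:
        self._objects[url] = body
        for tag in surrogate_key_header.split():   # header is space-separated
            self._by_tag[tag].add(url)

    def get(self, url: str):
        return self._objects.get(url)

    def purge_tag(self, tag: str) -> int:
        """Evict every object carrying the tag; return how many were removed."""
        urls = self._by_tag.pop(tag, set())
        return sum(1 for url in urls if self._objects.pop(url, None) is not None)


cache = TaggedCache()
cache.store("/products/42", b"<html>42</html>", "product:42 category:shoes")
cache.store("/products/42?view=amp", b"<html>42 amp</html>", "product:42 category:shoes")
cache.store("/category/shoes", b"<html>shoes</html>", "category:shoes")

removed = cache.purge_tag("product:42")
print(removed, cache.get("/category/shoes") is not None)
```

One purge call for `product:42` removes both variants of the product page without the caller listing URLs, while the category page stays cached until `category:shoes` is purged.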

Strategy 3: Stale-while-revalidate (soft expiry)

This isn't really invalidation; it's graceful expiry. When the TTL expires, the next request gets the stale object immediately while a background refresh runs. The user never waits. If the refresh fails because the origin is down, content can continue to be served stale (the companion stale-if-error directive from RFC 5861 covers that case explicitly). Low risk, but not suitable when freshness is critical.

Invalidation state diagram

stateDiagram-v2
    [*] --> Fresh: object stored, TTL starts
    Fresh --> Stale: TTL expires
    Stale --> Fresh: background revalidation succeeds (304 or 200)
    Stale --> ServedStale: SWR window active
    ServedStale --> Fresh: revalidation completes
    Fresh --> Purged: hard purge received
    Purged --> [*]: object evicted
    Purged --> Fresh: next request fetches from origin

Deep dive: dynamic content and edge compute

Pure caching doesn't help for personalized, user-specific, or real-time-computed responses. Two approaches address this.

Dynamic acceleration (no caching)

The CDN terminates the user's TCP/TLS connection at the edge but maintains a persistent, pre-warmed TCP connection from the edge to the origin over the CDN's optimized private backbone. This avoids the per-user TCP handshake and TLS handshake across the internet to the origin (saving 200+ ms). The request still goes to origin for every call, but on a fast, low-latency internal path. Published data from major CDN providers typically shows time-to-first-byte improvements of 15–40% for dynamic content, depending on geography and the origin's proximity to the CDN backbone.

Edge compute (serverless at the PoP)

Run a small function at the edge (e.g., Cloudflare Workers, AWS Lambda@Edge) that validates auth tokens without a round trip to origin, adds or modifies headers, A/B routes to different origins, or generates synthetic responses for redirects and errors without touching origin at all.

Edge compute enables a class of applications that need low latency and dynamic logic — the CDN is no longer just a cache but a programmable tier.

TLS termination at the edge

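Every user's HTTPS connection terminates at the nearest edge PoP, not at the origin. A new connection costs one RTT for the TCP handshake plus one RTT for the TLS 1.3 handshake (returning clients can send 0-RTT data on resumption). Edge PoPs are 5–20 ms away; origins are 100–200 ms away. Terminating at the edge means those setup round trips are paid at edge RTT rather than origin RTT, cutting connection setup from roughly 200–400 ms to 10–40 ms.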
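The saving is easy to quantify under a simple model: assume one round trip for the TCP handshake and one for a full TLS 1.3 handshake, and ignore server processing time. The function name `setup_ms` and the sample RTTs are illustrative.

```python
def setup_ms(rtt_ms: float, resumed_0rtt: bool = False) -> float:
    """Connection setup time before the first request byte can be sent."""
    tcp_round_trips = 1
    tls_round_trips = 0 if resumed_0rtt else 1   # 0-RTT sends data in the first flight
    return (tcp_round_trips + tls_round_trips) * rtt_ms


edge_rtt, origin_rtt = 20.0, 150.0               # assumed RTTs in milliseconds
print("edge:  ", setup_ms(edge_rtt), "ms")
print("origin:", setup_ms(origin_rtt), "ms")
print("saved: ", setup_ms(origin_rtt) - setup_ms(edge_rtt), "ms")
```

The number of round trips is the same in both cases; what changes is how long each one takes.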

The trade-off is trust: the CDN holds TLS certificates on behalf of the customer and can see plaintext traffic. This is the standard CDN model and most customers accept it. For higher assurance, some configurations add mutual TLS between edge and origin to prevent anyone from bypassing the CDN and hitting the origin directly.

The CDN re-encrypts traffic to the origin over HTTPS, validating the origin's certificate. The internal backbone is not necessarily encrypted hop-to-hop, though most major providers encrypt it.

Failure modes and mitigations

PoP failure

A PoP's servers go offline — hardware, network partition, software bug. Anycast BGP withdraws the prefix; DNS health checks stop returning the PoP's IP. Users re-route to the next closest PoP. With BFD (Bidirectional Forwarding Detection) and pre-computed backup paths, optimized networks converge in seconds; untuned ISP paths can take several minutes. Users in-flight during the failure see connection resets and retry. CDN SLAs count this as degraded (higher latency), not unavailable, because content is still served from elsewhere.

Cache stampede on popular content

A hot object's TTL expires simultaneously across all edge servers in a PoP, and all of them try to fetch from upstream at once. Request coalescing (as described above) serializes these into one upstream request — only one server sends the fetch while the rest wait. An alternative is probabilistic early expiration: before the TTL expires, each incoming request independently decides with increasing probability whether to refresh the object early (the XFetch/exponential algorithm is the canonical implementation), so the object is refreshed before it goes cold rather than triggering a synchronized miss.
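Probabilistic early expiration fits in a few lines. The sketch below follows the usual XFetch formulation, in which a request refreshes early when `now - delta * beta * ln(rand)` reaches the expiry time; `delta` is the observed time to refetch the object and `beta` (1.0 by default) tunes how eager the refresh is. The parameter names are this sketch's own.

```python
import math
import random
from typing import Callable


def should_refresh_early(now: float,
                         expiry: float,
                         fetch_seconds: float,
                         beta: float = 1.0,
                         rng: Callable[[], float] = random.random) -> bool:
    """Return True if this request should refresh the object before it expires.

    -log(r) for r in (0, 1] is an exponentially distributed "head start", so
    the chance of refreshing rises sharply as `now` approaches `expiry`, and
    slow-to-fetch objects (large fetch_seconds) start refreshing sooner.
    """
    r = max(rng(), 1e-12)                  # random() can return 0.0; avoid log(0)
    return now - fetch_seconds * beta * math.log(r) >= expiry


# Rough simulation: fraction of requests that would refresh at various
# distances from expiry, for an object that takes 0.5 s to fetch.
rng = random.Random(42)
for seconds_left in (10.0, 2.0, 0.5, 0.1):
    hits = sum(
        should_refresh_early(100.0 - seconds_left, 100.0, 0.5, rng=rng.random)
        for _ in range(10_000)
    )
    print(f"{seconds_left:>5} s before expiry: {hits / 10_000:.2%} of requests refresh")
```

Because each request decides independently, refreshes spread out over the seconds before expiry instead of arriving as one synchronized burst. Coalescing still helps here: if two requests choose to refresh at once, only one fetch should go upstream.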

Stale content after failed purge

A purge API call is dropped, or a PoP is partitioned from the control plane during propagation. Some PoPs continue serving the old version. The mitigation is an acknowledgment protocol: the control plane retries until all PoPs confirm, or marks the PoP unhealthy. As a backstop, keep TTLs short for mutable content, and use versioned URLs for assets where stale content is intolerable.

Hot object / traffic skew

One object (a viral video thumbnail, say) receives 10× expected traffic, overloading the cache servers that own it via consistent hashing. The fix is a small replication factor for hot objects: detect high-request-rate objects and replicate them to multiple servers within the PoP, widening the effective hash range. This trades some cache efficiency for hot-key resilience.

Origin overload

A cache miss storm — cold start, invalidation of a large object set, traffic spike from a new product launch — swamps the origin. Origin shield collapses all regional parents to one upstream point; request coalescing reduces concurrent fetches; a circuit breaker has the edge return stale or a synthetic error rather than hammering a degraded origin; and origin autoscaling triggered by shield health metrics can add capacity ahead of the storm.

Storage choices

Layer	Storage medium	Sizing rationale
Edge cache server	NVMe SSD (20–40 TB) + DRAM hot tier (128–512 GB)	DRAM holds top 0.1% of objects; SSD holds the rest. NVMe gives < 100 µs read latency.
Regional parent cache	SSD (100–400 TB per cluster)	Larger working set; must cover long-tail that misses at edge.
Origin shield	SSD or SSD-backed object store	Acts as the last cache tier before the origin; object set can be huge for large customers.
Origin	Any — S3, custom servers, app tier	Not CDN's concern; should be SSD-backed or object storage.

Eviction at each layer uses LRU with frequency weighting (LRFU or TinyLFU-style) to prevent one-hit wonders from evicting hot objects. Large media objects are also evicted more aggressively since they consume disproportionate space per object.

Things to discuss in an interview

Routing mechanism: be precise — anycast vs. DNS, when each applies, how failure is handled.
Cache hierarchy: name the layers and explain why each one exists (hit ratio multiplication, origin protection).
Consistent hashing: why it matters within a PoP; what goes wrong without it.
Invalidation strategy: versioned URLs as the baseline; purge API for mutable content; surrogate keys for bulk invalidation.
Request coalescing: this is the answer to "thundering herd at the CDN layer" — don't skip it.
Edge compute vs. dynamic acceleration: two different tools for dynamic content; know the difference.
DDoS absorption as a side effect: large anycast surface area distributes volumetric attacks across all PoPs; the CDN's bandwidth (100 Tbps) dwarfs most attack volumes.

Things you should now be able to answer

What is the difference between anycast and DNS-based CDN routing? When does each fail?
Why does a cache hierarchy with an origin shield reduce origin load by more than a flat layer of edge caches?
How does request coalescing prevent a cache stampede, and what is probabilistic early expiration?
Why do CDNs use consistent hashing within a PoP, and what is the effect of adding a new cache server?
What are the trade-offs between versioned URLs and the purge API for cache invalidation?
How does TLS termination at the edge improve connection latency?
What happens to user traffic when a PoP fails, and how long does re-routing take?

Frequently asked questions

What is the difference between anycast and DNS-based CDN routing?

Anycast has every PoP announce the same IP prefix via BGP, so the internet automatically steers packets to the topologically nearest PoP with no extra DNS lookup and automatic failover when a PoP withdraws its prefix. DNS-based routing has the CDN's authoritative DNS server return the IP of the nearest PoP based on the resolver's subnet, which allows fine-grained steering by latency measurement, country, or ISP but introduces a 60-120 second lag on failure detection due to DNS TTLs. Production CDNs combine both: anycast for network-layer routing and DNS for latency-aware tuning and canary traffic splits.

Why does an origin shield reduce origin load more than a flat layer of edge caches?

Without a shield, each regional parent independently fetches a cold object from the origin on a miss, so ten regional parents each generate a separate upstream request. With an origin shield, all regional parents converge on one node, which fires a single fetch to the origin. At a 95% edge hit ratio on 62.5 million requests per second, the shield boundary already sees only 3.1 million requests per second, and an 80% shield hit ratio cuts that to roughly 620,000 requests per second actually reaching origin.

How does request coalescing prevent a cache stampede?

When a popular object's TTL expires and thousands of concurrent requests all miss simultaneously, request coalescing has the edge lock the cache key and queue all subsequent requests for that object while sending exactly one upstream fetch. All queued requests are served from the single response when it arrives, so the upstream sees one request regardless of concurrent traffic volume.

When should versioned URLs be preferred over the purge API for cache invalidation?

Versioned URLs are the baseline for any asset whose URL you control, such as JS, CSS, and image files, because embedding a file hash or build ID in the filename means old and new content have different cache keys and no invalidation step is needed — TTLs of a year or more are common for immutable assets. The purge API is necessary for content with stable URLs that cannot be renamed, like HTML pages or API responses, and for emergency takedowns; a global purge fans out to 250 PoPs times 20 servers, roughly 5,000 RPC calls, and should complete within 1-5 seconds.

Why do CDNs use consistent hashing within a PoP, and what happens when a cache server is added?

Without consistent hashing, naive round-robin sends each URL to a random server so no server builds a useful cache for any given object, turning every server into a miss machine. Consistent hashing places both servers and URLs on a hash ring so each object always routes to the same server. Adding a server to a 20-node PoP displaces only the objects that hashed to that server's range, invalidating roughly 1 out of 21 (about 5%) of cached objects rather than causing a near-total cache flush.
