Design a Content Delivery Network (CDN)
Serve content from the edge, close to users, at massive scale. Request routing (anycast & DNS), cache hierarchies, invalidation, and origin shielding.
The problem
Cloudflare, Akamai, and AWS CloudFront collectively serve a significant fraction of all internet traffic. They aren't doing anything the origin server couldn't do; they do it from 50 metres away instead of 10,000 kilometres away. A CDN (Content Delivery Network) is a globally distributed network of caching servers, called Points of Presence (PoPs), positioned in cities around the world. When a user in Tokyo requests your company's homepage, the CDN serves the response from a Tokyo PoP, not from a data center in Virginia. That geographic shortcut cuts the network round-trip time from 150–200 ms down to 5–20 ms, before the first byte has even been read from disk.
The mechanics are straightforward in concept: cache popular content close to users, serve from cache on a hit, fetch from the origin only on a miss. What makes CDN design one of the more interesting interview problems is the surface area of hard decisions hiding underneath that simple idea. How do you route a user to the right PoP out of 250 globally? What happens when a popular object's TTL expires and 100,000 concurrent requests all miss at once — a stampede? How do you invalidate stale content across thousands of servers within seconds when a bug is discovered in a JavaScript bundle? And how do you prevent a traffic spike from forwarding millions of cache misses directly to an origin server that wasn't built to handle them?
The core engineering tension is between cache freshness and cache utility. You want long TTLs to maximize hit rates, but long TTLs mean stale content persists. You want to purge aggressively on content changes, but at 250 PoPs × 20 servers per PoP, a global purge touches 5,000 nodes. Every design decision — cache hierarchy, routing strategy, invalidation protocol, consistent hashing within a PoP — is really a different angle on that same trade-off. Getting this right is what separates a CDN that holds a 95% hit rate from one that quietly hammers the origin at scale.
Designing a CDN is one of the most layered problems in the interview catalog. It touches networking (BGP, anycast, DNS), distributed systems (cache hierarchies, consistent hashing, invalidation), and reliability engineering (PoP failure, stampedes, hot objects) all in one question. The best answers are built from first principles — starting with why a cache close to the user helps, then layering each mechanism to solve the next failure mode.
Functional requirements
- Cache and serve static content (images, video segments, JS/CSS, binary downloads) from edge locations near users. Example: GET /assets/logo.png is served with p99 < 20 ms on a cache hit and fetched from origin on a miss.
- Cache invalidation / purge: updated content must propagate within seconds.
- Origin protection: shield origin from the raw traffic volume of global users.
- (Optional) Edge compute: run lightweight logic (auth, A/B headers, redirects) at the edge without hitting origin.
- (Optional) DDoS absorption: edge nodes absorb volumetric attacks before traffic reaches origin.
Non-functional requirements
- Ultra-low read latency: cache hits < 20 ms p99; acceptable miss latency < 100 ms.
- Very high availability: 99.99%+; a CDN outage is a catastrophic event.
- Massive egress throughput: serve 100+ Tbps at peak without per-packet state.
- Freshness guarantees: stale content caused by a failed purge can mean serving wrong, broken, or security-vulnerable assets.
- Cost efficiency: bandwidth is expensive; every percentage point of hit ratio gained cuts origin egress and backbone traffic, which is real money.
Capacity estimation
| Dimension | Estimate | How we got there |
|---|---|---|
| Peak egress bandwidth | 100 Tbps | Baseline assumption for a hyperscale CDN |
| Average object size | 200 KB | Mix of small API responses, images, video chunks |
| Global requests/sec | ~62.5 M req/s | 100 Tbps = 12.5 TB/s; 12.5 TB/s ÷ 200 KB = 62.5M req/s |
| Avg requests/sec per PoP | ~250,000 req/s | 62.5M ÷ 250 PoPs |
| Peak req/s for large PoPs | ~1M req/s | Skew factor for major hubs (US-East, Frankfurt) |
| Objects per edge server | ~160 M | 32 TB SSD ÷ 200 KB/object = 160M objects |
| Objects per PoP (20 servers) | ~3.2 B | With consistent-hashed affinity, each server holds its own shard (160 M objects each, no cross-server replication) — PoP-level unique objects = 20 × 160 M = 3.2 B |
| Requests hitting origin shield | ~3.1 M req/s | 62.5M × 0.05 (95% edge hit ratio); regional parents are folded into this tier for simplicity |
| Requests hitting actual origin | ~620,000 req/s | 3.1M × 0.20 (shield hit ratio ~80%, so 20% miss through) |
| Popular objects TTL | hours–days | Top 5% of objects; cached at edge |
| Long-tail objects TTL | minutes–hours | Cached at regional parent |
| Dynamic content TTL | 1–5 s or uncached | Not suitable for long-lived caching |
Takeaway: A 95% edge hit ratio compresses 62.5M req/s to 3.1M at the shield boundary and ~620K at the actual origin. A 1-percentage-point improvement in edge hit ratio saves ~625K req/s at the shield boundary and ~125K req/s at the actual origin.
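The funnel arithmetic in the table is easy to sanity-check in a few lines. This is a minimal Python sketch; `request_funnel` is a hypothetical helper, and like the table it folds the regional parents into a single shield tier.

```python
def request_funnel(peak_bps, avg_object_bytes, edge_hit, shield_hit):
    """Return (global req/s, req/s reaching the shield tier, req/s reaching origin)."""
    global_rps = peak_bps / 8 / avg_object_bytes   # bits/s -> bytes/s -> objects/s
    to_shield = global_rps * (1 - edge_hit)         # only edge misses travel upstream
    to_origin = to_shield * (1 - shield_hit)        # only shield misses reach origin
    return global_rps, to_shield, to_origin


base = request_funnel(100e12, 200e3, edge_hit=0.95, shield_hit=0.80)
better = request_funnel(100e12, 200e3, edge_hit=0.96, shield_hit=0.80)

print(f"global {base[0]:,.0f}  shield {base[1]:,.0f}  origin {base[2]:,.0f}")
print(f"+1 pp edge hit ratio saves {base[1] - better[1]:,.0f} req/s at the shield "
      f"and {base[2] - better[2]:,.0f} req/s at origin")
```

Plugging in the table's assumptions reproduces its figures: about 62.5M, 3.1M and 625K req/s, and a saving of about 625K and 125K req/s for one extra point of edge hit ratio.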
Building up to the design
V1: A single reverse proxy cache
Start with one Nginx or Varnish instance in one data center, cache on SSD. Users close to it see responses in under 5 ms and the origin gets some relief. The problem is obvious the moment you zoom out: a user in Singapore is still 200 ms from a US-East origin even with this cache, because the cache is in the same building as the origin. Geographic proximity is the whole point, and one data center doesn't give you that.
V2: Multiple PoPs — the routing problem
Spin up caching servers in 10 cities and you've solved the distance problem — for users close to those cities. But you've created a new question: when a request comes in, which city do you send it to?
Two mechanisms answer that, and you need both for a senior interview. DNS-based routing has the CDN's authoritative DNS server inspect the querying resolver's IP (or the client subnet, when the resolver forwards it) and return the A record of the nearest PoP. It's flexible, since you can route by latency measurement, country, or ISP, but DNS TTLs mean stale routing can persist for minutes, and the resolver's IP often isn't anywhere near the client's IP (EDNS Client Subnet partially addresses this).
Anycast routing takes a different approach: every PoP announces the same IP prefix via BGP. The internet's routing tables automatically steer each packet to the topologically nearest PoP. There's no DNS lookup overhead and PoP failures heal automatically when BGP withdraws the prefix. The limitation is that BGP proximity isn't the same as latency — the nearest AS-hop PoP isn't always the fastest one. Production CDNs combine both: anycast for the network-layer routing, DNS for fine-grained tuning like canary traffic splits and latency-aware health checks.
V3: Cache hierarchy — the hit ratio problem
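One layer of caches can't hold everything. A PoP in São Paulo can't store every object in the CDN's catalog, and it shouldn't need to. Adding two more cache layers, a regional parent per continent and an origin shield close to the actual origin server, compounds the hit ratio: each layer only sees the misses of the layer below it, so miss ratios multiply. A 5% edge miss rate followed by a 20% shield miss rate means only 5% × 20% = 1% of requests reach the origin.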
Without a shield, ten regional parents each independently miss to the origin on the same cold object, creating 10× the origin load. With a shield, all ten parents converge on one node, which fires a single upstream request. The origin sees one fetch, not ten. This is the same insight as request coalescing, applied at the infrastructure level.
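A toy model makes the shield's effect concrete. This sketch assumes in-memory tiers, no TTLs, and requests that arrive one after another; the class names are hypothetical. Misses that reach the shield at the same instant would additionally need the request coalescing described next.

```python
class Origin:
    """Stand-in origin server that counts how often it is fetched from."""
    def __init__(self):
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return f"body of {key}"


class CacheTier:
    """In-memory cache layer that forwards misses to its parent tier."""
    def __init__(self, name, parent):
        self.name, self.parent, self.store = name, parent, {}

    def get(self, key):
        if key not in self.store:                  # miss: ask the next tier up
            self.store[key] = self.parent.get(key)
        return self.store[key]


def origin_fetches(with_shield, n_regions=10):
    origin = Origin()
    upstream = CacheTier("shield", origin) if with_shield else origin
    regions = [CacheTier(f"region-{i}", upstream) for i in range(n_regions)]
    for region in regions:
        region.get("/cold-object.jpg")            # every region starts cold
    return origin.fetches


print(origin_fetches(with_shield=False), origin_fetches(with_shield=True))  # 10 1
```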
V4: Request coalescing — the stampede problem
Even within a single PoP, a popular object's TTL expiring causes a miniature stampede: thousands of concurrent requests all miss simultaneously and all try to fetch from upstream at once. Request coalescing (also called request collapsing) solves this by having the edge hold all subsequent requests for an object being fetched, then serving them all from the single response when it arrives. One upstream request per object per TTL window, regardless of concurrent traffic volume. On a viral piece of content, this is the difference between one origin call and ten thousand.
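The coalescing step can be sketched with a per-key "in-flight" table: the first request for a key becomes the leader and fetches upstream, while later requests wait on the same result. This is a minimal thread-based sketch, not any particular CDN's code; `Coalescer` and the `fetch` callable are hypothetical names. Go's `golang.org/x/sync/singleflight` package implements the same idea.

```python
import threading


class _Call:
    """Shared state for one in-flight upstream fetch."""
    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class Coalescer:
    """Collapse concurrent fetches of the same key into one upstream call."""

    def __init__(self, fetch):
        self._fetch = fetch          # upstream fetch function: key -> response body
        self._lock = threading.Lock()
        self._inflight = {}          # key -> _Call shared by every waiting request

    def get(self, key):
        with self._lock:
            call = self._inflight.get(key)
            is_leader = call is None
            if is_leader:            # first request for this key does the fetch
                call = self._inflight[key] = _Call()
        if is_leader:
            try:
                call.result = self._fetch(key)
            except Exception as exc:  # followers see the same failure
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]   # later arrivals start a fresh fetch
                call.done.set()               # wake every follower
        else:
            call.done.wait()          # followers block until the leader finishes
        if call.error is not None:
            raise call.error
        return call.result
```

Usage would be one `Coalescer(fetch_from_parent)` per edge server (where `fetch_from_parent` is a hypothetical upstream call), with each request handler calling `get(cache_key)`. Note that this only collapses requests that overlap in time; storing the result is the cache's job.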
V5: Production CDN
Once you have routing, hierarchy, and coalescing, you layer on TLS termination at the edge, consistent hashing within a PoP, invalidation infrastructure, and edge compute. That's the production design.
```mermaid
flowchart LR
V1["V1: Single proxy<br/>local users only"] --> V2["V2: Multi-PoP<br/>DNS + anycast routing"]
V2 --> V3["V3: Cache hierarchy<br/>edge → regional → shield"]
V3 --> V4["V4: Request coalescing<br/>stampede prevention"]
V4 --> V5["V5: Production CDN<br/>TLS, edge compute, invalidation"]
style V1 fill:#0e7490,color:#fff
style V3 fill:#15803d,color:#fff
style V4 fill:#ff6b1a,color:#0a0a0f
style V5 fill:#a855f7,color:#fff
```
High-level architecture
```mermaid
flowchart TD
U[User] -->|"anycast or DNS"| EDGE["Edge cache server<br/>PoP: city-level"]
EDGE -->|"TLS terminate"| EDGEHIT{"Cache hit?"}
EDGEHIT -->|"yes → serve"| U
EDGEHIT -->|"no → coalesce<br/>and fetch"| REG["Regional parent<br/>cache"]
REG -->|"hit"| EDGE
REG -->|"miss"| SHIELD["Origin shield PoP<br/>single upstream point"]
SHIELD -->|"hit"| REG
SHIELD -->|"miss"| ORI["Origin server"]
ORI --> SHIELD
INV["Purge / invalidation<br/>API"] -->|"flush by key or tag"| EDGE
INV --> REG
INV --> SHIELD
DNS["Anycast + geo DNS<br/>routing control plane"] --> EDGE
style EDGE fill:#ff6b1a,color:#0a0a0f
style REG fill:#15803d,color:#fff
style SHIELD fill:#0e7490,color:#fff
style ORI fill:#a855f7,color:#fff
style INV fill:#ffaa00,color:#0a0a0f
style DNS fill:#ff2e88,color:#fff
```
Deep dive: request routing
DNS-based geo routing — mechanics
- Customer configures assets.example.com CNAME assets.example.com.cdn.net.
- User's DNS resolver queries assets.example.com.cdn.net.
- CDN's authoritative DNS uses the resolver's IP address (or the client subnet from EDNS Client Subnet, RFC 7871) to look up which PoP is closest.
- Returns an A record pointing to that PoP's IP.
The CDN maintains a continuously updated latency map per /24 subnet, built from active probing and passive measurement, so routing decisions reflect real-world latency, not just geographic distance.
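The lookup the authoritative server performs can be sketched as follows. This is a minimal, IPv4-only sketch: the latency map, PoP names, and addresses are hypothetical (documentation ranges from RFC 5737), and a production resolver would also weigh capacity, cost, and weighted canary splits.

```python
import ipaddress

# Hypothetical data: RTTs (ms) measured from each client /24 to each PoP.
LATENCY_MAP = {
    "203.0.113.0/24": {"tokyo": 4.0, "osaka": 9.0, "singapore": 70.0},
}
POP_ADDRESSES = {"tokyo": "198.51.100.10", "osaka": "198.51.100.20",
                 "singapore": "198.51.100.30"}
FALLBACK_POP = "singapore"


def answer_a_record(resolver_ip, ecs_subnet=None, healthy=frozenset(POP_ADDRESSES)):
    """Pick the A record for one DNS query (IPv4 only in this sketch)."""
    # Prefer the client subnet from EDNS Client Subnet; otherwise use the resolver's IP.
    source = ipaddress.ip_interface(ecs_subnet or resolver_ip).ip
    bucket = str(ipaddress.ip_network(f"{source}/24", strict=False))
    # Only consider PoPs that are currently passing health checks.
    measured = LATENCY_MAP.get(bucket, {})
    candidates = {pop: rtt for pop, rtt in measured.items() if pop in healthy}
    pop = min(candidates, key=candidates.get) if candidates else FALLBACK_POP
    return POP_ADDRESSES[pop]


print(answer_a_record("203.0.113.57"))                                # Tokyo PoP
print(answer_a_record("192.0.2.53", ecs_subnet="203.0.113.0/24"))    # ECS wins over resolver IP
print(answer_a_record("203.0.113.57", healthy=frozenset({"osaka"})))  # Tokyo failing checks
```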
DNS routing vs. anycast: trade-offs
| Property | DNS Routing | Anycast |
|---|---|---|
| Routing granularity | Fine — can split by subnet, country, ISP | Coarse — BGP AS-path level |
| Failure detection | Health checks + DNS TTL (60–120 s lag) | BGP withdrawal (seconds with BFD; minutes without) |
| Traffic steering | Easy — change DNS response | Hard — requires BGP prefix manipulation |
| Canary / gradual rollout | Easy with weighted DNS | Difficult |
| Cold start / first request | Extra DNS RTT | No extra RTT |
Anycast — BGP mechanics
Every PoP's router announces the same IP prefix (e.g. 104.20.0.0/16) to its upstream transit providers. BGP evaluates multiple attributes in priority order (vendor-specific weight, local preference, AS-path length, MED, and others). When the higher-priority attributes tie across the PoP announcements, AS-path length becomes the effective tiebreaker: the announcement with the fewest AS hops wins, selecting the topologically closest PoP. An ISP's local-preference policy (for example, preferring a peering link over paid transit) can still override hop count.
When a PoP fails, its router stops announcing. BGP propagates the withdrawal within seconds on well-tuned paths (with BFD) and within a few minutes on untuned ones, and users are re-routed to the next-closest PoP automatically, with no DNS TTL wait and no application-level failover.
The limitation is real: "nearest in BGP" diverges from "lowest latency" because BGP path selection does not account for actual link latency or congestion. This is why production networks combine anycast with active latency measurement and traffic engineering.
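The path choice and the automatic failover can be sketched with a heavily simplified BGP decision process. This sketch keeps only local preference, AS-path length, and MED (real routers also compare origin type, eBGP vs. iBGP, IGP cost, and router IDs, and compare MED only between routes from the same neighbor AS); the PoP names are hypothetical and the AS numbers come from the documentation range in RFC 5398.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    pop: str               # PoP whose router originated the anycast prefix
    local_pref: int        # set by the receiving ISP's policy; higher wins
    as_path: tuple         # AS numbers the announcement traversed
    med: int = 0           # lower wins (simplified: compared across all routes)


def best_path(announcements):
    """Simplified BGP decision: highest local_pref, then shortest AS path, then lowest MED."""
    if not announcements:
        return None
    return min(announcements, key=lambda a: (-a.local_pref, len(a.as_path), a.med))


routes = [
    Announcement("tokyo", 100, (64500, 64510)),
    Announcement("osaka", 100, (64500, 64501, 64510)),
]
print(best_path(routes).pop)                                   # tokyo
print(best_path([a for a in routes if a.pop != "tokyo"]).pop)  # osaka (after withdrawal)
```

Giving a longer path a higher `local_pref` shows the limitation above: ISP policy, not latency, decides which PoP wins.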
```mermaid
flowchart LR
USER[User in Tokyo] -->|"packets to 104.20.0.0/16"| ISP[User's ISP]
ISP -->|"BGP: shortest AS-path"| TOKYO[Tokyo PoP<br/>announces same prefix]
ISP -.->|"if Tokyo fails,<br/>BGP withdraws"| OSAKA[Osaka PoP<br/>takes over automatically]
style TOKYO fill:#ff6b1a,color:#0a0a0f
style OSAKA fill:#15803d,color:#fff
style ISP fill:#0e7490,color:#fff
```
Deep dive: cache mechanics
Cache keys
A cache key uniquely identifies an object. The default is the full URL, but the right key must account for content negotiation:
```text
cache_key = normalize(url) + Vary-headers

Example:
URL:  /api/banner.json
Vary: Accept-Encoding, Accept-Language
→ key: /api/banner.json | gzip | en-US
```
Poor key design causes two problems: under-keying (serving the wrong content variant to a user) and over-keying (fragmenting the cache into tiny per-variant buckets, killing hit ratio). The CDN must strip inputs that don't change the response, such as session cookies and tracking query parameters, from the key.
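A key builder along these lines can be sketched as follows. It is a sketch under stated assumptions: the tracking-parameter list and the per-header normalizers are hypothetical choices (real CDNs make key composition configurable per customer), and unlike the example above it includes the host, since one CDN cache serves many customers.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption: query parameters that never change the response body.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def _encoding(value):
    # Collapse Accept-Encoding to the one variant this cache stores (q-values ignored).
    offered = {token.split(";")[0].strip().lower() for token in value.split(",")}
    return "gzip" if "gzip" in offered else "identity"


def _language(value):
    # Keep only the client's first-choice language tag.
    return value.split(",")[0].split(";")[0].strip()


NORMALIZERS = {"accept-encoding": _encoding, "accept-language": _language}


def cache_key(url, vary, headers):
    """Normalized host + path + sorted, de-tracked query, plus normalized Vary header values."""
    parts = urlsplit(url)
    query = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                   if k not in TRACKING_PARAMS)
    base = parts.netloc.lower() + (parts.path or "/")
    if query:
        base += "?" + urlencode(query)
    variants = [NORMALIZERS.get(h.lower(), str.strip)(headers.get(h.lower(), ""))
                for h in vary]
    return " | ".join([base, *variants])


print(cache_key(
    "https://Assets.Example.com/api/banner.json?utm_source=mail&b=2&a=1",
    vary=["Accept-Encoding", "Accept-Language"],
    headers={"accept-encoding": "gzip, deflate, br", "accept-language": "en-US,en;q=0.9"},
))
# assets.example.com/api/banner.json?a=1&b=2 | gzip | en-US
```

Collapsing Accept-Encoding to a couple of values, rather than keying on the raw header string, is what keeps the cache from fragmenting into one bucket per browser.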
TTL and Cache-Control
The CDN respects standard HTTP cache headers from the origin:
```http
Cache-Control: public, max-age=86400, stale-while-revalidate=3600
ETag: "abc123"
Last-Modified: Mon, 30 Mar 2026 12:00:00 GMT
```
max-age=86400 tells the CDN to serve from cache for up to 24 hours without checking origin. stale-while-revalidate=3600 is particularly useful: for up to an hour after the TTL expires, the edge serves the stale object immediately while triggering a background revalidation, so the user never waits for a blocking upstream fetch on expiry. Revalidation is a conditional request carrying the stored validators (If-None-Match with the ETag, or If-Modified-Since with the Last-Modified date). When the ETag hasn't changed, the origin responds with 304 Not Modified and the edge resets the object's TTL without re-transferring the body.
The CDN may also apply its own edge TTL that overrides the origin's Cache-Control, which can serve stale content if used carelessly.
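The edge's decision for a cached object can be sketched as a function of its age. This is a simplified sketch: it parses only numeric directives and ignores s-maxage, no-cache, must-revalidate, and CDN-side TTL overrides; the function names and return labels are hypothetical.

```python
def parse_cache_control(value):
    """Parse 'public, max-age=86400, ...' into {'public': True, 'max-age': 86400, ...}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        directives[name.lower()] = int(arg) if arg.isdigit() else True
    return directives


def freshness(cache_control, age_s):
    """Classify a cached object by its age in seconds."""
    cc = parse_cache_control(cache_control)
    max_age = cc.get("max-age", 0)
    swr = cc.get("stale-while-revalidate", 0)
    if age_s < max_age:
        return "fresh"                    # serve from cache, no origin contact
    if age_s < max_age + swr:
        return "serve-stale-revalidate"   # serve now, revalidate in the background
    return "revalidate-blocking"          # the client waits for the upstream answer


def conditional_headers(stored_headers):
    """Validators the edge sends upstream so the origin can answer 304 Not Modified."""
    out = {}
    if "ETag" in stored_headers:
        out["If-None-Match"] = stored_headers["ETag"]
    if "Last-Modified" in stored_headers:
        out["If-Modified-Since"] = stored_headers["Last-Modified"]
    return out


CC = "public, max-age=86400, stale-while-revalidate=3600"
print(freshness(CC, 600), freshness(CC, 86_400), freshness(CC, 90_000))
```

With the headers shown earlier, an object is fresh for its first 24 hours, served stale with background revalidation for the following hour, and revalidated synchronously after that.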
Cache miss sequence
```mermaid
sequenceDiagram
participant U as User
participant E as Edge cache
participant R as Regional parent
participant O as Origin shield
participant ORI as Origin server
U->>E: GET /image.jpg
E-->>E: Miss (object not in cache)
Note over E: Coalesce: lock key,<br/>queue subsequent requests
E->>R: GET /image.jpg
R-->>R: Miss
R->>O: GET /image.jpg
O-->>O: Miss
O->>ORI: GET /image.jpg
ORI-->>O: 200 OK + Cache-Control
O-->>R: 200 OK (stored at shield)
R-->>E: 200 OK (stored at regional)
E-->>U: 200 OK (stored at edge)
Note over E: All queued requests served<br/>from newly cached object
```
Push CDN vs. pull CDN
| | Pull CDN | Push CDN |
|---|---|---|
| How content arrives at edge | On first user request (lazy) | Pre-loaded by operator before first request |
| Best for | General web assets, unpredictable access patterns | Known-large content: software downloads, game patches, live stream pre-seeding |
| Cold start penalty | First user pays full origin RTT | None — object is warm before launch |
| Storage efficiency | Only caches what's requested | Must pre-populate; wastes space if content isn't accessed |
| Invalidation | TTL + purge API | Operator deletes explicitly from all PoPs |
Most CDNs support both models. Static web assets use pull; a video platform pre-seeding a new film release across all PoPs uses push.
Deep dive: consistent hashing within a PoP
A single PoP has multiple cache server nodes. When a request arrives at the PoP's load balancer, it must decide which cache server holds — or should hold — the object.
Naive round-robin makes this hopeless: every server gets a random subset of requests for every object, so no server ever accumulates a useful cache for any given URL. Every server becomes a miss machine.
Consistent hashing (see consistent hashing for the full algorithm) places both servers and objects on a hash ring. An object always routes to the same server. When a server is added or removed, only the objects that hashed to that server's range are displaced — everything else stays put.
```mermaid
flowchart LR
REQ["Incoming request<br/>for /image.jpg"] --> HASH["hash(url) = 0x7f3a"]
HASH --> RING["Hash ring<br/>walk clockwise"]
RING --> SRV["Cache server C<br/>(owns this range)"]
SRV -->|"cache hit"| RESP["Serve response"]
SRV -->|"cache miss"| UP["Fetch from parent"]
style RING fill:#ff6b1a,color:#0a0a0f
style SRV fill:#15803d,color:#fff
```
Virtual nodes (vnodes) give each server multiple positions on the ring to smooth out the distribution of objects; giving larger servers proportionally more vnodes also accommodates heterogeneous hardware.
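The payoff: adding a server to a 20-node PoP invalidates roughly 1/21 ≈ 5% of cached objects, all of which move to the new server. A naive hash(url) mod N placement would remap about 20/21 ≈ 95% of objects when N goes from 20 to 21, effectively flushing the PoP's cache.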
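Both numbers can be checked with a small ring. This sketch assumes MD5-derived 64-bit ring positions, 100 vnodes per server, and hypothetical server names; production rings differ in hash function and vnode count.

```python
import bisect
import hashlib


def _hash(s):
    # 64-bit ring position derived from MD5 (fine for placement, not for security).
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._ring = []                    # sorted (position, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):       # each server gets `vnodes` ring positions
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def lookup(self, key):
        # Walk clockwise: first vnode at or after the key's position, wrapping at the end.
        i = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[i % len(self._ring)][1]


keys = [f"/img/{n}.jpg" for n in range(20_000)]
ring = HashRing([f"cache-{i:02d}" for i in range(20)])
before = {k: ring.lookup(k) for k in keys}
ring.add("cache-20")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"consistent hashing remapped {len(moved) / len(keys):.1%}")  # close to 1/21
print(all(ring.lookup(k) == "cache-20" for k in moved))           # True
naive = sum(_hash(k) % 20 != _hash(k) % 21 for k in keys) / len(keys)
print(f"hash mod N remapped {naive:.1%}")                          # close to 20/21
```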
Deep dive: cache invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
CDN cache invalidation is a distributed coordination problem at scale: you must flush an object from potentially thousands of cache servers within seconds.
Strategy 1: Versioned URLs (preferred)
```text
Before: https://assets.example.com/app.js
After:  https://assets.example.com/app.a4f7c2b.js
```
The file hash or build ID is embedded in the URL. Old and new content have different cache keys, so there is no invalidation step — the old URL simply stops being requested once the HTML that referenced it is updated. You get long TTLs (a year is common for immutable assets) for free.
The catch is that this only works for content whose URL you control. HTML pages, API responses, and anything else with a stable URL can't be versioned by filename, and that's exactly where the purge API comes in.
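Build tools usually generate these names. Here is a minimal sketch of the idea, assuming a SHA-256 content hash truncated to 7 hex characters to mirror the example above; `versioned_path` is a hypothetical helper. The `immutable` directive (RFC 8246) tells clients not to revalidate such an asset at all.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_path(path, content, digest_len=7):
    """Embed a short content hash in the filename: /app.js -> /app.<hash>.js."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


# Content-addressed assets never change under a given URL, so they can carry a one-year TTL.
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"

print(versioned_path("/app.js", b"console.log('v1');"))
print(versioned_path("/app.js", b"console.log('v2');"))   # new content, new URL, no purge
```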
Strategy 2: Hard purge API
An operator calls the CDN's purge API to immediately remove a specific URL or set of URLs from all edge caches:
```text
POST /v1/purge
{ "urls": ["https://example.com/index.html"] }
```
The CDN's control plane fans this out to all PoPs and waits for confirmation. Good implementations complete a global purge within 1–5 seconds.
The mechanics reveal two challenges worth naming. Fan-out cost: 250 PoPs × 20 servers = 5,000 RPC calls per purge. For "purge everything" during a security incident, this must be queued and rate-limited. Propagation lag: during the propagation window, some users get old content and some get new — usually acceptable, but not for correctness-critical assets.
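The fan-out and its acknowledgment loop can be sketched with a bounded worker pool. The bounded pool is the rate limit, and nodes that never confirm are returned so the control plane can mark them unhealthy. This is a sketch: `send` stands in for a hypothetical per-node purge RPC, and a real control plane would add backoff between rounds and a durable queue.

```python
from concurrent.futures import ThreadPoolExecutor


def _acked(send, node, urls):
    try:
        return bool(send(node, urls))      # True means the node confirmed the purge
    except Exception:                      # timeouts and RPC errors count as "no ack"
        return False


def fan_out_purge(nodes, urls, send, max_parallel=64, max_rounds=3):
    """Send a purge to every cache node with bounded concurrency and retry nodes
    that did not acknowledge. Returns the nodes still unconfirmed after max_rounds."""
    pending = list(nodes)
    for _ in range(max_rounds):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            acks = list(pool.map(lambda node: _acked(send, node, urls), pending))
        pending = [node for node, ok in zip(pending, acks) if not ok]
    return pending                         # caller marks these unhealthy or drains them
```

For the 250 PoP × 20 server fleet above, `nodes` would hold 5,000 entries; any node still in the returned list after the final round is the "PoP partitioned during propagation" case discussed under failure modes.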
Surrogate keys / cache tags are the solution when you need to purge by logical group. Tag objects at cache time (Surrogate-Key: product:42 category:shoes) and one purge call removes every variant of product 42's page without enumerating individual URLs.
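On each cache server, tag-based purging needs a reverse index from tag to cache keys. A minimal sketch, assuming the space-separated tag format from the example above; `TaggedCache` and its methods are hypothetical names.

```python
from collections import defaultdict


class TaggedCache:
    """Cache that indexes entries by surrogate key so one purge can drop a whole group."""

    def __init__(self):
        self._store = {}                    # cache key -> body
        self._by_tag = defaultdict(set)     # surrogate key -> cache keys carrying it
        self._tags_of = {}                  # cache key -> its surrogate keys

    def put(self, key, body, surrogate_key_header=""):
        self._untag(key)                    # a re-cached object may carry different tags
        self._store[key] = body
        tags = set(surrogate_key_header.split())
        self._tags_of[key] = tags
        for tag in tags:
            self._by_tag[tag].add(key)

    def get(self, key):
        return self._store.get(key)

    def purge_tag(self, tag):
        """Remove every object carrying `tag`; returns how many were dropped."""
        keys = self._by_tag.pop(tag, set())
        for key in keys:
            self._store.pop(key, None)
            self._untag(key)
        return len(keys)

    def _untag(self, key):
        for tag in self._tags_of.pop(key, ()):
            self._by_tag[tag].discard(key)


cache = TaggedCache()
cache.put("/products/42", b"...", "product:42 category:shoes")
cache.put("/products/42?lang=fr", b"...", "product:42 category:shoes")
print(cache.purge_tag("product:42"))   # 2: both variants gone in one call
```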
Strategy 3: Stale-while-revalidate (soft expiry)
This isn't really invalidation — it's graceful expiry. When the TTL expires, the next request gets the stale object immediately while a background refresh runs. The user never waits. If the refresh fails because the origin is down, content continues to be served stale. Low risk, but not suitable when freshness is critical.
Invalidation state diagram
```mermaid
stateDiagram-v2
[*] --> Fresh: object stored, TTL starts
Fresh --> Stale: TTL expires
Stale --> Fresh: background revalidation succeeds (304 or 200)
Stale --> ServedStale: SWR window active
ServedStale --> Fresh: revalidation completes
Fresh --> Purged: hard purge received
Purged --> [*]: object evicted
Purged --> Fresh: next request fetches from origin
```
Deep dive: dynamic content and edge compute
Pure caching doesn't help for personalized, user-specific, or real-time-computed responses. Two approaches address this.
Dynamic acceleration (no caching)
The CDN terminates the user's TCP/TLS connection at the edge but maintains persistent, pre-warmed connections from the edge to the origin over the CDN's optimized private backbone. This avoids a fresh per-user TCP and TLS handshake across the long-haul internet path to the origin (two or more long-haul round trips, often 200+ ms). The request still goes to origin for every call, but on a fast, low-latency internal path. Published data from major CDN providers typically shows time-to-first-byte improvements of 15–40% for dynamic content, depending on geography and the origin's proximity to the CDN backbone.
Edge compute (serverless at the PoP)
Run a small function at the edge (e.g., Cloudflare Workers, AWS Lambda@Edge) that validates auth tokens without a round trip to origin, adds or modifies headers, A/B routes to different origins, or generates synthetic responses for redirects and errors without touching origin at all.
Edge compute enables a class of applications that need low latency and dynamic logic — the CDN is no longer just a cache but a programmable tier.
TLS termination at the edge
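Every user's HTTPS connection terminates at the nearest edge PoP, not at the origin. A new connection needs one RTT for the TCP handshake plus one for the TLS 1.3 handshake (two for TLS 1.2); 0-RTT resumption can skip the TLS round trip for returning clients. Edge PoPs are 5–20 ms away; origins are 100–200 ms away. Terminating at the edge moves those two or three setup round trips onto the short path, saving roughly that many full origin RTTs on connection setup.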
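A back-of-the-envelope model shows where the savings come from. This is a simplified sketch that counts round trips only and ignores DNS, TCP slow start, and server processing time; the 150 ms origin, 10 ms edge, and 140 ms backbone figures are assumptions picked from the ranges above.

```python
def ttfb_ms(client_rtt_ms, handshake_rtts=2, upstream_ms=0):
    """Time to first byte on a new connection: handshake round trips to whoever terminates
    the connection, one request/response round trip, plus any upstream leg behind it."""
    return handshake_rtts * client_rtt_ms + client_rtt_ms + upstream_ms


direct = ttfb_ms(150)                         # TCP + TLS 1.3 + request, all to the origin
edge_hit = ttfb_ms(10)                        # same handshakes, terminated at the edge
edge_dynamic = ttfb_ms(10, upstream_ms=140)   # edge miss over a pre-warmed origin connection
print(direct, edge_hit, edge_dynamic)         # 450 30 170
```

Even when every request goes to origin (dynamic acceleration), the handshakes stay on the short path, which is where most of the saving comes from.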
The trade-off is trust: the CDN holds TLS certificates on behalf of the customer and can see plaintext traffic. This is the standard CDN model and most customers accept it. For higher assurance, some configurations add mutual TLS between edge and origin to prevent anyone from bypassing the CDN and hitting the origin directly.
The CDN opens its own HTTPS connection to the origin and validates the origin's certificate. Traffic between internal tiers (edge, regional parent, shield) is not necessarily encrypted hop-to-hop, though most major providers encrypt it.
Failure modes and mitigations
PoP failure
A PoP's servers go offline — hardware, network partition, software bug. Anycast BGP withdraws the prefix; DNS health checks stop returning the PoP's IP. Users re-route to the next closest PoP. With BFD (Bidirectional Forwarding Detection) and pre-computed backup paths, optimized networks converge in seconds; untuned ISP paths can take several minutes. Users in-flight during the failure see connection resets and retry. CDN SLAs count this as degraded (higher latency), not unavailable, because content is still served from elsewhere.
Cache stampede on popular content
A hot object's TTL expires, and every concurrent request for it misses at the same moment and tries to fetch from upstream. Request coalescing (as described above) collapses these into one upstream request: only one request goes upstream while the rest wait for its response. An alternative is probabilistic early expiration: before the TTL expires, each incoming request independently decides, with a probability that rises as expiry approaches, whether to refresh the object early (the XFetch/exponential algorithm is the canonical implementation), so the object is refreshed before it goes cold rather than triggering a synchronized miss.
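The XFetch rule fits in one line: refresh early if `now - delta * beta * ln(u) >= expiry`, where `u` is uniform on (0, 1] and `delta` is how long the last upstream fetch took. A minimal sketch follows; the function name and the demo numbers are illustrative assumptions. The refresh probability works out to exp(-(expiry - now) / (delta * beta)), so it is negligible far from expiry and approaches 1 as expiry nears.

```python
import math
import random


def should_refresh_early(now, expiry, delta, beta=1.0, rand=random.random):
    """XFetch-style probabilistic early expiration.

    now, expiry: seconds on the same clock; delta: duration of the last upstream fetch;
    beta > 1 refreshes earlier, beta < 1 later. Always True once the object has expired.
    """
    u = 1.0 - rand()                       # uniform in (0, 1], so log(u) is finite
    return now - delta * beta * math.log(u) >= expiry


rng = random.Random(42)
for seconds_left in (10, 3, 1, 0.1):
    refreshes = sum(should_refresh_early(100 - seconds_left, 100, delta=1.0, rand=rng.random)
                    for _ in range(10_000))
    print(f"{seconds_left:>4}s before expiry: {refreshes / 10_000:.1%} of requests refresh")
```

Because each request draws independently, a hot object usually gets refreshed by one early request shortly before expiry, instead of by every request at once just after it.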
Stale content after failed purge
A purge API call is dropped, or a PoP is partitioned from the control plane during propagation. Some PoPs continue serving the old version. The mitigation is an acknowledgment protocol: the control plane retries until all PoPs confirm, or marks the PoP unhealthy. As defense in depth, keep TTLs short for mutable content so a missed purge self-heals quickly, and use versioned URLs for assets where any staleness is intolerable.
Hot object / traffic skew
One object (a viral video thumbnail, say) receives 10× expected traffic, overloading the cache servers that own it via consistent hashing. The fix is a small replication factor for hot objects: detect high-request-rate objects and replicate them to multiple servers within the PoP, widening the effective hash range. This trades some cache efficiency for hot-key resilience.
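One way to sketch the hot-key fix: route normally on the consistent-hash ring, count requests per key, and once a key crosses a threshold, spread its requests over its owner plus the next servers clockwise. Everything here is an illustrative assumption: the SHA-1 ring positions, one ring position per server, the threshold, the replica count, and a never-decaying counter (a real system would use a sliding window or decay).

```python
import bisect
import hashlib
import random
from collections import Counter


def _pos(s):
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")


class HotKeyRouter:
    """Route each key to its consistent-hash owner; once a key turns hot, spread its
    requests over the owner plus the next (replicas - 1) servers clockwise."""

    def __init__(self, servers, hot_threshold=1_000, replicas=3):
        self._ring = sorted((_pos(s), s) for s in servers)   # one position per server here
        self.hot_threshold = hot_threshold
        self.replicas = min(replicas, len(servers))
        self.counts = Counter()    # requests per key; window or decay omitted in this sketch

    def owners(self, key, n):
        i = bisect.bisect_left(self._ring, (_pos(key),))
        return [self._ring[(i + j) % len(self._ring)][1] for j in range(n)]

    def route(self, key):
        self.counts[key] += 1
        hot = self.counts[key] > self.hot_threshold
        return random.choice(self.owners(key, self.replicas if hot else 1))
```

The cost is visible in the code: a hot object now occupies cache space on three servers instead of one, which is the "some cache efficiency for hot-key resilience" trade.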
Origin overload
A cache miss storm — cold start, invalidation of a large object set, traffic spike from a new product launch — swamps the origin. Origin shield collapses all regional parents to one upstream point; request coalescing reduces concurrent fetches; a circuit breaker has the edge return stale or a synthetic error rather than hammering a degraded origin; and origin autoscaling triggered by shield health metrics can add capacity ahead of the storm.
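The circuit-breaker piece can be sketched per origin: after a run of consecutive failures the edge stops calling origin for a cooldown period and serves its stale copy (or a synthetic error), then lets a single trial request through. The thresholds and names below are hypothetical, and the clock is injectable so the behavior can be exercised deterministically.

```python
import time


class OriginUnavailable(Exception):
    """Synthetic error returned when origin is tripped and no stale copy exists."""


class OriginCircuitBreaker:
    """After `failure_threshold` consecutive origin failures, stop calling origin for
    `cooldown_s` seconds and serve the stale copy instead; then let one trial through."""

    def __init__(self, failure_threshold=5, cooldown_s=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None               # None means closed: origin calls allowed

    def _is_open(self):
        if self.opened_at is None:
            return False
        # After the cooldown the breaker is half-open: the next call is a trial.
        return self.clock() - self.opened_at < self.cooldown_s

    def fetch(self, fetch_from_origin, stale=None):
        if self._is_open():
            return self._fallback(stale)
        try:
            body = fetch_from_origin()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open, or re-open after a failed trial
            return self._fallback(stale)
        self.failures, self.opened_at = 0, None  # success closes the breaker
        return body

    @staticmethod
    def _fallback(stale):
        if stale is not None:
            return stale
        raise OriginUnavailable("origin circuit open and no stale copy cached")
```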
Storage choices
| Layer | Storage medium | Sizing rationale |
|---|---|---|
| Edge cache server | NVMe SSD (20–40 TB) + DRAM hot tier (128–512 GB) | DRAM holds top 0.1% of objects; SSD holds the rest. NVMe gives < 100 µs read latency. |
| Regional parent cache | SSD (100–400 TB per cluster) | Larger working set; must cover long-tail that misses at edge. |
| Origin shield | SSD or SSD-backed object store | Acts as the last cache tier before origin; object set can be huge for large customers. |
| Origin | Any — S3, custom servers, app tier | Not CDN's concern; should be SSD-backed or object storage. |
Eviction at each layer uses LRU with frequency weighting (LRFU or TinyLFU-style) to prevent one-hit wonders from evicting hot objects. Large media objects are also evicted more aggressively since they consume disproportionate space per object.
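The admission idea behind TinyLFU can be sketched as LRU eviction plus a frequency check: a newcomer only replaces the LRU victim if a Count-Min sketch says it has been requested more often. This is a deliberately reduced sketch; production designs such as W-TinyLFU also add a small admission window, periodic counter aging, and a doorkeeper, and size-aware eviction for large media objects is not shown. Sketch dimensions and class names are assumptions.

```python
import hashlib
from collections import OrderedDict


class CountMinSketch:
    """Approximate request counter: estimates never undercount, may overcount on collisions."""

    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _idx(self, key, row):
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key):
        for r in range(self.depth):
            self.rows[r][self._idx(key, r)] += 1

    def estimate(self, key):
        return min(self.rows[r][self._idx(key, r)] for r in range(self.depth))


class TinyLfuLruCache:
    """LRU eviction with a TinyLFU-style admission check (aging and window omitted)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()           # oldest entry first
        self.freq = CountMinSketch()

    def get(self, key):
        self.freq.add(key)                  # count every request, hit or miss
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        return None

    def put(self, key, value):
        if key in self.data:
            self.data[key] = value
            self.data.move_to_end(key)
            return
        if len(self.data) >= self.capacity:
            victim = next(iter(self.data))  # least recently used entry
            if self.freq.estimate(key) <= self.freq.estimate(victim):
                return                      # one-hit wonder: not admitted
            del self.data[victim]
        self.data[key] = value
```

A plain LRU would let a single request for a never-again object push out a popular one; here that newcomer is simply not admitted.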
Things to discuss in an interview
- Routing mechanism: be precise — anycast vs. DNS, when each applies, how failure is handled.
- Cache hierarchy: name the layers and explain why each one exists (hit ratio multiplication, origin protection).
- Consistent hashing: why it matters within a PoP; what goes wrong without it.
- Invalidation strategy: versioned URLs as the baseline; purge API for mutable content; surrogate keys for bulk invalidation.
- Request coalescing: this is the answer to "thundering herd at the CDN layer" — don't skip it.
- Edge compute vs. dynamic acceleration: two different tools for dynamic content; know the difference.
- DDoS absorption as a side effect: large anycast surface area distributes volumetric attacks across all PoPs; the CDN's bandwidth (100 Tbps) dwarfs most attack volumes.
Things you should now be able to answer
- What is the difference between anycast and DNS-based CDN routing? When does each fail?
- Why does a cache hierarchy with an origin shield reduce origin load by more than a flat layer of edge caches?
- How does request coalescing prevent a cache stampede, and what is probabilistic early expiration?
- Why do CDNs use consistent hashing within a PoP, and what is the effect of adding a new cache server?
- What are the trade-offs between versioned URLs and the purge API for cache invalidation?
- How does TLS termination at the edge improve connection latency?
- What happens to user traffic when a PoP fails, and how long does re-routing take?
Further reading
- "An overview of the Cloudflare architecture" — blog.cloudflare.com
- "How does Anycast work?" — RIPE NCC Labs
- "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web" — Karger, Lehman, Leighton, Levine, Lewin, Panigrahy (STOC 1997) — the foundational algorithm and original Akamai distributed caching model
- Consistent Hashing — Ironclad Academy module
- HTTP Caching — RFC 9111 (the current standard, updated from RFC 7234)
- "Introducing Cloudflare Workers" — blog.cloudflare.com — edge compute model
Frequently asked questions
What is the difference between anycast and DNS-based CDN routing?
Anycast has every PoP announce the same IP prefix via BGP, so the internet automatically steers packets to the topologically nearest PoP with no extra DNS lookup and automatic failover when a PoP withdraws its prefix. DNS-based routing has the CDN's authoritative DNS server return the IP of the nearest PoP based on the resolver's subnet, which allows fine-grained steering by latency measurement, country, or ISP but introduces a 60-120 second lag on failure detection due to DNS TTLs. Production CDNs combine both: anycast for network-layer routing and DNS for latency-aware tuning and canary traffic splits.
Why does an origin shield reduce origin load more than a flat layer of edge caches?
Without a shield, each regional parent independently fetches a cold object from the origin on a miss, so ten regional parents each generate a separate upstream request. With an origin shield, all regional parents converge on one node, which fires a single fetch to the origin. At a 95% edge hit ratio on 62.5 million requests per second, the shield boundary already sees only 3.1 million requests per second, and an 80% shield hit ratio cuts that to roughly 620,000 requests per second actually reaching origin.
How does request coalescing prevent a cache stampede?
When a popular object's TTL expires and thousands of concurrent requests all miss simultaneously, request coalescing has the edge lock the cache key and queue all subsequent requests for that object while sending exactly one upstream fetch. All queued requests are served from the single response when it arrives, so the upstream sees one request regardless of concurrent traffic volume.
When should versioned URLs be preferred over the purge API for cache invalidation?
Versioned URLs are the baseline for any asset whose URL you control, such as JS, CSS, and image files, because embedding a file hash or build ID in the filename means old and new content have different cache keys and no invalidation step is needed — TTLs of a year or more are common for immutable assets. The purge API is necessary for content with stable URLs that cannot be renamed, like HTML pages or API responses, and for emergency takedowns; a global purge fans out to 250 PoPs times 20 servers, roughly 5,000 RPC calls, and should complete within 1-5 seconds.
Why do CDNs use consistent hashing within a PoP, and what happens when a cache server is added?
Without consistent hashing, naive round-robin sends each URL to a random server so no server builds a useful cache for any given object, turning every server into a miss machine. Consistent hashing places both servers and URLs on a hash ring so each object always routes to the same server. Adding a server to a 20-node PoP displaces only the objects that hashed to that server's range, invalidating roughly 1 out of 21 (about 5%) of cached objects rather than causing a near-total cache flush.