// TOPIC

#caching

15 articles

◆◆◆ Advanced · Cloudflare · OpenAI
01

Design an LLM Gateway (AI Gateway & Model Router)

A single proxy control plane in front of OpenAI, Anthropic, Google, and open models — routing ~65 trillion tokens a month with automatic failover, semantic caching, per-team budget enforcement, and streaming SSE passthrough, all under 50 ms of added latency.

#interview #ai #llm
31 min
◆◆◆ Advanced · Uber · Airbnb
02

Design a Feature Store

Serve the exact same feature values to model training and online inference — eliminating training-serving skew — across batch, streaming, and on-demand tiers at sub-10ms latency and millions of reads per second. The architecture powering Uber Michelangelo, Airbnb Chronon, and DoorDash Gigascale.

#interview #ai #mlops
26 min
◆◆◆ Advanced · OpenAI · Anthropic
03

Design an LLM Inference & Serving System

Serve token generation for a 70B-parameter model at scale, where KV-cache memory, not FLOPs, caps concurrency, and continuous batching is what separates good GPU utilization from poor.

#interview #ai #llm
25 min
◆◆◆ Advanced · Meta · LinkedIn
04

Design a Social Graph Service (Facebook's TAO)

Serve billions of "who follows whom" reads over a graph of trillions of edges. The objects-and-associations model, a cache in front of sharded SQL, and the hot-vertex problem.

#interview #graphs #caching
22 min
◆◆ Intermediate · Meta · YouTube
05

Design a Distributed Counter (view / like counts)

Count likes and views at millions of increments per second without a single hot row melting. Sharded counters, write batching, and approximate vs exact counts.

#interview #scale #consistency
19 min
◆◆ Intermediate · Riot Games · Activision
06

Design a Real-Time Leaderboard (gaming)

Rank millions of players by score and answer "top N" and "my rank" instantly. Redis sorted sets, sharding by score range, and approximate ranks at scale.

#interview #ranking #caching
22 min
◆◆◆ Advanced · Cloudflare · Akamai
07

Design a Content Delivery Network (CDN)

Serve content from the edge, close to users, at massive scale. Request routing (anycast & DNS), cache hierarchies, invalidation, and origin shielding.

#interview #caching #networking
21 min
◆◆ Intermediate · Google · Cassandra
08

Bloom Filters

A tiny, probabilistic data structure that says "definitely not" or "maybe" and saves billions of disk reads. The math, the tuning, and where large systems such as Cassandra and Bigtable use them.

#data-structures #probabilistic #caching
15 min
◆◆◆ Advanced · Meta · Amazon
09

Design a Distributed Cache (like Memcached)

A cache that scales across hundreds of nodes — consistent hashing, replication, eviction, and the operational problems you'll meet.

#interview #caching #distributed-systems
18 min
◆◆ Intermediate · Google · Amazon
10

Design Search Autocomplete (Typeahead)

Sub-100ms autocomplete suggestions across billions of queries — tries, top-k caching, and personalized ranking.

#interview #search #trie
15 min
◆◆◆ Advanced · Meta · Instagram
11

Design a News Feed (Facebook / Instagram)

How Meta builds the home feed — fanout, ranking, candidate generation, and how the same architecture serves billions.

#interview #feed #fanout
16 min
◆◆◆ Advanced · Meta · Twitter
12

Design Twitter / X (the home timeline)

500M users, 500M tweets/day, p99 feed loads under 200ms. The fanout-on-write vs fanout-on-read trade-off that defines the system.

#interview #feed #fanout
16 min
Beginner · Google · Amazon
13

Design a URL Shortener (TinyURL / bit.ly)

A classic FAANG warmup. Generate short codes, store them, redirect fast, scale to billions of URLs.

#interview #hashing #caching
13 min
Beginner · Meta · Google
14

Scale From Zero to Millions of Users

The classic walkthrough — start with one server, add a load balancer, add caching, replicate the database, shard, geo-distribute. Every transition explained.

#scalability #architecture #fundamentals
14 min
Beginner
15

Caching

Cache hierarchies, write strategies, eviction policies, Redis data structures, the four classic cache pathologies, and how to size and warm a cache properly.

#caching #performance #redis
18 min