Topic: #caching

15 articles

Design an LLM Gateway (AI Gateway & Model Router)

A single proxy control plane in front of OpenAI, Anthropic, Google, and open models, routing ~65 trillion tokens a month with automatic failover, semantic caching, per-team budget enforcement, and streaming SSE passthrough, all under 50 ms of added latency.

#interview #ai #llm

31 min

◆◆◆ Advanced · Uber, Airbnb

Design a Feature Store

Serve the exact same feature values to model training and online inference (eliminating training-serving skew) across batch, streaming, and on-demand tiers at sub-10 ms latency and millions of reads per second. The architecture powering Uber Michelangelo, Airbnb Chronon, and DoorDash Gigascale.

#interview #ai #mlops

26 min

◆◆◆ Advanced · OpenAI, Anthropic

Design an LLM Inference & Serving System

Serve token generation for a 70B-parameter model at scale, where KV-cache memory, not FLOPs, caps concurrency, and continuous batching is what separates good GPU utilization from terrible utilization.

#interview #ai #llm

25 min

◆◆◆AdvancedMetaLinkedIn

Design a Social Graph Service (Facebook's TAO)

Serve billions of "who follows whom" reads over a graph of trillions of edges. The objects-and-associations model, a cache in front of sharded SQL, and the hot-vertex problem.

#interview#graphs#caching

22 min

◆◆ Intermediate · Meta, YouTube

Design a Distributed Counter (view / like counts)

Count likes and views at millions of increments per second without melting a single hot row. Sharded counters, write batching, and approximate vs. exact counts.

#interview #scale #consistency

19 min

◆◆ Intermediate · Riot Games, Activision

Design a Real-Time Leaderboard (gaming)

Rank millions of players by score and answer "top N" and "my rank" instantly. Redis sorted sets, sharding by score range, and approximate ranks at scale.

#interview #ranking #caching

22 min

◆◆◆ Advanced · Cloudflare, Akamai

Design a Content Delivery Network (CDN)

Serve content from the edge, close to users, at massive scale. Request routing (anycast & DNS), cache hierarchies, invalidation, and origin shielding.

#interview #caching #networking

21 min

◆◆ Intermediate · Google, Cassandra

Bloom Filters

A tiny probabilistic data structure that answers "definitely not" or "maybe" and saves billions of disk reads. The math, the tuning, and where large systems use one.

#data-structures #probabilistic #caching

15 min

◆◆◆ Advanced · Meta, Amazon

Design a Distributed Cache (like Memcached)

A cache that scales across hundreds of nodes: consistent hashing, replication, eviction, and the operational problems you'll meet.

#interview #caching #distributed-systems

18 min

◆◆ Intermediate · Google, Amazon

Design Search Autocomplete (Typeahead)

Sub-100 ms autocomplete suggestions across billions of queries: tries, top-k caching, and personalized ranking.

#interview #search #trie

15 min

◆◆◆ Advanced · Meta, Instagram

Design a News Feed (Facebook / Instagram)

How Meta builds the home feed: fanout, ranking, candidate generation, and how the same architecture serves billions of users.

#interview #feed #fanout

16 min

◆◆◆ Advanced · Meta, Twitter

Design Twitter / X (the home timeline)

500M users, 500M tweets/day, p99 feed loads under 200 ms. The fanout-on-write vs. fanout-on-read trade-off that defines the system.

#interview #feed #fanout

16 min

◆ Beginner · Google, Amazon

Design a URL Shortener (TinyURL / bit.ly)

A classic FAANG warm-up: generate short codes, store them, redirect fast, and scale to billions of URLs.

#interview #hashing #caching

13 min

◆ Beginner · Meta, Google

Scale From Zero to Millions of Users

The classic walkthrough: start with one server, then add a load balancer, add caching, replicate the database, shard, and geo-distribute. Every transition explained.

#scalability #architecture #fundamentals

14 min

◆ Beginner

Caching

Cache hierarchies, write strategies, eviction policies, Redis data structures, the four classic cache pathologies, and how to size and warm a cache properly.

#caching #performance #redis

18 min