// TOPIC

#distributed-systems

31 articles

◆◆◆ Advanced · Meta · LinkedIn
01

Design a Social Graph Service (Facebook's TAO)

Serve billions of "who follows whom" reads over a graph of trillions of edges. The objects-and-associations model, a cache in front of sharded SQL, and the hot-vertex problem.

#interview #graphs #caching
22 min
◆◆◆ Advanced · Google · Auth0
02

Design an Authorization System (Google Zanzibar / RBAC / ReBAC)

Answer "can user U do action A on resource R?" globally, in milliseconds, consistently. RBAC vs ABAC vs ReBAC, Zanzibar relation tuples, and the new-enemy problem.

#interview #security #distributed-systems
23 min
◆◆◆ Advanced · Amazon · Google
03

Design a Distributed Job Scheduler (cron at scale)

Run millions of scheduled and recurring jobs reliably — at-least-once execution, leader election, sharded time-wheels, and exactly-once side effects via idempotency.

#interview #distributed-systems #scheduling
20 min
◆◆◆ Advanced · Google · Uber
04

Design a Globally-Distributed SQL Database (Spanner / CockroachDB)

SQL transactions that are ACID across continents. How Spanner shards into Paxos groups, runs 2PC on top, and uses TrueTime to give you external consistency — the CP counterpart to Dynamo.

#interview #distributed-systems #databases
13 min
◆◆◆ Advanced · Amazon · Google
05

Design an Object Storage Service (S3)

Store arbitrary blobs with HTTP GET/PUT at exabyte scale and 11 nines of durability. Metadata vs data separation, erasure coding, and self-healing.

#interview #storage #distributed-systems
24 min
◆◆◆ Advanced · Stripe · Amazon
06

Design a Payment System (Stripe-style)

Move money correctly. Double-entry ledgers, idempotency keys, the authorize/capture/settle lifecycle, reconciliation, and why money never gets eventual consistency.

#interview #payments #consistency
20 min
◆◆◆ Advanced · Amazon · LinkedIn
07

Design a Distributed Message Queue (Kafka)

Build a durable, partitioned, replicated commit log like Kafka — ordering, consumer groups, replication (ISR), and exactly-once.

#interview #messaging #distributed-systems
21 min
◆◆◆ Advanced · Elastic · Amazon
08

Design a Distributed Search Engine (Elasticsearch)

Index billions of documents and answer full-text queries in milliseconds. Inverted indexes, sharding + replication, scatter-gather, and relevance scoring.

#interview #search #indexing
21 min
◆◆◆ Advanced · Google · Amazon
09

Design a Distributed Lock / Coordination Service (ZooKeeper / etcd)

Provide mutual exclusion and coordination across machines safely. Consensus-backed locks, leases, fencing tokens, and why a lock without fencing is unsafe.

#interview #distributed-systems #consensus
21 min
◆◆ Intermediate · Netflix · LinkedIn
10

Backpressure & Flow Control

What happens when a fast producer overwhelms a slow consumer? Backpressure, bounded buffers, load shedding, and why unbounded queues are a trap.

#distributed-systems #reliability #streaming
17 min
◆◆◆ Advanced · Amazon · Netflix
11

Quorums, Read-Repair & Anti-Entropy (Dynamo-style)

How leaderless databases like Dynamo and Cassandra stay available and converge. Quorum R+W>N, read-repair, hinted handoff, Merkle anti-entropy, and conflict resolution.

#distributed-systems #consistency #replication
18 min
◆◆ Intermediate · Stripe · Amazon
12

Idempotency & Exactly-Once Semantics

Networks retry, so your operations will run twice. Idempotency keys, dedup, and why "exactly-once delivery" is a myth but "exactly-once effect" is achievable.

#distributed-systems #reliability #messaging
18 min
◆◆◆ Advanced · Amazon · Microsoft
13

Event Sourcing & CQRS

Store every change as an immutable event and rebuild state by replay. Event sourcing, CQRS read models, snapshots, and the trade-offs nobody warns you about.

#distributed-systems #architecture #data-modeling
17 min
◆◆◆ Advanced · Google · Meta
14

Design a Distributed File System (GFS / HDFS)

Store petabytes of data in very large files across thousands of commodity machines for high-throughput batch reads. The single-master + chunkservers design, replication, and append-heavy workloads.

#interview #storage #distributed-systems
23 min
◆◆ Intermediate · Confluent · Netflix
15

Change Data Capture (CDC) & the Outbox Pattern

Turn your database write log into a reliable event stream. Log-based CDC, the dual-write problem, and the transactional outbox.

#distributed-systems #data-pipelines #messaging
15 min
◆◆ Intermediate · LaunchDarkly · Statsig
16

Designing a Feature Flag Service

An in-house LaunchDarkly. Distributing config with sub-100ms freshness to thousands of services, targeting rules, and the safety properties that prevent a flag flip from taking the site down.

#case-study #configuration #distributed-systems
19 min
◆◆◆ Advanced · Amazon · Uber
17

The Saga Pattern & Distributed Transactions

How do you keep data consistent across services with no shared database? Sagas, compensating transactions, orchestration vs choreography, and why 2PC fails at scale.

#distributed-systems #transactions #microservices
17 min
◆◆◆ Advanced · Google · Etcd
18

Leader Election and Consensus (Raft, Paxos)

How distributed systems agree on a single leader without splitting brains. Raft step-by-step, Paxos explained intuitively, and where consensus shows up in production.

#distributed-systems #consensus #raft
13 min
◆◆ Intermediate · Postgres · Cassandra
19

Database Replication

Single-leader, multi-leader, and leaderless replication. Sync vs async, replication lag, conflict resolution, and how each model trades availability for consistency.

#databases #distributed-systems #replication
14 min
◆◆◆ Advanced · Amazon · Meta
20

Design a Distributed Key-Value Store (Dynamo)

Build your own DynamoDB / Cassandra. Sharding, replication, quorum reads/writes, vector clocks, conflict resolution.

#interview #distributed-systems #storage
19 min
◆◆ Intermediate · Twitter · Discord
21

Design a Distributed Unique ID Generator

How Twitter/Discord/Instagram generate billions of unique IDs per day with no central coordinator. UUIDs, Snowflake IDs, and ULIDs.

#interview #distributed-systems #identifiers
16 min
◆◆ Intermediate · Google · Yelp
22

Design Yelp / Nearby Search (proximity service)

Find restaurants/businesses near a location, fast. Geohash, quadtree, hexagonal cells, and the right index for "within 5 km of me".

#interview #geo #search
16 min
◆◆◆ Advanced · Google · Dropbox
23

Design Google Drive / Dropbox

File sync that works on every device, blob storage, deduplication, conflict resolution, and how to do all of it efficiently.

#interview #storage #sync
17 min
◆◆◆ Advanced · Meta · Amazon
24

Design a Distributed Cache (like Memcached)

A cache that scales across hundreds of nodes — consistent hashing, replication, eviction, and the operational problems you'll meet.

#interview #caching #distributed-systems
18 min
◆◆◆ Advanced · Google · Microsoft
25

Design a Web Crawler (Googlebot)

Crawl billions of URLs at petabyte scale. URL frontier, politeness, deduplication, 4xx/dead-link handling, and the realities of indexing the web.

#interview #crawling #distributed-systems
17 min
◆◆ Intermediate · Stripe · Cloudflare
26

Design a Rate Limiter

Token bucket, leaky bucket, fixed window, sliding window — and the distributed Redis-based limiter you can copy into production.

#interview #rate-limiting #distributed-systems
14 min
◆◆ Intermediate · Google · Amazon
27

CAP Theorem Deep Dive

The CAP theorem, debunked myths, PACELC, and the actual trade-offs every distributed database makes.

#distributed-systems #consistency #theory
10 min
◆◆ Intermediate · Meta · Google
28

Database Sharding

When you outgrow a single database — how to split data across many machines, the strategies that work, and the operational pain you'll inherit.

#databases #sharding #scalability
12 min
◆◆ Intermediate · Amazon · Meta
29

Consistent Hashing

Why hash-mod-N breaks when you resize, and how Amazon Dynamo, Cassandra, and Memcached avoid it with consistent hashing and virtual nodes.

#distributed-systems #hashing #sharding
9 min
◆◆ Intermediate
30

CAP, Consistency, and Replication

CAP and PACELC, consistency models from linearizable to eventual, replication strategies, quorums, partitioning, consensus (Raft, Paxos), CRDTs, and 2PC.

#distributed-systems #consistency #replication
16 min
◆◆ Intermediate
31

Reliability and Failure Patterns

Timeouts, retries with backoff, circuit breakers, bulkheads, deadlines, hedged requests, and graceful degradation — the patterns that keep distributed systems standing.

#reliability #resilience #distributed-systems
15 min