#storage
12 articles
Design an Object Storage Service (S3)
Store arbitrary blobs with HTTP GET/PUT at exabyte scale and 11 nines of durability. Metadata vs data separation, erasure coding, and self-healing.
Design a Distributed Message Queue (Kafka)
Build a durable, partitioned, replicated commit log like Kafka: ordering, consumer groups, replication (ISR), and exactly-once delivery.
Design an Email Service (Gmail)
Send, receive, store, and search email for hundreds of millions of users. SMTP ingestion, sharded mailbox storage, full-text search, and spam filtering.
Storage Engines: LSM-Trees vs B-Trees
Why does Postgres read fast and Cassandra write fast? The two storage-engine families that underpin most databases, and their write/read/space amplification trade-offs.
Design a Distributed File System (GFS / HDFS)
Store petabytes of files across thousands of commodity machines for high-throughput batch reads. The single-master + chunkservers design, replication, and append-heavy workloads.
Designing Pastebin
A simple service for sharing text snippets — and a surprisingly rich design problem. Storage strategy, expiry, syntax highlighting, abuse prevention.
Design a Distributed Key-Value Store (Dynamo)
Build your own DynamoDB / Cassandra. Sharding, replication, quorum reads/writes, vector clocks, conflict resolution.
Design Instagram (photo sharing)
A photo-first social network — uploads, image processing, feed, stories, and global delivery.
Design Google Drive / Dropbox
File sync that works on every device: blob storage, deduplication, conflict resolution, and how to do all of it efficiently.
Design a Web Crawler (Googlebot)
Crawl billions of URLs at petabyte scale. URL frontier, politeness, deduplication, 4xx/dead-link handling, and the realities of indexing the web.
Design YouTube / Netflix (video streaming)
How a billion users watch ~1 billion hours of video a day. Upload pipeline, transcoding, adaptive bitrate, CDN, and recommendations.
Storage Systems — Files, Blocks, and Objects
Block storage, file systems, object storage (S3), CDNs, and the cost/durability/throughput trade-offs that decide where your data lives.