What Is System Design?
Start here. What system design actually means, why it matters, the framework that turns a vague problem into a defensible architecture, and how to read every other article in this course.
System design is the process of defining the architecture, components, interfaces, and data flow of a software system to meet a set of requirements. Concretely: how do you turn "let's build Twitter" into something a team can build, that stays up at 3am, and that doesn't cost $4M/month to run?
It is not writing code. Design is the layer of decisions that comes before code — which database, how many services, how they talk, what happens when one of them dies, what happens when a million users show up at once, and where you'll regret your choices in two years.
If coding is how, system design is what and why.
Why this matters
Climbing past mid-level at any large tech company means taking a vague problem — "we need real-time notifications" — and turning it into an architecture that scales, fails gracefully, and stays affordable. The design is the work; writing code is what happens once the design is settled. That's the first reason.
The second is purely practical: senior+ interview loops at companies such as Meta, Google, Amazon, Microsoft, Netflix, Stripe, and Uber almost always include a 45- to 60-minute system design round. You're graded on the same dimensions you'd be graded on in the job: clarity of communication, depth of trade-off thinking, and breadth of vocabulary.
The third is less obvious: knowing how the systems around your code work makes you better at the code itself. You'll understand why your team chose Postgres over DynamoDB, why there's a Kafka in the middle, and why retries need jitter — and that understanding will keep you from causing the next outage.
What design actually is
Junior engineers tend to think of design as drawing boxes. It isn't. Drawing boxes is capturing design. The doing of design is making a series of small, testable decisions, each of which closes off a class of options.
flowchart LR
A[Vague idea<br/>'design Twitter'] --> B[Constraints]
B --> C[Decisions]
C --> D[Architecture]
D --> E[Trade-offs<br/>made explicit]
style A fill:#ff6b1a,stroke:#ff6b1a,color:#0a0a0f
style E fill:#a855f7,stroke:#a855f7,color:#fff
A real system designer's mind moves like this:
"Read-heavy social timeline → 100:1 read/write skew → reads must be cheap → fanout-on-write → write amplification → cap follower count for celebrity accounts → fallback to fanout-on-read for celebrities → now I have two paths through the system; let me write that down."
Each phrase is a decision that constrains the next one. The architecture diagram is the result; the work was the chain of reasoning.
The framework: from vague request to defensible design
Every system design problem — interview or real — has the same skeleton. Internalize it once and you'll never freeze again.
flowchart TD
R[Requirements] --> E[Estimation]
E --> H[High-Level Design]
H --> D[Deep Dives]
D --> T[Trade-offs & Wrap-up]
style R fill:#ff6b1a,stroke:#ff6b1a,color:#0a0a0f
style T fill:#a855f7,stroke:#a855f7,color:#fff
In a 45-minute interview, the rough budget is:
| Phase | Time | What you produce |
|---|---|---|
| Requirements | 5 min | Bulleted functional + non-functional requirements |
| Estimation | 5 min | QPS, storage, bandwidth |
| High-level design | 10 min | API + data model + box-and-arrow diagram |
| Deep dive | 15 min | Detailed look at 1–2 hot spots |
| Trade-offs | 5 min | What you'd change with more time |
| Q&A | 5 min | Reserved |
The same pattern applies to real projects — you just spend hours per phase instead of minutes.
sequenceDiagram
participant You
participant Interviewer
You->>Interviewer: Clarify requirements (5 min)
Interviewer-->>You: Scope confirmed
You->>Interviewer: Sketch estimation (5 min)
Interviewer-->>You: Numbers agreed
You->>Interviewer: High-level design (10 min)
Interviewer-->>You: Questions, probes
You->>Interviewer: Deep dive on hard part (15 min)
Interviewer-->>You: Follow-ups
You->>Interviewer: Name trade-offs, wrap up (5 min)
Interviewer-->>You: Q&A (5 min)
Phase 1: Requirements
Two kinds, and most candidates under-invest in the second one.
Functional requirements are what the system does:
"Users can post tweets up to 280 characters. Followers see new tweets in their feed. Tweets can have images. Users can follow / unfollow. Users can reply, like, retweet."
Non-functional requirements are how well it does it:
"p99 latency of feed load < 200ms. Tweets durable forever. 99.99% availability. 300M daily active users. 100:1 read/write ratio."
Non-functional requirements are what actually shape the architecture. "1k DAU" and "300M DAU" require completely different systems even if the functional list is identical.
Always ask:
- Scale — DAU? Peak QPS? Storage growth?
- Latency — p50, p99 budgets?
- Availability — three nines or five?
- Durability — can we lose a write?
- Consistency — read-your-writes? Strong? Eventual?
- Geography — single region or global?
- Budget — are we cost-constrained? (Almost always; rarely admitted.)
If your interviewer hand-waves, propose numbers and get agreement. "Let's say 100M DAU, p99 latency 200ms, eventually consistent reads — sound right?" Now you have something to design against.
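To make the availability question above concrete, the "nines" can be converted into a yearly downtime budget. The following is a minimal sketch of that arithmetic; the function name is illustrative and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def downtime_per_year_seconds(availability_percent: float) -> float:
    """Downtime allowed per year for a given availability target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = downtime_per_year_seconds(target) / 60
    print(f"{target}% -> {minutes:.1f} minutes/year")
# 99.9% -> 525.6 minutes/year
# 99.99% -> 52.6 minutes/year
# 99.999% -> 5.3 minutes/year
```

Each extra nine cuts the budget by a factor of ten, which is why the target is worth agreeing on before any boxes are drawn.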
Phase 2: Estimation (the back-of-the-envelope)
How big? How fast? How much?
These rough numbers tell you whether you need one database or one thousand. Whether a cache is optional or mandatory. Whether your problem is bandwidth, storage, compute, or coordination.
A flavor:
- 300M DAU × 5 tweets/day = 1.5B tweets/day ≈ 17k tweets/sec average, ~50k/sec peak (assume ~3× surge).
- Tweet ≈ 1 KB → 1.5 TB/day → ~550 TB/year of text; media is the real cost.
- Read:write 100:1 → ~1.7M reads/sec average, ~5M reads/sec peak.
Estimation is important enough to get its own module (02).
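The Twitter-sized numbers above can be reproduced with a few lines of arithmetic. This is a rough sketch, not a sizing tool: the function and field names are illustrative, 1 KB is treated as 1,000 bytes, and the 3× peak factor is the assumption stated in the list.

```python
SECONDS_PER_DAY = 86_400


def estimate_load(dau, writes_per_user_per_day, bytes_per_write,
                  reads_per_write, peak_factor=3):
    """Back-of-the-envelope averages and peaks from a handful of inputs."""
    writes_per_day = dau * writes_per_user_per_day
    avg_write_qps = writes_per_day / SECONDS_PER_DAY
    avg_read_qps = avg_write_qps * reads_per_write
    storage_per_day = writes_per_day * bytes_per_write
    return {
        "writes_per_day": writes_per_day,
        "avg_write_qps": avg_write_qps,
        "peak_write_qps": avg_write_qps * peak_factor,
        "avg_read_qps": avg_read_qps,
        "peak_read_qps": avg_read_qps * peak_factor,
        "storage_bytes_per_day": storage_per_day,
        "storage_bytes_per_year": storage_per_day * 365,
    }


twitter = estimate_load(dau=300_000_000, writes_per_user_per_day=5,
                        bytes_per_write=1_000, reads_per_write=100)
print(f"{twitter['avg_write_qps']:,.0f} tweets/sec average")        # 17,361
print(f"{twitter['avg_read_qps']:,.0f} reads/sec average")          # 1,736,111
print(f"{twitter['storage_bytes_per_year'] / 1e12:.1f} TB/year")    # 547.5
```

The exact digits matter less than the order of magnitude: tens of thousands of writes per second and millions of reads per second already rule out a single database node.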
Phase 3: High-level design
Three artifacts.
API design (signatures only). What does a client call? What goes in, what comes out?
POST /tweets body: {text, media?} → {tweet_id}
GET /timeline?cursor=... → {tweets[], next_cursor}
POST /follows body: {target_user_id} → 204
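The `cursor` / `next_cursor` pair in the timeline endpoint can be sketched as follows. This is one possible approach under stated assumptions, not the API's actual behavior: the cursor is assumed to be an opaque base64 encoding of the last tweet ID returned, tweet IDs are assumed to increase over time, and `get_timeline`, `encode_cursor`, and `decode_cursor` are hypothetical names.

```python
import base64
from typing import Optional


def encode_cursor(last_tweet_id: int) -> str:
    """Wrap the last-seen tweet ID in an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(str(last_tweet_id).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())


def get_timeline(tweet_ids_newest_first: list[int],
                 cursor: Optional[str] = None, limit: int = 2) -> dict:
    """Return one page of tweet IDs plus a cursor for the next page."""
    if cursor is None:
        remaining = tweet_ids_newest_first
    else:
        last_seen = decode_cursor(cursor)
        # Only tweets strictly older than the last one the client saw.
        remaining = [t for t in tweet_ids_newest_first if t < last_seen]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(page[-1]) if has_more else None
    return {"tweets": page, "next_cursor": next_cursor}


first = get_timeline([50, 40, 30, 20, 10])
second = get_timeline([50, 40, 30, 20, 10], cursor=first["next_cursor"])
```

Keying the cursor on a tweet ID rather than an offset means newly posted tweets do not shift the pages a client has already read.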
Data model. What tables / collections exist; what are the keys?
users(user_id PK, username, ...)
tweets(tweet_id PK, author_id FK, text, created_at)
follows(follower_id, followee_id) PK(follower_id, followee_id)
timeline_cache(user_id, tweet_id, score) ← Redis sorted set
Architecture diagram. Boxes and arrows. Each box is a service or data store; each arrow has a direction and a protocol.
flowchart LR
C[Client] --> CDN
CDN --> LB[Load Balancer]
LB --> API[API service]
API --> CACHE[(Redis)]
API --> DB[(Postgres)]
API --> KAFKA[Kafka]
KAFKA --> FW[Fanout workers]
FW --> TLC[(Timeline cache)]
style API fill:#ff6b1a,color:#0a0a0f
style KAFKA fill:#0e7490,color:#fff
Don't draw 30 boxes. Five to ten is the sweet spot. Boxes should map to your nouns from the requirements.
Phase 4: Deep dive
Pick the hard part and go deep. The interviewer is usually itching to dig in here. Common deep-dive areas:
- The read path under load (how does feed work for a celebrity?).
- The write path under load (write amplification, fanout).
- Storage layout (sharding key, hot partitions).
- Consistency model (what guarantees do users see?).
- Failure handling (what happens when X dies?).
For Twitter, the deep dive is almost always fanout (write to many follower timelines vs read from many timelines), with a digression into celebrity accounts.
This is where the difference between a junior and senior answer shows. Junior: "we'd use Cassandra." Senior: "Cassandra for tweets, partitioned by author_id with time as the clustering key, replication factor 3 across AZs. Tweets also go to a Kafka topic so the fanout workers can write into per-follower Redis sorted sets, and for accounts above ~10k followers we skip the precomputed fanout and merge at read time, accepting the higher read cost."
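The hybrid fanout described in the senior answer can be sketched in a few dozen lines. This is an in-memory illustration only: plain dictionaries stand in for Cassandra, Kafka, and the Redis sorted sets, and `TimelineService` and its method names are hypothetical.

```python
from collections import defaultdict


class TimelineService:
    """In-memory stand-in for the tweet store and per-follower timeline caches."""

    def __init__(self, celebrity_threshold: int = 10_000):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)          # author -> set of followers
        self.following = defaultdict(set)          # user -> set of authors
        self.tweets_by_author = defaultdict(list)  # author -> [(timestamp, tweet_id)]
        self.timeline_cache = defaultdict(list)    # user -> [(timestamp, tweet_id)]

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) > self.celebrity_threshold

    def post(self, author: str, tweet_id: str, timestamp: int) -> int:
        """Store the tweet; return how many cache writes the fanout cost."""
        self.tweets_by_author[author].append((timestamp, tweet_id))
        if self.is_celebrity(author):
            return 0  # fanout-on-read: followers merge this author at read time
        for follower in self.followers[author]:
            self.timeline_cache[follower].append((timestamp, tweet_id))
        return len(self.followers[author])

    def read_timeline(self, user: str, limit: int = 10) -> list[str]:
        """Newest-first tweet IDs: precomputed entries plus celebrity tweets."""
        # A set removes duplicates if an author crossed the threshold
        # after some of their tweets had already been fanned out.
        entries = set(self.timeline_cache[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                entries.update(self.tweets_by_author[author])
        newest_first = sorted(entries, reverse=True)
        return [tweet_id for _, tweet_id in newest_first[:limit]]
```

The return value of `post` makes the write amplification visible: an ordinary author costs one cache write per follower, while a celebrity costs none at write time and shifts the work to every follower's read.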
Phase 5: Trade-offs and wrap-up
Every decision you made closed off alternatives. Name them.
"We chose fanout-on-write because read latency dominates. The cost is write amplification: a tweet from someone with 10k followers becomes 10k cache writes; from someone with 10M followers it becomes 10M. We mitigate that with the celebrity exception — above ~10k followers we skip the precompute and merge at read time. If we had stricter durability requirements, we'd need synchronous replication on the timeline cache, which would push p99 higher — we'd accept that for finance, not for social."
A senior signal in interviews is knowing what you didn't choose, and why.
What good answers sound like
Three traits separate strong answers from weak ones, and none of them are about knowing the right buzzwords.
Use numbers
"We'll need a lot of capacity" leaves the interviewer with nothing to probe. "At 50k writes/sec peak, a single Postgres caps out — we shard by user_id with ~50 shards giving us headroom to ~2.5M writes/sec" gives them something to push back on, extend, or validate. Numbers force a specific conversation. They also expose your math — interviewers respect the candidate who notices their own off-by-1000.
Use trade-off language
"We should use Cassandra" is a preference. "We'd choose Cassandra over Postgres here because we need linear write scalability — but we accept eventual consistency for the timeline, which is fine because users tolerate a few seconds of lag on a follow" is a design decision. Every architecture choice is a trade-off; state it as one.
Evolve the design
Laying out a perfect architecture in one go is a red flag — it suggests you memorized an answer rather than reasoned to one. Start with the simplest thing that could work; identify the bottleneck; evolve.
"Let's start with a single Postgres. At 50k writes/sec peak it falls over — so we shard. At high follower counts the writes amplify badly — so we add a Kafka stream and fanout workers. At celebrity scale, we skip the precompute entirely. Each step is justified by the previous one's bottleneck."
This narrative is hard to fake, and it's exactly how real systems get built.
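The "so we shard" step, with `user_id` as the shard key and roughly 50 shards as in the earlier example, can be sketched as hash-based routing. This is a minimal illustration with a hypothetical function name; it assumes string user IDs and a fixed shard count.

```python
import hashlib

NUM_SHARDS = 50  # the "~50 shards" figure used in the example answer


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user_id to a shard index using a stable hash."""
    # Python's built-in hash() is salted per process for strings,
    # so a cryptographic digest is used to get the same answer everywhere.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


print(shard_for_user("user-42"))
```

Plain modulo routing moves most keys when the shard count changes, so a design that expects to reshard often would likely look at consistent hashing or a lookup table instead.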
Common pitfalls
flowchart TD
P[Common mistakes] --> P1[Skipping requirements]
P --> P2[Skipping estimation]
P --> P3[Buzzword soup]
P --> P4[Naming a tool, not a design]
P --> P5[No trade-offs]
P --> P6[Drawing 30 boxes]
style P fill:#ff2e88,color:#fff
Skipping requirements. "Let me start drawing." You'll redo the design when you discover the read pattern is 100:1.
Buzzword soup. "We use Kafka, Spark, Redis, Cassandra, Vault, Consul, Envoy, gRPC, GraphQL, and Postgres." If you can't justify each piece, take it out.
Naming a tool when you mean a design. "We use DynamoDB" is not a design. "We use a partitioned key-value store with (user_id, tweet_id) as the composite key, 100k WCU provisioned with auto-scaling enabled for traffic spikes" is a design.
Treating one path as the whole system. A tweet doesn't just write to a database. It writes to the database, fans out to follower caches, indexes for search, triggers notifications, updates view counts. Each one of these has its own latency / consistency / failure model.
No trade-offs. Every decision sounds optimal in your answer. The interviewer knows that's not how it works. Tell them what you traded away.
A worked example, end to end
Let's compress a real interview problem into one panel: design a URL shortener.
Requirements
Functional: shorten a URL, redirect on click, custom aliases optional, analytics optional.
Non-functional: 100M new URLs/month, 10:1 read:write, 99.9% availability, p99 redirect < 50ms.
Estimation
Writes: 100M / (30 × 86,400) ≈ 40/sec average, ~120/sec peak
Reads: ~400/sec average, ~1,200/sec peak (10:1 read:write)
Storage: 100M/month × 60 months × ~500 B ≈ 3 TB total, which fits on one big Postgres
Cache: ~1M hot URLs × 500B ≈ 500 MB — one Redis node fits comfortably
(20% of URLs typically handle 80% of traffic; cache that 20%)
Already we know this is a small system, not a "needs Kafka and Cassandra" system.
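The URL shortener figures above can be checked with the same kind of arithmetic. This is a sketch under the stated assumptions: a 30-day month, the 3× peak factor implied by the 40 → 120 step, and decimal units (1 TB = 10^12 bytes). Variable names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 86_400

new_urls_per_month = 100_000_000
reads_per_write = 10
peak_factor = 3          # assumed surge multiplier
retention_months = 60    # five years of URLs
bytes_per_record = 500
hot_urls = 1_000_000     # assumed size of the hot set kept in Redis

avg_writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH
avg_reads_per_sec = avg_writes_per_sec * reads_per_write
storage_bytes = new_urls_per_month * retention_months * bytes_per_record
cache_bytes = hot_urls * bytes_per_record

print(f"writes: {avg_writes_per_sec:.0f}/sec avg, "
      f"{avg_writes_per_sec * peak_factor:.0f}/sec peak")
print(f"reads:  {avg_reads_per_sec:.0f}/sec avg, "
      f"{avg_reads_per_sec * peak_factor:.0f}/sec peak")
print(f"storage: {storage_bytes / 1e12:.0f} TB, cache: {cache_bytes / 1e6:.0f} MB")
```

The unrounded values (about 39 writes/sec and 386 reads/sec) land slightly under the rounded figures in the estimate, which is the safe direction to round for capacity planning.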
High-level design
flowchart LR
U[User] --> CDN
CDN --> API[API + redirect service]
API --> RDS[(Redis cache)]
API --> PG[(Postgres)]
PG -.async.-> AN[(Analytics store)]
style API fill:#ff6b1a,color:#0a0a0f
style PG fill:#0e7490,color:#fff
API:
POST /shorten body: {url, alias?} → {short_code}
GET /:code → 302 redirect to original_url
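The redirect path from the diagram (check Redis first, fall back to Postgres) can be sketched as follows. Dictionaries stand in for both stores, and `RedirectService` and `resolve` are hypothetical names. The sketch does not write back to the cache on a miss; whether to do so is a policy choice the article leaves open.

```python
from typing import Optional


class RedirectService:
    """Resolves short codes, preferring the cache over the database."""

    def __init__(self, cache: dict[str, str], database: dict[str, str]):
        self.cache = cache            # stand-in for the Redis cache of hot codes
        self.database = database      # stand-in for the Postgres table
        self.database_reads = 0       # counts how often the cache missed

    def resolve(self, code: str) -> tuple[int, Optional[str]]:
        """Return (HTTP status, redirect target) for GET /:code."""
        url = self.cache.get(code)
        if url is None:
            # Cache miss: cold codes are served from the database.
            self.database_reads += 1
            url = self.database.get(code)
        if url is None:
            return 404, None
        return 302, url


service = RedirectService(
    cache={"hot1": "https://example.com/popular"},
    database={"hot1": "https://example.com/popular",
              "cold1": "https://example.com/rare"},
)
print(service.resolve("hot1"))   # (302, 'https://example.com/popular')
print(service.resolve("nope"))   # (404, None)
```

Counting database reads separately is a cheap way to see the cache hit rate, which is the number that decides whether one Redis node is really enough.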
Deep dive — short code generation
Three options:
- Hash the URL, take 7 chars — fast, but collisions need explicit handling.
- Auto-incrementing ID + base62 encode: no collisions and simple to build, but codes are sequential, so they are guessable and reveal the total URL count (minor privacy leak).
- Pre-allocated random codes from a pool — avoids sequential guessing, but needs a background job to keep the pool stocked.
We'd pick (2) for simplicity, accepting the minor leak. Option (3) is worth it only if obscuring the URL count matters.
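Option (2) can be sketched as a base62 encoder over the auto-incrementing ID. This is a minimal illustration: the alphabet order (digits, then lowercase, then uppercase) is an arbitrary choice, and the function names are hypothetical.

```python
import string

# 10 digits + 26 lowercase + 26 uppercase = 62 symbols
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(number: int) -> str:
    """Turn a non-negative database ID into a short code."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Recover the database ID from a short code."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


print(encode_base62(62))        # 10
print(decode_base62("10"))      # 62
```

Seven base62 characters cover 62^7 ≈ 3.5 trillion IDs, far more than the roughly 6 billion URLs in the five-year estimate.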
Trade-offs
- We chose Postgres over DynamoDB because at this scale Postgres is simpler and cheaper.
- We pre-cache hot codes; cold ones go to Postgres directly. Acceptable because cold reads are rare.
- Analytics is async — a few-minute lag is fine.
- If write volume grew 100×, we'd shard or rebuild on a KV store. Not now.
That's a complete (if compressed) system design answer. The full version is in the URL shortener article.
What this course covers
By the end of these twelve modules, you'll have the whole vocabulary:
| # | Topic | Why it matters |
|---|---|---|
| 01 | This module | Mental model + framework |
| 02 | Back-of-the-envelope estimation | Sizing systems |
| 03 | Networking & HTTP | The transport layer |
| 04 | APIs & protocols | How services talk |
| 05 | Databases | Where data lives |
| 06 | Storage systems | Where the bytes live |
| 07 | Caching | Where data lives faster |
| 08 | Load balancers & proxies | Spreading load |
| 09 | Message queues & streams | Async glue |
| 10 | CAP, consistency, replication | Distributed systems theory |
| 11 | Reliability patterns | Surviving failure |
| 12 | Observability | Seeing what your system is doing |
Once you've finished, you can read any of the deep-dive articles and any of the FAANG interview problems with full understanding.
How to read this course
TL;DR: Read in order. Don't skip module 02 — it ties everything else together.
Each module:
- Builds on the previous one.
- Has diagrams. Pause and look at them.
- Ends with a "things you should now be able to answer" checklist. Do that before moving on.
If you're prepping for an interview, the highest-leverage modules are: 02 (estimation), 05 (databases), 07 (caching), 09 (queues), and 10 (CAP).
If you're trying to do the work in production, 11 (reliability) and 12 (observability) will save you more sleep than any of the others.
What you'll be able to answer after this course
A taste:
- Why is a 99th-percentile latency of 100ms hard to achieve?
- Why does Postgres get slow at 500 GB and what do you do about it?
- What's the difference between Redis as a cache and Redis as a database?
- When does a system need a message queue vs a direct API call?
- Why can't you have all three CAP properties at once, and what does that even mean?
- A downstream service is slow; why does that take your service down?
- How do you "exactly-once" anything in a distributed system?
Let's go.
→ Next: Back-of-the-envelope estimation
Frequently asked questions
▸What is system design?
System design is the process of defining the architecture, components, interfaces, and data flow of a software system to meet a set of requirements. It is the layer of decisions that comes before code — which database, how many services, how they talk, what happens when one dies, and what happens when a million users show up at once.
▸What is the difference between functional and non-functional requirements, and which one drives architecture?
Functional requirements describe what the system does (e.g., users can post tweets up to 280 characters), while non-functional requirements describe how well it does it (e.g., p99 latency of feed load under 200ms, 99.99% availability, 300M daily active users). Non-functional requirements are what actually shape the architecture — 1k DAU and 300M DAU demand completely different systems even with an identical functional list.
▸How long should each phase take in a 45-minute system design interview?
The article prescribes: 5 minutes on requirements, 5 minutes on estimation, 10 minutes on high-level design, 15 minutes on the deep dive, 5 minutes on trade-offs, and 5 minutes for Q&A. The same skeleton applies to real projects; you just spend hours per phase instead of minutes.
▸What is fanout-on-write and when should you skip it?
Fanout-on-write means pushing a new tweet into every follower's precomputed timeline cache at write time, which makes reads cheap but amplifies writes. The article recommends skipping the precompute for accounts above roughly 10k followers — so-called celebrity accounts — and instead merging their tweets at read time to avoid the write amplification that would otherwise reach 10 million cache writes for a single post.
▸How many boxes should an architecture diagram have?
Five to ten boxes is the sweet spot. Each box should map to a noun from the requirements. Drawing 30 boxes is listed as a common pitfall that obscures rather than communicates design.