How to Approach a System Design Interview
A repeatable 45-minute framework — requirements, estimation, high-level design, deep dives, bottlenecks — and the moves that separate a hire from a no-hire.
A system design interview is not a test of how many buzzwords you can fit in 45 minutes. It's a test of whether you can take a fuzzy real-world problem, pin down what actually matters, and walk through a credible design while a stranger interrupts you with questions. Senior interviewers can tell the difference between "I read a blog yesterday" and "I have built this" inside the first ten minutes.
This article is the framework I use to keep the conversation on rails. It works across every flavor of question — design Twitter, design a rate limiter, design a payment system — because the process is the same. What changes is which deep-dive you reach for at minute 25.
The 45-minute clock
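| Phase | Minutes |
|---|---|
| Requirements | 0–5 |
| Estimation | 5–10 |
| High-level design | 10–20 |
| Deep dives | 20–35 |
| Bottlenecks and scaling | 35–42 |
| Q&A | 42–45 |

These timings are not strict, and interviewers will pull you forward and backward. But if you haven't said the word "estimation" by minute 15, something has gone wrong.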
Step 1: Requirements (the most-skipped step)
The single biggest failure mode in this interview is jumping to a whiteboard before you understand what you're building. Slow down. Force the scope conversation.
Ask two kinds of questions:
Functional: what should the system do?
- "Should users be able to upload, or only view?"
- "Do we need search? Filtering? Recommendations?"
- "Is this read-heavy or write-heavy?"
- "Do messages need to be ordered? Globally or per-conversation?"
Non-functional: how well should it do it?
- "What scale are we designing for? 1k users? 1M? 1B?"
- "What's the latency target? p99 under 200ms?"
- "How available? Three nines? Five?"
- "How consistent? Eventual is OK for a feed, not OK for a wallet."
Write the answers down on the board in a corner. They are your contract for the next 40 minutes. Every design decision will refer back to them.
```mermaid
flowchart TD
    R[Requirements board] --> F[Functional<br/>upload, view, comment, search]
    R --> NF[Non-functional<br/>100M DAU, 100ms p99,<br/>99.99% available, eventual OK]
    R --> OS[Out of scope<br/>analytics, admin panel,<br/>billing]
    style R fill:#ff6b1a,color:#0a0a0f
```
The "out of scope" pile is just as important. Interviewers respect a candidate who says "we're not going to design moderation today, but flag it for later" — it shows judgment.
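To make the availability question from the non-functional list concrete, the sketch below converts a target such as three or five nines into a yearly downtime budget. The function name is illustrative, and the calculation assumes a 365-day year with no exclusions for planned maintenance.

```python
# Downtime budget implied by an availability target.
# Assumption: 365-day year, every minute counts toward the budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes per year the system may be unavailable at the given target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% available -> {minutes:.1f} minutes of downtime per year")
```

Three nines leaves roughly 8.8 hours of downtime a year, while five nines leaves a little over 5 minutes, which is why the answer to this question changes how much redundancy the rest of the design needs.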
Step 2: Estimation (back-of-the-envelope)
Estimation is where you justify every later decision. You don't need to be exact. You need to be within an order of magnitude and able to defend the numbers.
Cover the four big quantities:
| Quantity | Why it matters |
|---|---|
| QPS (reads & writes) | Sizes your servers and load balancers |
| Storage | Picks your database; bytes/year cost |
| Bandwidth | Egress costs; CDN strategy |
| Memory | Cache sizing |
Worked example — designing Twitter (hypothetical teaching scale; actual reported peak was ~400M MAU):
- 500M MAU × 30% DAU = 150M DAU
- 150M × 1 tweet/user/day = 150M tweets/day ≈ 1.7k tweets/sec write
- Reads ~100× writes → 170k reads/sec
- Each tweet ~300 bytes → 150M × 300B ≈ 45 GB/day, ~16 TB/year of just tweet text
- Add media at 10× → 160 TB/year
Now every later decision points back to these. "We need a CDN because we're adding roughly 160 TB of media a year and serving it at 100× the write rate." "Cache the timeline because reads are 100× writes."
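The arithmetic in the worked example is simple enough to check in a few lines. This sketch only reproduces those numbers; the variable names are illustrative, and the 30% DAU ratio, 100× read ratio, 300-byte tweet, and 10× media multiplier are the same teaching assumptions used in the list.

```python
# Back-of-the-envelope numbers for the Twitter teaching example.
SECONDS_PER_DAY = 86_400
GB, TB = 10**9, 10**12

mau = 500_000_000
dau = mau * 30 // 100                      # 30% of MAU are daily actives
tweets_per_day = dau * 1                   # 1 tweet per user per day
write_qps = tweets_per_day / SECONDS_PER_DAY
read_qps = write_qps * 100                 # reads assumed ~100x writes

tweet_bytes = 300
text_bytes_per_day = tweets_per_day * tweet_bytes
text_bytes_per_year = text_bytes_per_day * 365
media_bytes_per_year = text_bytes_per_year * 10   # media assumed 10x text

print(f"DAU:            {dau:,}")
print(f"writes/sec:     {write_qps:,.0f}")
print(f"reads/sec:      {read_qps:,.0f}")
print(f"text per day:   {text_bytes_per_day / GB:.0f} GB")
print(f"text per year:  {text_bytes_per_year / TB:.1f} TB")
print(f"media per year: {media_bytes_per_year / TB:.0f} TB")
```

The unrounded results are about 16.4 TB of text and 164 TB of media per year. Rounding them to 16 TB and 160 TB on the whiteboard is well within the order-of-magnitude accuracy this step needs.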
There's a whole crash-course module on this. Memorize the standard anchors (1 KB typical row, ~500 B typical HTTP request, single Postgres primary comfortable up to ~5–10k writes/sec, Redis at ~100k+ ops/sec, commodity HTTP tier at ~20–50k req/sec for lightweight handlers) so you don't burn 5 minutes on arithmetic. The right number depends on request complexity — state a workload assumption when you cite any box-level throughput figure.
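One way to use those anchors is to turn a peak QPS figure into a rough box count. In this sketch the helper name `boxes_needed` and the 50% target utilization are assumptions chosen for illustration. The per-box figures are the low end of the anchors quoted above, and the QPS values are the rounded ones from the Twitter example.

```python
import math


def boxes_needed(peak_qps: float, per_box_qps: float,
                 target_utilization: float = 0.5) -> int:
    """Boxes required to serve peak_qps without exceeding target utilization.

    target_utilization=0.5 is an assumed headroom policy, not a standard.
    """
    return math.ceil(peak_qps / (per_box_qps * target_utilization))


read_qps = 170_000   # from the worked example
write_qps = 1_700

# Workload assumption: lightweight handlers, low end of each anchor.
app_servers = boxes_needed(read_qps, per_box_qps=20_000)
redis_nodes = boxes_needed(read_qps, per_box_qps=100_000)
pg_primaries = boxes_needed(write_qps, per_box_qps=5_000)

print(f"app servers:  {app_servers}")
print(f"redis nodes:  {redis_nodes}")
print(f"pg primaries: {pg_primaries}")
```

With these assumptions the write path fits on a single primary while the read path needs a fleet, which is the kind of conclusion worth saying out loud before drawing any boxes.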
Step 3: High-level design
Now — and only now — go to the whiteboard. Draw the boxes and arrows version of the system.
A good high-level design has 4–8 boxes. More than that and you're already in the weeds. Fewer than that and you haven't said anything.
```mermaid
flowchart LR
    U[Clients] --> CDN
    CDN --> LB[Load Balancer]
    LB --> APP[App Servers]
    APP --> CACHE[(Redis Cache)]
    APP --> DB[(Primary DB)]
    APP --> Q[Queue]
    Q --> WK[Workers]
    WK --> DB
    style APP fill:#ff6b1a,color:#0a0a0f
    style CACHE fill:#15803d,color:#fff
```
Talk through it: "Clients hit the CDN for static assets. Dynamic requests go through a load balancer to a fleet of stateless app servers. Reads check Redis first, then the DB. Writes go to the DB and emit events to a queue for async work."
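The read and write paths in that walkthrough can be sketched in a few lines. This is a minimal in-memory illustration, not a production pattern: plain dictionaries stand in for Redis and the primary DB, `queue.Queue` stands in for the message queue, and invalidating the cache entry on write is an assumption the walkthrough does not specify.

```python
import queue
from typing import Optional

cache = {}                 # stand-in for Redis
db = {}                    # stand-in for the primary DB
events = queue.Queue()     # stand-in for the async work queue


def read(key: str) -> Optional[str]:
    """Cache-aside read: check the cache first, then fall back to the DB."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value           # populate the cache for later readers
    return value


def write(key: str, value: str) -> None:
    """Write to the DB, then emit an event for async workers."""
    db[key] = value
    cache.pop(key, None)             # assumed policy: invalidate on write
    events.put({"type": "updated", "key": key})


write("tweet:1", "hello")
first = read("tweet:1")    # cache miss, filled from the DB
second = read("tweet:1")   # cache hit
print(first, second, events.qsize())
```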
That's the shape. The interviewer will pick the box that interests them and ask you to expand it.
Step 4: Deep dives — where the interview is actually won
This is where most of your time goes, and where seniority is judged. The interviewer will push you into one or two specific areas. Common deep-dive prompts:
| Prompt | What they're testing |
|---|---|
| "How does your database handle this many writes?" | Sharding, replication, choice of DB |
| "What if the cache goes down?" | Failure modes, thundering herd, fallbacks |
| "How would you scale to 10× the load?" | Bottleneck analysis, horizontal scaling |
| "What if two users edit at once?" | Concurrency, locking, conflict resolution |
| "How do you ensure messages aren't lost?" | Delivery semantics, idempotency, dead-letter queues |
| "How would you make this multi-region?" | Replication lag, CAP, consistency models |
Strong answers have a three-part structure. First, acknowledge the issue ("right, with 100k writes/sec, a single Postgres will not keep up"). Second, sketch two or three alternatives ("we could shard by user_id, switch to Cassandra, or front it with a write-buffering queue"). Third, pick one and name what you're trading ("I'd shard by user_id because queries are user-scoped; the downside is that cross-user analytics needs scatter-gather, which we'll push to a separate OLAP store").
Notice what's not in there: "we'll just use Kafka". Naming a technology is not an answer. Naming a technology and saying what it solves and what it costs is an answer.
Staff-level depth the interviewer is listening for. When the prompts above come up, the answers that signal staff+ all share one trait: they go one level deeper than the obvious fix and name a second-order problem.
| Scenario | Junior answer | Staff+ answer |
|---|---|---|
| Cache goes down | "We fall back to the DB" | "Yes, but if 10k concurrent threads all miss at once (thundering herd), the DB sees a request spike it wasn't provisioned for. I'd add request coalescing — only one request per key fetches from the DB; others wait on a promise — plus a circuit breaker so a partial cache outage doesn't cascade to full DB overload." |
| Shard by user_id | "Reduces write load per node" | "Works until a celebrity user drives 1000× the write rate of a typical user — that shard becomes hot. Mitigation: write-behind queue per user, or virtual sharding (hash user_id + bucket suffix) to spread a single user across N physical partitions." |
| Multi-region | "Async replication with ~50ms lag" | "Async is fine for reads, but any write that requires read-modify-write now has a race window across regions. For money-like invariants, you need either synchronous cross-region quorum (expensive, adds 50–200ms) or a single authoritative region per entity with local caching everywhere else." |
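The request-coalescing idea in the cache row can be sketched as a small "single-flight" helper: the first caller for a key does the fetch, and concurrent callers for the same key wait on the same future. The class name `SingleFlight` and the `slow_loader` function are hypothetical, and this version only coalesces threads inside one process. Across a fleet of app servers you would still see one fetch per server unless you add shared coordination.

```python
import threading
import time
from concurrent.futures import Future


class SingleFlight:
    """Coalesce concurrent loads of the same key into one loader call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> Future shared by all waiters

    def do(self, key, loader):
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._inflight[key] = fut
        if leader:
            try:
                fut.set_result(loader(key))
            except BaseException as exc:
                fut.set_exception(exc)   # waiters see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
        return fut.result()              # followers block here until done


calls = []


def slow_loader(key):
    """Hypothetical DB fetch that takes long enough for callers to pile up."""
    calls.append(key)
    time.sleep(0.3)
    return f"value-for-{key}"


sf = SingleFlight()
results = []
barrier = threading.Barrier(8)


def worker():
    barrier.wait()                       # release all threads together
    results.append(sf.do("user:42", slow_loader))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{len(results)} callers served by {len(calls)} DB fetch(es)")
```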
What a good deep dive looks like in practice
Here is the shape of a strong deep-dive exchange, so you can pattern-match when you're under the clock:
```mermaid
sequenceDiagram
    participant I as Interviewer
    participant C as Candidate
    I->>C: "How does your database handle 100k writes/sec?"
    C->>I: Acknowledge the constraint
    Note over C: "A single primary won't keep up — let me walk through options."
    C->>I: Name 2-3 alternatives with trade-offs
    Note over C: "Shard by user_id, move to Cassandra, or add a write-buffering queue."
    C->>I: Pick one and name the cost
    Note over C: "I'd shard by user_id. The downside: cross-user queries need scatter-gather."
    C->>I: Name the second-order problem
    Note over C: "Hot shards are the real risk — celebrity users need virtual sub-sharding."
    I->>C: Follow-up on second-order problem
    C->>I: Go one level deeper
```
Work that loop — acknowledge, compare, commit, name the cost, surface the next problem — and the interviewer's pen will be moving.
Step 5: Bottlenecks and scaling
By minute 35 you should be looking at your own diagram with a skeptical eye, asking the questions an interviewer would ask:
Where is the single point of failure? Usually the DB primary. Plan for failover: a synchronous standby for automatic promotion, read replicas to take load off the primary, and a failover time you can actually state. Per AWS documentation, standard RDS Multi-AZ PostgreSQL failover typically takes 60–120 s, while Aurora PostgreSQL with a replica is faster, often under 30 s. Plan accordingly.
Where will the queue back up? What happens at 10× current load? A queue that backs up is a slow memory leak — name your consumer parallelism strategy and your DLQ (dead-letter queue) for poison messages.
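A dead-letter queue is easiest to explain with the retry loop next to it. The sketch below is a single-threaded illustration using the standard library. The `MAX_ATTEMPTS` budget, the message shape, and the `handler` function are all hypothetical, and a real consumer would run several of these loops in parallel against a broker rather than an in-process queue.

```python
import queue

MAX_ATTEMPTS = 3   # hypothetical retry budget before a message is parked


def drain(work: queue.Queue, dead_letters: list, handler) -> int:
    """Process messages until the queue is empty.

    Failed messages are requeued until they exhaust MAX_ATTEMPTS, then
    moved to the dead-letter list so they stop blocking healthy traffic.
    """
    processed = 0
    while not work.empty():
        msg = work.get()
        try:
            handler(msg["body"])
            processed += 1
        except Exception as exc:
            msg["attempts"] += 1
            if msg["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append({**msg, "error": repr(exc)})
            else:
                work.put(msg)        # retry later, behind other messages
    return processed


def handler(body: str) -> None:
    """Hypothetical handler that can never process one specific message."""
    if body == "poison":
        raise ValueError("cannot parse message")


work = queue.Queue()
for body in ("a", "poison", "b"):
    work.put({"body": body, "attempts": 0})

dead_letters = []
processed = drain(work, dead_letters, handler)
print(f"processed={processed}, dead-lettered={len(dead_letters)}")
```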
What's the hottest table or cache key? Celebrity users, trending hashtags, and viral events create hot partitions. Know whether your sharding key distributes load or concentrates it.
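One way to keep a celebrity from concentrating load is to append a bucket suffix to the sharding key for known-hot users. In this sketch the shard count, bucket count, and the hard-coded `HOT_USERS` set are assumptions for illustration. In practice the hot set would come from traffic monitoring.

```python
import hashlib
import random

NUM_SHARDS = 16
HOT_KEY_BUCKETS = 8            # assumed fan-out for hot users
HOT_USERS = {"celebrity_1"}    # hypothetical; normally detected from traffic


def _shard(key: str) -> int:
    """Stable hash of a key onto a shard number."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def write_shard(user_id: str) -> int:
    """Pick the shard for a write; hot users are spread over several buckets."""
    if user_id in HOT_USERS:
        bucket = random.randrange(HOT_KEY_BUCKETS)
        return _shard(f"{user_id}#{bucket}")
    return _shard(user_id)


def read_shards(user_id: str) -> set:
    """Shards a reader must query to see all of a user's data."""
    if user_id in HOT_USERS:
        return {_shard(f"{user_id}#{b}") for b in range(HOT_KEY_BUCKETS)}
    return {_shard(user_id)}


print("normal user shards:", read_shards("alice"))
print("hot user shards:   ", sorted(read_shards("celebrity_1")))
```

The cost is visible in `read_shards`: writes for a hot user spread out, but every read for that user becomes a small scatter-gather across its buckets.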
What breaks first on a region failure? If your DB primary is in us-east-1 and that region becomes unreachable, your us-west-2 nodes are on the far side of a network partition: do they serve stale reads or throw errors? Your answer should reference CAP and your consistency choice.
Where's data on the slow path? What if Redis is empty (a cold cache)? A cache restart under high traffic is a thundering herd event. Strategies: key-level mutex/promise coalescing, probabilistic early expiration, or warm-up replay from the DB.
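Probabilistic early expiration is the least familiar of those strategies, so here is a minimal sketch of one commonly described formulation: each reader occasionally volunteers to refresh a key shortly before it expires, with the chance rising as expiry approaches. The function name, the `beta` default, and the injectable `rng` parameter are assumptions made for illustration and testing.

```python
import math
import random


def should_refresh_early(now: float, expiry: float, recompute_seconds: float,
                         beta: float = 1.0, rng=random.random) -> bool:
    """Decide whether this reader should recompute the value before expiry.

    recompute_seconds is how long a refresh takes; a larger value (or a
    larger beta) makes readers start refreshing earlier.
    """
    # 1 - rng() lies in (0, 1], so log() is defined and never positive.
    head_start = -recompute_seconds * beta * math.log(1.0 - rng())
    return now + head_start >= expiry


# Rough feel for the behaviour: a key that expires at t=100 and takes
# 2 seconds to recompute, checked by many readers at different times.
for now in (90, 97, 99, 100):
    hits = sum(should_refresh_early(now, 100, 2.0) for _ in range(10_000))
    print(f"t={now}: {hits / 100:.1f}% of readers would refresh")
```

Because only a small fraction of readers refresh early, the value is usually recomputed by one request before it expires, instead of by every request at the moment it expires.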
What monitoring would catch this in production? Staff candidates mention concrete metrics: p99 latency per endpoint, queue depth and consumer lag, cache hit rate, DB connection pool saturation, and replication lag. Mentioning these unprompted is a strong operational maturity signal.
If the interviewer hasn't already pushed on these, do it yourself. It shows you've built systems that failed in exactly these ways.
```mermaid
flowchart TD
    OWN[Skeptic pass on your own design] --> SPF["Single point of failure?<br/>DB primary → plan failover time"]
    OWN --> HOT["Hot partition?<br/>Celebrity user, trending key → virtual sharding"]
    OWN --> QUEUE["Queue backlog?<br/>10× load → consumer parallelism + DLQ"]
    OWN --> REGION["Region failure?<br/>CAP choice → stale reads or errors?"]
    OWN --> COLD["Cold cache?<br/>Thundering herd → coalescing + warm-up"]
    OWN --> MON["Monitoring?<br/>p99, queue lag, hit rate, replication lag"]
    style OWN fill:#ff6b1a,color:#0a0a0f
    style SPF fill:#0e7490,color:#fff
    style HOT fill:#0e7490,color:#fff
    style QUEUE fill:#0e7490,color:#fff
    style REGION fill:#ff2e88,color:#fff
    style COLD fill:#0e7490,color:#fff
    style MON fill:#15803d,color:#fff
```
Q&A and trade-off articulation
In the last few minutes you'll get a couple of pointed questions. Don't ramble. Two-sentence answers.
If asked to compare two options (SQL vs NoSQL, sync vs async, push vs pull), structure as: the crisp difference, when you'd pick A, when you'd pick B, and your pick for this system with a one-sentence reason.
Signals interviewers actually look for
These are paraphrased from real interview rubrics at large tech companies:
| Signal | What it looks like |
|---|---|
| Requirements clarity | Asks before designing; documents what's in/out of scope |
| Quantitative reasoning | Estimates QPS, storage, bandwidth without prompting |
| Trade-off articulation | Doesn't just pick — says what they're trading and why |
| Failure thinking | Asks "what happens if X dies" before being asked |
| Operational maturity | Mentions monitoring, deploys, rollouts, capacity |
| Knows where the hard parts are | Goes deeper on the interesting parts, not the trivial ones |
What sinks a candidate: jumping to "we'll use Kafka and microservices" with no reason; hand-waving past consistency or failure modes; drawing 30 boxes and never going deep on any of them; giving memorized answers that don't fit the question; or disagreeing with the interviewer instead of incorporating their hint.
Common question categories
| Category | Examples | Typical deep dives |
|---|---|---|
| Feeds & timelines | Twitter, Instagram, News Feed | Fan-out vs pull, cache strategy, ranking |
| Messaging | WhatsApp, Slack, chat apps | Delivery semantics, presence, offline |
| Real-time | Uber, ride-share, live location | Geo-indexing, WebSockets, matching |
| Storage | Dropbox, Google Drive | Chunking, dedup, conflict resolution |
| Throttling | Rate limiter | Token bucket, distributed counters |
| Aggregation | Top-K, trending, leaderboards | Approximate algorithms, sliding windows |
| Search | Autocomplete, full-text search | Inverted index, ranking, trie |
| Notifications | Push, email, SMS | Fan-out, dedup, retries |
For each, know one or two canonical designs. The articles in this project cover the standard ones.
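As an example of what "knowing a canonical design" means, here is a minimal single-process sketch of the token bucket named in the Throttling row. The class and parameter names are illustrative, and the injectable `clock` is an assumption added so the behaviour can be shown without sleeping. A distributed limiter would keep this state in a shared store, which is where the "distributed counters" deep dive begins.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity        # start full
        self.clock = clock
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Demo with a fake clock: 1 token/sec, burst of 2.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=1.0, capacity=2.0, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(3)]   # two allowed, third rejected
fake_now[0] = 1.0                            # one second later: one new token
after_refill = bucket.allow()
print(burst, after_refill)
```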
A reusable opener script
"Let me clarify a few things first.
[ask 4–6 requirements questions]
OK, so we're building [recap].
Quick estimation:
[QPS, storage, bandwidth]
I'll start with a high-level design and we can drill in.
[draw 4–8 boxes]
The trickiest pieces here are probably [X] and [Y].
Want me to start with [X]?"
That script gets you to minute 15 with structure. After that, follow the interviewer.
Practice routine
Doing 1–2 mock interviews out loud beats reading 20 articles silently, because this interview tests communication under uncertainty as much as knowledge. Find a peer, set a 45-minute timer, and design something. Then swap.
If you don't have a peer:
- Read a system on this site (e.g. design Twitter).
- Close it.
- Open a blank doc and design it from scratch in 45 minutes, talking aloud.
- Compare to the article. Note what you missed.
- Repeat with a different system tomorrow.
After 5–10 repetitions, the framework becomes automatic and you can spend brainpower on the actual design.
Things you should now be able to answer
- What are the first questions you ask in a system design interview, and why?
- Why does estimation matter even when you don't know exact numbers?
- What's a strong way to structure a trade-off comparison?
- What are the six bottleneck categories to check before the interview ends?
- What separates a senior signal from a junior one in this interview?
Further reading
- System Design Interview — An Insider's Guide (Alex Xu, Vol. 1 and 2)
- Hired Engineer's "System Design Mock Interviews" YouTube playlist
- ByteByteGo's weekly newsletter
- Each case study in this project's articles section
Frequently asked questions
▸What are the five phases of a system design interview and how long should each take?
The five phases are: requirements (0-5 min), estimation (5-10 min), high-level design (10-20 min), deep dives (20-35 min), and bottlenecks and scaling (35-42 min), with Q&A in the final three minutes. The phases are not strict — interviewers will pull you forward and backward — but if you have not said the word 'estimation' by minute 15, something has gone wrong.
▸Why is the requirements phase described as the most-skipped and most penalized step?
Jumping to the whiteboard before understanding what you are building is the single biggest failure mode in a system design interview. The requirements board — capturing functional scope, non-functional targets, and the out-of-scope pile — serves as the contract for the next 40 minutes, and every design decision should refer back to it.
▸What is a 'staff-level' deep-dive answer versus a junior answer when the cache goes down?
A junior answer says 'we fall back to the DB.' A staff-level answer identifies the thundering herd second-order problem: if 10k concurrent threads all miss at once, the DB sees a spike it was not provisioned for, so you add request coalescing — one request per key fetches from the DB while others wait on a promise — plus a circuit breaker to prevent a partial cache outage from cascading to full DB overload.
▸What six bottleneck categories should you check before the interview ends?
Single point of failure (DB primary, plan failover time — standard RDS Multi-AZ PostgreSQL failover is 60-120 s, Aurora with a replica is often under 30 s), hot partition (celebrity users concentrating writes on one shard), queue backlog at 10x load (consumer parallelism and dead-letter queues), region failure (CAP choice — stale reads or errors), cold cache thundering herd (key-level coalescing and warm-up replay), and monitoring signals (p99 latency, queue lag, cache hit rate, replication lag).
▸When does naming a technology count as a real answer in a deep dive?
Naming a technology alone is not an answer. An answer names the technology, states what problem it solves, and states what it costs — for example, sharding by user_id reduces write load per node but means cross-user analytics requires scatter-gather, which should be pushed to a separate OLAP store.