Beginner · asked at Meta · asked at Google · asked at Amazon · asked at Netflix

How to Approach a System Design Interview

A repeatable 45-minute framework — requirements, estimation, high-level design, deep dives, bottlenecks — and the moves that separate a hire from a no-hire.

12 min read · 2026-02-23 · Ironclad Academy

#interview #process #fundamentals

A system design interview is not a test of how many buzzwords you can fit in 45 minutes. It's a test of whether you can take a fuzzy real-world problem, pin down what actually matters, and walk through a credible design while a stranger interrupts you with questions. Senior interviewers can tell the difference between "I read a blog yesterday" and "I have built this" inside the first ten minutes.

This article is the framework I use to keep the conversation on rails. It works across every flavor of question — design Twitter, design a rate limiter, design a payment system — because the process is the same. What changes is which deep-dive you reach for at minute 25.

The 45-minute clock

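Phase	Minutes
Requirements	0–5
Estimation	5–10
High-level design	10–20
Deep dives	20–35
Bottlenecks and scaling	35–42
Q&A	42–45

These timings are not strict (interviewers will pull you forward and backward), but if you've never said the word "estimation" by minute 15, something has gone wrong.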

Step 1: Requirements (the most-skipped step)

The single biggest failure mode in this interview is jumping to a whiteboard before you understand what you're building. Slow down. Force the scope conversation.

Ask two kinds of questions:

Functional: what should the system do?

"Should users be able to upload, or only view?"
"Do we need search? Filtering? Recommendations?"
"Is this read-heavy or write-heavy?"
"Do messages need to be ordered? Globally or per-conversation?"

Non-functional: how well should it do it?

"What scale are we designing for? 1k users? 1M? 1B?"
"What's the latency target? p99 under 200ms?"
"How available? Three nines? Five?"
"How consistent? Eventual is OK for a feed, not OK for a wallet."

Write the answers down on the board in a corner. They are your contract for the next 40 minutes. Every design decision will refer back to them.

flowchart TD
    R[Requirements board] --> F[Functional<br/>upload, view, comment, search]
    R --> NF[Non-functional<br/>100M DAU, 100ms p99,<br/>99.99% available, eventual OK]
    R --> OS[Out of scope<br/>analytics, admin panel,<br/>billing]
    style R fill:#ff6b1a,color:#0a0a0f

The "out of scope" pile is just as important. Interviewers respect a candidate who says "we're not going to design moderation today, but flag it for later" — it shows judgment.
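To make the availability question concrete, the sketch below converts a target such as three or five nines into a yearly downtime budget. It is plain arithmetic assuming a 365-day year; the function name `downtime_budget_seconds` is illustrative and not taken from any library.

```python
# Yearly downtime budget implied by an availability target.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: ignores leap years


def downtime_budget_seconds(availability_percent: float) -> float:
    """Return how many seconds per year the system may be down."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for label, target in [("three nines", 99.9),
                      ("four nines", 99.99),
                      ("five nines", 99.999)]:
    minutes = downtime_budget_seconds(target) / 60
    print(f"{label} ({target}%): about {minutes:.1f} minutes of downtime per year")
```

Each extra nine cuts the budget by a factor of ten, which is why the answer to "how available?" changes the design rather than just the wording on the board.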

Step 2: Estimation (back-of-the-envelope)

Estimation is where you justify every later decision. You don't need to be exact. You need to be within an order of magnitude and able to defend the numbers.

Cover the four big quantities:

Quantity	Why it matters
QPS (reads & writes)	Sizes your servers and load balancers
Storage	Picks your database; bytes/year cost
Bandwidth	Egress costs; CDN strategy
Memory	Cache sizing

Worked example — designing Twitter (hypothetical teaching scale; actual reported peak was ~400M MAU):

500M MAU × 30% DAU = 150M DAU
150M × 1 tweet/user/day = 150M tweets/day ≈ 1.7k tweets/sec write
Reads ~100× writes → 170k reads/sec
Each tweet ~300 bytes → 150M × 300B ≈ 45 GB/day, ~16 TB/year of just tweet text
Media at ~10× the text volume → ~160 TB/year

Now every later decision points back to these. "We need a CDN because we're storing roughly 160 TB of media a year and reading it far more often than we write it." "Cache the timeline because reads are 100× writes."

There's a whole crash-course module on this. Memorize the standard anchors (1 KB typical row, ~500 B typical HTTP request, single Postgres primary comfortable up to ~5–10k writes/sec, Redis at ~100k+ ops/sec, commodity HTTP tier at ~20–50k req/sec for lightweight handlers) so you don't burn 5 minutes on arithmetic. The right number depends on request complexity — state a workload assumption when you cite any box-level throughput figure.
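The worked Twitter example above can be reproduced in a few lines, which is a handy way to check your mental arithmetic while practicing. This is a sketch using the article's teaching inputs; the variable names are illustrative, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
# Back-of-the-envelope numbers for the hypothetical Twitter-scale example.
SECONDS_PER_DAY = 86_400

mau = 500_000_000
dau = mau * 30 // 100                    # 30% of monthly users are active daily
tweets_per_day = dau * 1                 # 1 tweet per active user per day
write_qps = tweets_per_day / SECONDS_PER_DAY
read_qps = write_qps * 100               # reads assumed to be ~100x writes

bytes_per_tweet = 300
text_gb_per_day = tweets_per_day * bytes_per_tweet / 1e9
text_tb_per_year = text_gb_per_day * 365 / 1e3
media_tb_per_year = text_tb_per_year * 10  # media assumed to be ~10x text

print(f"DAU: {dau:,}")
print(f"Writes/sec: {write_qps:,.0f}")
print(f"Reads/sec: {read_qps:,.0f}")
print(f"Tweet text: {text_gb_per_day:.0f} GB/day, {text_tb_per_year:.1f} TB/year")
print(f"Media: {media_tb_per_year:.0f} TB/year")
```

The script gives slightly less rounded values than the figures quoted above. That is the point of the exercise: the interview answer only needs the order of magnitude.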

Step 3: High-level design

Now — and only now — go to the whiteboard. Draw the boxes and arrows version of the system.

A good high-level design has 4–8 boxes. More than that and you're already in the weeds. Fewer than that and you haven't said anything.

flowchart LR
    U[Clients] --> CDN
    CDN --> LB[Load Balancer]
    LB --> APP[App Servers]
    APP --> CACHE[(Redis Cache)]
    APP --> DB[(Primary DB)]
    APP --> Q[Queue]
    Q --> WK[Workers]
    WK --> DB
    style APP fill:#ff6b1a,color:#0a0a0f
    style CACHE fill:#15803d,color:#fff

Talk through it: "Clients hit the CDN for static assets. Dynamic requests go through a load balancer to a fleet of stateless app servers. Reads check Redis first, then the DB. Writes go to the DB and emit events to a queue for async work."

That's the shape. The interviewer will pick the box that interests them and ask you to expand it.
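The read and write paths in that talk-through can be sketched as follows. This is a minimal illustration, not a production client: plain dictionaries stand in for Redis and the primary DB, a `deque` stands in for the queue, and the class name `AppServer` is hypothetical. Invalidating the cached entry on write is an assumption added here; the talk-through itself only says that writes go to the DB and emit an event.

```python
from collections import deque


class AppServer:
    """Cache-aside reads plus write-then-emit, with in-memory stand-ins."""

    def __init__(self, initial_rows):
        self.db = dict(initial_rows)   # stand-in for the primary database
        self.cache = {}                # stand-in for Redis
        self.events = deque()          # stand-in for the queue feeding workers
        self.db_reads = 0              # counts how often reads reach the DB

    def read(self, key):
        # Reads check the cache first, then fall back to the DB.
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value    # populate the cache for the next reader
        return value

    def write(self, key, value):
        # Writes go to the DB, then emit an event for async work.
        self.db[key] = value
        self.cache.pop(key, None)      # assumption: invalidate rather than update
        self.events.append({"type": "updated", "key": key})


server = AppServer({"user:1": "Ada"})
server.read("user:1")                  # miss: goes to the DB
server.read("user:1")                  # hit: served from the cache
server.write("user:1", "Grace")
print(server.read("user:1"), server.db_reads, len(server.events))
```

Being able to say which of these lines fails when the cache or queue is unavailable is a natural lead into the deep dive.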

Step 4: Deep dives — where the interview is actually won

This is where most of your time goes, and where seniority is judged. The interviewer will push you into one or two specific areas. Common deep-dive prompts:

Prompt	What they're testing
"How does your database handle this many writes?"	Sharding, replication, choice of DB
"What if the cache goes down?"	Failure modes, thundering herd, fallbacks
"How would you scale to 10× the load?"	Bottleneck analysis, horizontal scaling
"What if two users edit at once?"	Concurrency, locking, conflict resolution
"How do you ensure messages aren't lost?"	Delivery semantics, idempotency, dead-letter queues
"How would you make this multi-region?"	Replication lag, CAP, consistency models

Strong answers have a three-part structure: acknowledge the issue ("right, with 100k writes/sec, a single Postgres will not keep up"), sketch two or three alternatives ("we could shard by user_id, switch to Cassandra, or front it with a write-buffering queue"), then pick one and name what you're trading ("I'd shard by user_id because queries are user-scoped; the downside is that cross-user analytics needs scatter-gather, which we'll push to a separate OLAP store").

Notice what's not in there: "we'll just use Kafka". Naming a technology is not an answer. Naming a technology and saying what it solves and what it costs is an answer.
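The trade-off in the sharding answer above is easier to defend if you can picture the code. The sketch below routes writes by a stable hash of `user_id`, so user-scoped reads touch one shard, while a cross-user query has to visit every shard. The shard count, function names, and in-memory shards are all hypothetical stand-ins.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

# Each shard maps user_id -> list of posts; a stand-in for separate databases.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: str) -> int:
    """Pick a shard from a stable hash, so routing survives process restarts."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def write_post(user_id: str, post: str) -> None:
    shards[shard_for(user_id)].setdefault(user_id, []).append(post)


def posts_for_user(user_id: str) -> list:
    # User-scoped query: exactly one shard is consulted.
    return shards[shard_for(user_id)].get(user_id, [])


def total_posts() -> int:
    # Cross-user query: scatter to every shard, then gather the partial counts.
    return sum(len(posts) for shard in shards for posts in shard.values())


write_post("alice", "hello")
write_post("alice", "again")
write_post("bob", "hi")
print(posts_for_user("alice"), total_posts())
```

The `total_posts` loop is the cost being named: its latency is bounded by the slowest shard, which is why the answer pushes analytics to a separate store.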

Staff-level depth the interviewer is listening for. When the prompts above come up, the answers that signal staff+ all share one trait: they go one level deeper than the obvious fix and name a second-order problem.

Scenario	Junior answer	Staff+ answer
Cache goes down	"We fall back to the DB"	"Yes, but if 10k concurrent threads all miss at once (thundering herd), the DB sees a request spike it wasn't provisioned for. I'd add request coalescing — only one request per key fetches from the DB; others wait on a promise — plus a circuit breaker so a partial cache outage doesn't cascade to full DB overload."
Shard by user_id	"Reduces write load per node"	"Works until a celebrity user drives 1000× the write rate of a typical user — that shard becomes hot. Mitigation: write-behind queue per user, or virtual sharding (hash user_id + bucket suffix) to spread a single user across N physical partitions."
Multi-region	"Async replication with ~50ms lag"	"Async is fine for reads, but any write that requires read-modify-write now has a race window across regions. For money-like invariants, you need either synchronous cross-region quorum (expensive, adds 50–200ms) or a single authoritative region per entity with local caching everywhere else."
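The request coalescing mentioned in the cache row can be sketched with threads and a shared future. This is one possible shape under stated assumptions: `SingleFlight` and `slow_db_fetch` are hypothetical names, the loader is a stand-in for a DB query, and a real service would more likely use its framework's async primitives.

```python
import threading
import time
from concurrent.futures import Future


class SingleFlight:
    """Let one caller per key run the loader; concurrent callers share its result."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> Future for the load currently running

    def get(self, key):
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future
        if leader:
            try:
                future.set_result(self._loader(key))
            except Exception as exc:
                future.set_exception(exc)  # followers see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
        return future.result()  # followers block here until the leader finishes


def demo(concurrent_callers=10):
    """Return how many times the loader ran for concurrent misses on one key."""
    calls = []

    def slow_db_fetch(key):
        calls.append(key)
        time.sleep(0.2)  # simulate a slow query so the callers overlap
        return f"value-for-{key}"

    flight = SingleFlight(slow_db_fetch)
    threads = [threading.Thread(target=flight.get, args=("timeline:42",))
               for _ in range(concurrent_callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls)


print("loader calls:", demo())
```

Coalescing only protects a single process. Across a fleet you still get one DB fetch per app server per key, which is why the staff-level answer pairs it with a circuit breaker.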

What a good deep dive looks like in practice

Here is the shape of a strong deep-dive exchange, so you can pattern-match when you're under the clock:

sequenceDiagram
    participant I as Interviewer
    participant C as Candidate
    I->>C: "How does your database handle 100k writes/sec?"
    C->>I: Acknowledge the constraint
    Note over C: "A single primary won't keep up — let me walk through options."
    C->>I: Name 2-3 alternatives with trade-offs
    Note over C: "Shard by user_id, move to Cassandra, or add a write-buffering queue."
    C->>I: Pick one and name the cost
    Note over C: "I'd shard by user_id. The downside: cross-user queries need scatter-gather."
    C->>I: Name the second-order problem
    Note over C: "Hot shards are the real risk — celebrity users need virtual sub-sharding."
    I->>C: Follow-up on second-order problem
    C->>I: Go one level deeper

Work that loop — acknowledge, compare, commit, name the cost, surface the next problem — and the interviewer's pen will be moving.

Step 5: Bottlenecks and scaling

By minute 35 you should be looking at your own diagram with a skeptical eye, asking the questions an interviewer would ask:

Where is the single point of failure? Usually the DB primary. Plan for failover: a synchronous standby that can be promoted automatically, read replicas to take load off the primary, and a failover time you can actually state. Per AWS documentation, standard RDS Multi-AZ PostgreSQL failover typically takes 60–120 s, and Aurora PostgreSQL with a replica is faster, often under 30 s. Plan accordingly.

Where will the queue back up? What happens at 10× current load? A queue that backs up is a slow memory leak — name your consumer parallelism strategy and your DLQ (dead-letter queue) for poison messages.
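A dead-letter queue is simple to describe in code. The sketch below retries a failing message a bounded number of times and then parks it, so one poison message cannot block the backlog forever. The retry budget, the message shape, and the function name `drain` are assumptions for illustration; managed queues usually provide this as configuration.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry budget per message


def drain(queue, handler, dead_letters):
    """Process every message; retry failures, park poison messages in the DLQ."""
    processed = 0
    while queue:
        message = queue.popleft()
        try:
            handler(message["body"])
            processed += 1
        except Exception as exc:
            message["attempts"] += 1
            if message["attempts"] >= MAX_ATTEMPTS:
                # Give up: keep the message and the error for later inspection.
                dead_letters.append({**message, "error": repr(exc)})
            else:
                queue.append(message)  # retry later, behind the other messages
    return processed


def handle(body):
    if body == "poison":
        raise ValueError("cannot parse message")


work = deque({"body": b, "attempts": 0} for b in ["a", "poison", "b"])
dlq = []
print(drain(work, handle, dlq), len(dlq))
```

Consumer parallelism is the other half of the answer: running several such consumers only helps if the handler is safe to run concurrently and messages can be processed out of order.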

What's the hottest table or cache key? Celebrity users, trending hashtags, and viral events create hot partitions. Know whether your sharding key distributes load or concentrates it.
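One way to stop a sharding key from concentrating load is to add a bucket suffix for keys known to be hot, so a single user's writes spread over several partitions. The sketch below assumes a hypothetical `HOT_USERS` set fed by monitoring, and made-up partition and bucket counts.

```python
import hashlib
import random

NUM_PARTITIONS = 16           # hypothetical number of physical partitions
HOT_KEY_BUCKETS = 8           # hypothetical fan-out for a known-hot user
HOT_USERS = {"celebrity_1"}   # assumed to come from monitoring or config


def stable_hash(key: str) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_for_write(user_id: str) -> int:
    """Normal users map to one partition; hot users are spread over buckets."""
    if user_id in HOT_USERS:
        bucket = random.randrange(HOT_KEY_BUCKETS)
        return stable_hash(f"{user_id}#{bucket}") % NUM_PARTITIONS
    return stable_hash(user_id) % NUM_PARTITIONS


def partitions_for_read(user_id: str) -> list[int]:
    """Reads for a hot user must visit every partition its buckets map to."""
    if user_id in HOT_USERS:
        return sorted({stable_hash(f"{user_id}#{b}") % NUM_PARTITIONS
                       for b in range(HOT_KEY_BUCKETS)})
    return [stable_hash(user_id) % NUM_PARTITIONS]


print(partitions_for_read("regular_user"), partitions_for_read("celebrity_1"))
```

The cost shows up in `partitions_for_read`: spreading writes turns a single-partition read into a small fan-out, so the technique is worth applying only to keys that are actually hot.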

What breaks first on a region failure? If your DB primary is in us-east-1 and us-west-2 loses its connection to it, that's a network partition: are your us-west-2 nodes serving stale reads or returning errors? Your answer should reference CAP and your consistency choice.

Where does a request fall onto the slow path? What if Redis is empty (a cold cache)? A cache restart under high traffic is a thundering herd event. Strategies: per-key mutex or promise coalescing, probabilistic early expiration, or warm-up replay from the DB.
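Probabilistic early expiration can be sketched in a few lines. The version below follows one common formulation, in which each reader refreshes early with a probability that rises as the expiry approaches and scales with how long a recompute takes. The function name and parameters are illustrative, and whether this matches the exact variant you would use in production is an assumption.

```python
import math
import random
import time


def should_refresh_early(expiry, recompute_seconds, beta=1.0,
                         now=None, rng=random.random):
    """Decide whether this reader should recompute the value before it expires.

    expiry            -- absolute time (seconds) at which the cached value expires
    recompute_seconds -- how long the last recomputation took
    beta              -- values above 1 refresh earlier, below 1 refresh later
    """
    now = time.time() if now is None else now
    u = rng()  # uniform in [0, 1), so 1 - u is in (0, 1] and the log is defined
    # -log(1 - u) is exponentially distributed with mean 1, so most readers
    # add a small head start and only a few add a large one.
    head_start = -recompute_seconds * beta * math.log(1.0 - u)
    return now + head_start >= expiry


expiry = time.time() + 30
early = sum(should_refresh_early(expiry, recompute_seconds=2.0) for _ in range(1000))
print(f"{early} of 1000 readers would refresh 30 s before expiry")
```

Because only the occasional reader refreshes early, the value is usually rebuilt by one request shortly before expiry instead of by every request at the moment it expires.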

What monitoring would catch this in production? Staff candidates mention concrete metrics: p99 latency per endpoint, queue depth and consumer lag, cache hit rate, DB connection pool saturation, and replication lag. Mentioning these unprompted is a strong operational maturity signal.
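It helps to be able to say why p99 is tracked rather than the mean. The sketch below computes a nearest-rank percentile over a made-up latency sample in which two slow requests barely move the average but dominate the tail. The function name and the sample data are illustrative; real systems usually compute percentiles from histograms rather than raw samples.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [10] * 98 + [500, 900]

mean_ms = sum(latencies_ms) / len(latencies_ms)
print(f"mean: {mean_ms:.1f} ms")
print(f"p50:  {percentile(latencies_ms, 50)} ms")
print(f"p99:  {percentile(latencies_ms, 99)} ms")
```

In this sample the mean looks healthy while one request in a hundred takes half a second, which is the kind of problem a per-endpoint p99 alert is meant to catch.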

If the interviewer hasn't already pushed on these, do it yourself. It shows you've built systems that failed in exactly these ways.

flowchart TD
    OWN[Skeptic pass on your own design] --> SPF["Single point of failure?<br/>DB primary → plan failover time"]
    OWN --> HOT["Hot partition?<br/>Celebrity user, trending key → virtual sharding"]
    OWN --> QUEUE["Queue backlog?<br/>10× load → consumer parallelism + DLQ"]
    OWN --> REGION["Region failure?<br/>CAP choice → stale reads or errors?"]
    OWN --> COLD["Cold cache?<br/>Thundering herd → coalescing + warm-up"]
    OWN --> MON["Monitoring?<br/>p99, queue lag, hit rate, replication lag"]
    style OWN fill:#ff6b1a,color:#0a0a0f
    style SPF fill:#0e7490,color:#fff
    style HOT fill:#0e7490,color:#fff
    style QUEUE fill:#0e7490,color:#fff
    style REGION fill:#ff2e88,color:#fff
    style COLD fill:#0e7490,color:#fff
    style MON fill:#15803d,color:#fff

Q&A and trade-off articulation

In the last few minutes you'll get a couple of pointed questions. Don't ramble. Two-sentence answers.

If asked to compare two options (SQL vs NoSQL, sync vs async, push vs pull), structure as: the crisp difference, when you'd pick A, when you'd pick B, and your pick for this system with a one-sentence reason.

Signals interviewers actually look for

These are paraphrased from real interview rubrics at large tech companies:

Signal	What it looks like
Requirements clarity	Asks before designing; documents what's in/out of scope
Quantitative reasoning	Estimates QPS, storage, bandwidth without prompting
Trade-off articulation	Doesn't just pick — says what they're trading and why
Failure thinking	Asks "what happens if X dies" before being asked
Operational maturity	Mentions monitoring, deploys, rollouts, capacity
Knows where the hard parts are	Goes deeper on the interesting parts, not the trivial ones

What sinks a candidate: jumping to "we'll use Kafka and microservices" with no reason; hand-waving past consistency or failure modes; drawing 30 boxes and never going deep on any of them; giving memorized answers that don't fit the question; or disagreeing with the interviewer instead of incorporating their hint.

Common question categories

Category	Examples	Typical deep dives
Feeds & timelines	Twitter, Instagram, News Feed	Fan-out vs pull, cache strategy, ranking
Messaging	WhatsApp, Slack, chat apps	Delivery semantics, presence, offline
Real-time	Uber, ride-share, live location	Geo-indexing, WebSockets, matching
Storage	Dropbox, Google Drive	Chunking, dedup, conflict resolution
Throttling	Rate limiter	Token bucket, distributed counters
Aggregation	Top-K, trending, leaderboards	Approximate algorithms, sliding windows
Search	Autocomplete, full-text search	Inverted index, ranking, trie
Notifications	Push, email, SMS	Fan-out, dedup, retries

For each, know one or two canonical designs. The articles in this project cover the standard ones.

A reusable opener script

"Let me clarify a few things first.
[ask 4–6 requirements questions]

OK, so we're building [recap].
Quick estimation:
[QPS, storage, bandwidth]

I'll start with a high-level design and we can drill in.
[draw 4–8 boxes]

The trickiest pieces here are probably [X] and [Y].
Want me to start with [X]?"

That script gets you to minute 15 with structure. After that, follow the interviewer.

Practice routine

Doing 1–2 mock interviews out loud beats reading 20 articles silently. The reason: this interview tests communication under uncertainty, not knowledge. Find a peer, set a 45-minute timer, and design something. Then swap.

If you don't have a peer:

Read a system on this site (e.g. design Twitter).
Close it.
Open a blank doc and design it from scratch in 45 minutes, talking aloud.
Compare to the article. Note what you missed.
Repeat with a different system tomorrow.

After 5–10 repetitions, the framework becomes automatic and you can spend brainpower on the actual design.

Things you should now be able to answer

What are the first questions you ask in a system design interview, and why?
Why does estimation matter even when you don't know exact numbers?
What's a strong way to structure a trade-off comparison?
What are the six bottleneck categories to check before the interview ends?
What separates a senior signal from a junior one in this interview?

Frequently asked questions

▸What are the five phases of a system design interview and how long should each take?

The five phases are: requirements (0-5 min), estimation (5-10 min), high-level design (10-20 min), deep dives (20-35 min), and bottlenecks and scaling (35-42 min), with Q&A in the final three minutes. The phases are not strict — interviewers will pull you forward and backward — but if you have not said the word 'estimation' by minute 15, something has gone wrong.

▸Why is the requirements phase described as the most-skipped and most penalized step?

Jumping to the whiteboard before understanding what you are building is the single biggest failure mode in a system design interview. The requirements board — capturing functional scope, non-functional targets, and the out-of-scope pile — serves as the contract for the next 40 minutes, and every design decision should refer back to it.

▸What is a 'staff-level' deep-dive answer versus a junior answer when the cache goes down?

A junior answer says 'we fall back to the DB.' A staff-level answer identifies the thundering herd second-order problem: if 10k concurrent threads all miss at once, the DB sees a spike it was not provisioned for, so you add request coalescing — one request per key fetches from the DB while others wait on a promise — plus a circuit breaker to prevent a partial cache outage from cascading to full DB overload.

▸What six bottleneck categories should you check before the interview ends?

The six are: single point of failure (usually the DB primary, so plan a failover time; standard RDS Multi-AZ PostgreSQL failover is typically 60–120 s, and Aurora with a replica is often under 30 s), hot partition (celebrity users concentrating writes on one shard), queue backlog at 10x load (consumer parallelism and dead-letter queues), region failure (your CAP choice: stale reads or errors), cold cache thundering herd (key-level coalescing and warm-up replay), and monitoring signals (p99 latency, queue lag, cache hit rate, replication lag).

▸When does naming a technology count as a real answer in a deep dive?

Naming a technology alone is not an answer. An answer names the technology, states what problem it solves, and states what it costs — for example, sharding by user_id reduces write load per node but means cross-user analytics requires scatter-gather, which should be pushed to a separate OLAP store.
