MODULE 02 / 12 · crash course

◆Beginner

Back-of-the-Envelope Estimation

Capacity math you can do in your head — QPS, storage, bandwidth, memory — the latency numbers every engineer should know, and worked examples for Twitter, URL shorteners, and a chat system.

14 min read · 2026-01-16 · Ironclad Academy

#fundamentals #capacity-planning #interviews

Capacity estimation is the single most under-practiced and over-tested skill in system design interviews. It's also the skill that, in real life, is the difference between an architecture that works and one that explodes the night before launch.

This module gives you (a) the numbers every engineer should have memorized, (b) a four-step capacity recipe, (c) three worked examples, and (d) the napkin you'll mentally pull out for the rest of your career.

Why estimate at all?

Because every architecture decision is a function of scale.

flowchart LR
    A[100 RPS] --> B[1 Postgres]
    B --> C[1k RPS] --> D[Postgres + Redis]
    D --> E[10k RPS] --> F[Replicas + LB]
    F --> G[100k RPS] --> H[Sharded + cached + queued]
    H --> I[1M RPS] --> J[Multi-region<br/>polyglot persistence]
    style B fill:#15803d,color:#fff
    style J fill:#ff2e88,color:#fff

If you don't know whether you're at 1k or 1M RPS, you don't know which stage of this picture you're in, which means every other decision is guesswork. Estimation removes the guesswork.

Powers of 10 and the SI prefixes

Before anything else: get fluent with orders of magnitude.

Notation	Value	Real-world example
`10^3` (Kilo, K)	1,000	A village
`10^6` (Mega, M)	1,000,000	A small city's population
`10^9` (Giga, G)	1,000,000,000	DAU of a top-10 app
`10^12` (Tera, T)	1,000,000,000,000	Tweets ever posted (~order)
`10^15` (Peta, P)	1,000,000,000,000,000	YouTube's video storage
`10^18` (Exa, E)	1,000,000,000,000,000,000	Total data on the public internet (rough)

Memorize this single fact: 86,400 seconds/day ≈ 100,000 ≈ 10^5. It lets you turn any "per day" into "per second" instantly.

So 1B requests/day ÷ 10^5 = 10^4 = 10,000 requests/sec. You just did capacity planning.

Useful seconds-time facts:

1 minute  = 60         ≈ 10^1.8
1 hour    = 3,600      ≈ 10^3.5
1 day     = 86,400     ≈ 10^5
1 month   = 2.6M       ≈ 10^6.4
1 year    = 31.5M      ≈ 10^7.5
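To see how much the 10^5 shortcut costs in accuracy, here is a minimal sketch that converts a per-day count both ways. The function name `per_second` and its `exact` flag are illustrative choices, not part of any library.

```python
SECONDS_PER_DAY = 86_400          # exact
NAPKIN_SECONDS_PER_DAY = 10**5    # the rounded value used in the text


def per_second(per_day: float, exact: bool = False) -> float:
    """Convert a per-day count into an average per-second rate."""
    divisor = SECONDS_PER_DAY if exact else NAPKIN_SECONDS_PER_DAY
    return per_day / divisor


napkin = per_second(1e9)              # 1B requests/day with the 10^5 shortcut
exact = per_second(1e9, exact=True)   # same figure with the exact divisor
print(f"napkin: {napkin:,.0f}/s  exact: {exact:,.0f}/s  ratio: {exact / napkin:.2f}")
```

The shortcut underestimates by roughly 16%, which is well inside the 2–3× peak multiplier you apply afterwards.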

Powers of 2 (for memory and storage)

Storage and memory are quoted in powers of 2:

Power	Bytes	Familiar form
2^10	1,024	1 KiB ≈ 1 KB
2^20	~10^6	1 MiB ≈ 1 MB
2^30	~10^9	1 GiB ≈ 1 GB
2^32	~4 × 10^9	Range of an unsigned 32-bit int
2^40	~10^12	1 TiB
2^50	~10^15	1 PiB
2^53	~9 × 10^15	Max safe JS integer
2^63	~9 × 10^18	Max signed 64-bit integer (long)
2^64	~1.8 × 10^19	Range of an unsigned 64-bit int; half the bits of a 128-bit UUID

Why this matters: when you're sizing arrays, hashmaps, or partition counts, you'll often want powers of 2. When you hear "16-bit hash space" you should immediately think 65,536 buckets.
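A small sketch of the two habits described here: turning a bit width into a bucket count, and rounding a size up to the next power of 2. The helper names `buckets` and `next_power_of_two` are hypothetical, chosen for illustration.

```python
def buckets(bits: int) -> int:
    """Number of distinct values in a hash space of the given bit width."""
    return 1 << bits


def next_power_of_two(n: int) -> int:
    """Smallest power of 2 that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


print(buckets(16))               # size of a 16-bit hash space
print(next_power_of_two(1000))   # round a partition count up to a power of 2

# How far the binary units drift from their decimal cousins.
for power, decimal in ((10, 10**3), (20, 10**6), (30, 10**9), (40, 10**12)):
    print(f"2^{power} / 10^{len(str(decimal)) - 1} = {2**power / decimal:.3f}")
```

The drift grows from about 2% at the kilo level to about 10% at the tera level, which is why the table treats KiB and KB as interchangeable for napkin purposes.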

Latency numbers every programmer should know

Popularized by Jeff Dean at Google and periodically updated since. These are the numbers you should think in:

L1 cache reference                            0.5 ns
Branch mispredict                             5   ns
L2 cache reference                            7   ns
Mutex lock/unlock                             25  ns
Main memory reference                         100 ns
Compress 1 KB with Zippy                      3   μs   = 3,000 ns
Send 1 KB over 1 Gbps network                 10  μs
Read 4 KB from SSD (NVMe)                     50  μs
Read 4 KB from SSD (SATA)                     150 μs
Read 1 MB sequentially from memory            250 μs
Round trip within same datacenter             500 μs   = 0.5 ms
Read 1 MB sequentially from SSD               1   ms
Disk seek (HDD)                               10  ms
Read 1 MB sequentially from HDD               20  ms
Cold fetch from object storage (S3)           ~100 ms
Send packet CA → Netherlands → CA             150 ms

flowchart LR
    L1[L1 cache: 0.5ns] --> RAM[RAM: 100ns]
    RAM --> SSD[SSD: 50μs]
    SSD --> NET[Same-DC RTT: 500μs]
    NET --> HDD[HDD seek: 10ms]
    HDD --> WAN[Cross-continent: 150ms]
    style L1 fill:#15803d,color:#fff
    style RAM fill:#ffaa00,color:#0a0a0f
    style SSD fill:#ff6b1a,color:#0a0a0f
    style NET fill:#ff6b1a,color:#0a0a0f
    style HDD fill:#ff2e88,color:#fff
    style WAN fill:#a855f7,color:#fff

Read these as ratios, not absolutes:

RAM is ~500× faster than SSD (NVMe: 100 ns vs 50 μs).
SSD is ~200× faster than HDD seek (50 μs vs 10 ms).
A same-datacenter round-trip is ~5,000× slower than a memory access.
A cross-continent round-trip is ~300× slower than a same-DC round-trip.

This is why every architecture diagram has a cache in front of the database. This is why we colocate services in the same region. This is why CDNs exist.
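The ratios above can be recomputed from the latency table rather than memorized. This is a minimal sketch; the dictionary keys and the `slowdown` helper are names invented for this example, and the values are copied from the table in nanoseconds.

```python
# Latencies from the table above, all in nanoseconds.
LATENCY_NS = {
    "main_memory": 100,
    "nvme_read_4k": 50_000,
    "same_dc_rtt": 500_000,
    "hdd_seek": 10_000_000,
    "cross_continent_rtt": 150_000_000,
}


def slowdown(slow: str, fast: str) -> float:
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


pairs = [
    ("nvme_read_4k", "main_memory"),
    ("hdd_seek", "nvme_read_4k"),
    ("same_dc_rtt", "main_memory"),
    ("cross_continent_rtt", "same_dc_rtt"),
]
for slow, fast in pairs:
    print(f"{slow} is ~{slowdown(slow, fast):,.0f}x slower than {fast}")
```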

Latency in human terms

Scaling everything up by 10^9 (turning nanoseconds into seconds) gives you intuition that sticks:

Operation	Real	Scaled (×10^9)
L1 cache	0.5 ns	0.5 sec
Main memory	100 ns	100 sec (~2 min)
SSD read	50 μs	~14 hours
Same-DC RTT	500 μs	~6 days
HDD seek	10 ms	~4 months
Cross-continent	150 ms	~5 years

When your code does an HDD seek instead of staying in cache, the human-time equivalent is "instead of finishing this sentence, take a 4-month nap." That's the cost of a cache miss.
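The scaled column can be reproduced with a few lines of code. Multiplying a latency in nanoseconds by 10^9 gives the same number in seconds, so the sketch below only needs to pick a readable unit. The function name `scaled_to_human` and the 30-day month are assumptions made for this example.

```python
UNITS = (
    ("years", 365 * 86_400),
    ("months", 30 * 86_400),   # assumes a 30-day month
    ("days", 86_400),
    ("hours", 3_600),
    ("minutes", 60),
)


def scaled_to_human(latency_ns: float) -> str:
    """Scale a latency by 10^9 (ns become seconds) and pick a readable unit."""
    seconds = latency_ns  # N nanoseconds x 10^9 = N seconds
    for unit, size in UNITS:
        if seconds >= size:
            return f"~{seconds / size:.1f} {unit}"
    return f"{seconds:g} seconds"


for name, ns in [("L1 cache", 0.5), ("Main memory", 100), ("SSD read", 50_000),
                 ("Same-DC RTT", 500_000), ("HDD seek", 10_000_000),
                 ("Cross-continent", 150_000_000)]:
    print(f"{name:16} {scaled_to_human(ns)}")
```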

Hardware specs to keep in mind

Modern commodity server (think r6i.4xlarge or similar memory-optimized instance):

CPU:     ~16 vCPUs @ 3.5 GHz
RAM:     ~128 GB
Network: up to 12.5 Gbps (r6i.4xlarge; larger instances reach 25–50 Gbps)
NVMe:    available on local-storage variants (r6id); EBS: ~10 Gbps, ~40k IOPS
HDD:     ~150 MB/s sequential, ~150 random IOPS

A few useful "what fits on one box" rules of thumb:

A single Linux box handles roughly 50k–100k QPS of simple HTTP.
A well-tuned Postgres sustains around 50k writes/sec for simple inserts on fast hardware (complex transactional workloads are more like 5–20k TPS).
A Redis instance can do ~100k–1M ops/sec without pipelining, and well into the millions with pipelining; plan on ~100k–200k for realistic mixed workloads.
The practical RAM ceiling is ~1 TB of hot working set.
Network tops out around ~10 GB/s on a high-tier instance.

When you exceed these, you start sharding, replicating, and caching: all the topics in the rest of this course.

The four-step recipe

Given any system design problem, do these four estimates in this order: traffic → storage → bandwidth → memory. Each answer shapes the next one, so the order matters.

flowchart TD
    A[1. Traffic Estimate<br/>QPS read and write] --> B[2. Storage Estimate<br/>bytes/event x events/year]
    B --> C[3. Bandwidth Estimate<br/>bytes x QPS]
    C --> D[4. Memory Estimate<br/>cache size for 80/20 rule]
    style A fill:#ff6b1a,color:#0a0a0f
    style B fill:#ffaa00,color:#0a0a0f
    style C fill:#15803d,color:#fff
    style D fill:#0e7490,color:#fff

Traffic. Start with DAU, multiply by actions per user per day, divide by 86,400 (or 10^5 for napkin math) to get average QPS, then multiply by 2–3× for peak. Always compute the read:write ratio separately — most systems are heavily skewed toward reads, and the write rate is what drives your database choices.

Storage. Multiply bytes-per-event by events-per-day by your retention period. Then add ~30% overhead for indexes, multiply by ~3× for replication (replication factor 3 is the norm), and tack on 20–50% for backups and historical. The replication multiplier alone often triples your naive estimate — don't skip it.

Bandwidth. Ingress is avg_event_size × write_QPS; egress is avg_response_size × read_QPS. Don't forget heavier traffic like thumbnails or video re-encoding. Bandwidth is what tells you whether you need a CDN — once egress climbs past ~1 GB/s from origin, the bill alone justifies edge caching.

Memory. Apply the 80/20 rule: 20% of your data serves 80% of requests. Daily cache size ≈ 20% × daily read bytes. Multiply by your replication factor to get total RAM needed. If this number is small (sub-GB), you can fit the hot working set on a single Redis node. If it's in the tens of terabytes, you're looking at a distributed cache cluster.
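The four steps can be sketched as one function. This is a simplified illustration, not a definitive calculator: the `Workload` fields and the `estimate` function are hypothetical names, the same `bytes_per_event` is assumed for both requests and responses, and the 20–50% backup allowance is left out.

```python
from dataclasses import dataclass

NAPKIN_SECONDS_PER_DAY = 10**5


@dataclass
class Workload:
    dau: float
    writes_per_user_per_day: float
    reads_per_user_per_day: float
    bytes_per_event: float
    retention_days: float = 365
    peak_factor: float = 3.0      # the text suggests 2-3x
    index_overhead: float = 0.30  # ~30% for indexes
    replication: int = 3
    hot_fraction: float = 0.20    # the 80/20 rule


def estimate(w: Workload) -> dict:
    writes_per_day = w.dau * w.writes_per_user_per_day
    reads_per_day = w.dau * w.reads_per_user_per_day

    # 1. Traffic: average QPS, then a peak multiplier.
    write_qps = writes_per_day / NAPKIN_SECONDS_PER_DAY
    read_qps = reads_per_day / NAPKIN_SECONDS_PER_DAY

    # 2. Storage: raw bytes over the retention period, plus indexes and replicas.
    raw_bytes = w.bytes_per_event * writes_per_day * w.retention_days
    stored_bytes = raw_bytes * (1 + w.index_overhead) * w.replication

    # 3. Bandwidth: bytes per event times average QPS, in bytes/sec.
    ingress = w.bytes_per_event * write_qps
    egress = w.bytes_per_event * read_qps

    # 4. Memory: hot fraction of daily read bytes, times replication.
    cache_bytes = w.hot_fraction * reads_per_day * w.bytes_per_event * w.replication

    return {
        "peak_write_qps": write_qps * w.peak_factor,
        "peak_read_qps": read_qps * w.peak_factor,
        "read_write_ratio": read_qps / write_qps,
        "stored_bytes": stored_bytes,
        "ingress_bytes_per_sec": ingress,
        "egress_bytes_per_sec": egress,
        "cache_bytes": cache_bytes,
    }


result = estimate(Workload(dau=500e6, writes_per_user_per_day=5,
                           reads_per_user_per_day=50, bytes_per_event=400))
for key, value in result.items():
    print(f"{key:24} {value:,.0f}")
```

Keeping every multiplier as a named field makes it easy to rerun the estimate when an interviewer changes one assumption.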

A quick math interlude

You will be doing a lot of multiplication. Two tricks to do it fast in your head:

Trick 1: align the powers of 10. When you see 2.5B × 400 bytes, write it as 2.5 × 10^9 × 4 × 10^2 = 10 × 10^11 = 10^12 = 1 TB. You don't need to multiply 2.5 × 400.

Trick 2: the "round, then correct" pattern. Round messy numbers to the nearest power of 10 first; reason about the order of magnitude; tweak at the end if needed.

500M × 5 = 2.5B   (could just say "few billion")
75k × 1KB         ≈ 100 MB/s
75k × 200KB       ≈ 15 GB/s

If you're off by 2× and the number is 100 GB/s, the architecture is the same. If you're off by 2× and the number is 100 KB/s vs 200 KB/s, who cares. Order of magnitude is what matters.
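Trick 1 is just scientific notation: multiply the small leading numbers, add the exponents. A minimal sketch, with `split` and `napkin_multiply` as illustrative helper names:

```python
import math


def split(x: float) -> tuple[float, int]:
    """Split a positive number into (mantissa, power of 10)."""
    exponent = math.floor(math.log10(x))
    return x / 10**exponent, exponent


def napkin_multiply(a: float, b: float) -> tuple[float, int]:
    """Multiply mantissas and add exponents, as you would in your head."""
    (ma, ea), (mb, eb) = split(a), split(b)
    return ma * mb, ea + eb


mantissa, exponent = napkin_multiply(2.5e9, 400)   # 2.5B tweets x 400 bytes
print(f"{mantissa:g} x 10^{exponent} bytes")       # 10 x 10^11 = 10^12 = 1 TB
```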

Worked example 1: Twitter

Design Twitter for 500M DAU. Each user posts ~5 tweets/day, reads ~50 tweets/day. Tweet is ~280 chars + metadata.

Traffic

Writes (tweets):
  500M × 5 = 2.5B/day
  2.5 × 10^9 / 10^5 = 25,000/sec average
  Peak    ≈ 75,000/sec (× 3)

Reads (timeline loads):
  500M × 50 = 25B/day
  25 × 10^9 / 10^5 = 250,000/sec average
  Peak    ≈ 750,000/sec

Read:write ≈ 10:1

What it tells you: 75k tweets/sec peak. One Postgres tops out around ~50k simple writes/sec on well-tuned hardware, so even before you count indexes and replication, this needs sharding immediately. 750k reads/sec demands caching.

Storage

Tweet payload:        280 bytes text + ~120 bytes metadata ≈ 400 bytes
Plus media (avg):     ~200 KB on 20% of tweets

Per day:
  Text:     2.5B × 400B  = 1 TB/day
  Media:    0.5B × 200KB = 100 TB/day

Per year:
  Text:     ~365 TB
  Media:    ~36 PB

Replication × 3:
  Text:     ~1 PB/year
  Media:    ~110 PB/year

Media dwarfs text by 100×. The architecture is bottlenecked by media, which means S3-class object storage and an aggressive CDN.

Bandwidth

Write ingress (text only):  25k × 400B   = 10 MB/sec
Read egress  (text only):  250k × 400B   = 100 MB/sec
Media writes:                5k × 200KB  = 1 GB/sec
Media reads:                50k × 200KB  = 10 GB/sec

That media read bandwidth is why Twitter has a CDN. At typical cloud egress rates, 10 GB/s served from origin costs on the order of $1M or more per month; from CDN edges it's a fraction.

Cache

Daily reads = 25B × 400B = 10 TB/day text
20% hot     = 2 TB hot timeline data
× replication factor 3 = 6 TB total cache RAM needed
Distribute over Redis cluster: 100 nodes × 60 GB each.

That's the napkin. It tells you: ~75k tweets/sec peak means a single Postgres won't cut it, so you need sharding. Text and metadata grow by ~365 TB/year (~1 PB replicated), far more than one Postgres can hold, so the shards are needed for capacity as well as write throughput. Media is the real cost: solve it with object storage (S3) + CDN (CloudFront). The 6 TB hot cache fits in a Redis cluster of 100 nodes at ~60 GB each.
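The Twitter figures can be reproduced in a few lines, which is a useful habit when you want to change one assumption and see what moves. This is a sketch using the inputs stated in the example; the function name `twitter_napkin` and its parameter names are made up for illustration.

```python
NAPKIN_DAY = 10**5                     # seconds per day, rounded
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15


def twitter_napkin(dau=500e6, tweets_per_user=5, reads_per_user=50,
                   text_bytes=400, media_bytes=200 * KB, media_share=0.20,
                   peak_factor=3, replication=3, hot_fraction=0.20):
    tweets_per_day = dau * tweets_per_user
    reads_per_day = dau * reads_per_user
    text_per_day = tweets_per_day * text_bytes
    media_per_day = tweets_per_day * media_share * media_bytes
    avg_read_qps = reads_per_day / NAPKIN_DAY
    return {
        "peak_write_qps": tweets_per_day / NAPKIN_DAY * peak_factor,
        "peak_read_qps": avg_read_qps * peak_factor,
        "text_tb_per_day": text_per_day / TB,
        "media_tb_per_day": media_per_day / TB,
        "text_pb_per_year_replicated": text_per_day * 365 * replication / PB,
        "media_pb_per_year_replicated": media_per_day * 365 * replication / PB,
        "media_read_gb_per_sec": avg_read_qps * media_share * media_bytes / GB,
        "cache_tb": reads_per_day * text_bytes * hot_fraction * replication / TB,
    }


for key, value in twitter_napkin().items():
    print(f"{key:30} {value:,.1f}")
```

Try `twitter_napkin(media_share=0.05)` to see how strongly the storage and bandwidth totals depend on the media assumption.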

Worked example 2: A URL shortener

Design TinyURL for 100M new URLs per month, 10:1 read:write ratio.

Traffic

Writes (creates):
  100M / 30 / 86400 ≈ 40 writes/sec average
  Peak                ≈ 120 writes/sec

Reads (redirects):
  10× writes          ≈ 400 reads/sec average
  Peak                ≈ 1,200 reads/sec

Storage

5 years:
  100M × 60 months = 6B URLs
  Each row ~500B   = 3 TB total

  Easy on a single replicated Postgres.

Cache

80/20 rule: 20% of short URLs serve ~80% of redirects
Daily-active URLs ≈ 40 writes/sec × 86,400 ≈ 3.5M new/day
Hot working set  ~ 20% of daily-active ≈ 700k codes ≈ 1M (rounded)
                   1M hot codes × 500 B  = 500 MB

Fits in a single Redis instance.

The shape of this system is completely different from Twitter — and the napkin tells you why before you draw a single box. This is "1 Postgres + 1 Redis," not "100 shards and Cassandra."
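The same numbers in code, as a sketch under the example's stated inputs. The function name `url_shortener_napkin` and its parameters are hypothetical. Note that this version does not round intermediate values, so the hot working set comes out near 330 MB rather than the 500 MB obtained above by rounding 700k codes up to 1M; both are the same order of magnitude and lead to the same design.

```python
def url_shortener_napkin(new_urls_per_month=100e6, reads_per_write=10,
                         row_bytes=500, years=5, peak_factor=3,
                         hot_fraction=0.20):
    new_per_day = new_urls_per_month / 30
    write_qps = new_per_day / 86_400
    read_qps = write_qps * reads_per_write
    total_rows = new_urls_per_month * 12 * years
    return {
        "write_qps": write_qps,
        "peak_write_qps": write_qps * peak_factor,
        "read_qps": read_qps,
        "peak_read_qps": read_qps * peak_factor,
        "storage_tb": total_rows * row_bytes / 10**12,
        # 20% of one day's new URLs, at one row each.
        "hot_set_mb": new_per_day * hot_fraction * row_bytes / 10**6,
    }


for key, value in url_shortener_napkin().items():
    print(f"{key:16} {value:,.1f}")
```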

Worked example 3: A chat / messaging system

Design WhatsApp-style messaging for 1B DAU, average user sends 50 messages/day.

Traffic

Messages sent: 1B × 50 = 50B/day
Per second:    50B / 10^5 = 500k/sec average
Peak:          ~1.5M/sec

But every message is also delivered to (typically) 1 recipient,
so write fanout = 1× → ~1.5M delivery writes/sec at peak

(For group chats with N members, multiply by N.)

Storage

Per message: 100 bytes (text) + 50 bytes metadata = 150 bytes
Per day:     50B × 150B = 7.5 TB/day text
Per year:    ~2.7 PB

Plus media (let's say 10% of messages, avg 1 MB):
  5B × 1 MB = 5 PB/day media

Replication × 3:
  Text:  ~8 PB / year
  Media: ~5.5 EB / year if everything is kept (5 PB/day × 365 × 3), so expire or archive aggressively

Bandwidth

Write ingress (text): 1.5M × 150B = 225 MB/sec
Push to recipients:   1.5M × 150B = 225 MB/sec (1:1 chat)
Media: dominated by uploads/downloads — at 10% rate, ~150 GB/sec peak
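A sketch of the chat estimate with the fanout made explicit, so the group-chat case is one parameter away. The function name `chat_napkin` and the parameter `recipients_per_msg` are assumptions for this example; the inputs are the ones stated above.

```python
NAPKIN_DAY = 10**5   # seconds per day, rounded


def chat_napkin(dau=1e9, msgs_per_user=50, msg_bytes=150,
                media_share=0.10, media_bytes=10**6,
                recipients_per_msg=1, peak_factor=3):
    msgs_per_day = dau * msgs_per_user
    peak_msgs = msgs_per_day / NAPKIN_DAY * peak_factor
    return {
        "peak_msgs_per_sec": peak_msgs,
        # Each message is written once per recipient (fanout).
        "peak_deliveries_per_sec": peak_msgs * recipients_per_msg,
        "text_tb_per_day": msgs_per_day * msg_bytes / 10**12,
        "media_pb_per_day": msgs_per_day * media_share * media_bytes / 10**15,
        "text_ingress_mb_per_sec": peak_msgs * msg_bytes / 10**6,
        "media_gb_per_sec_peak": peak_msgs * media_share * media_bytes / 10**9,
    }


one_to_one = chat_napkin()
group_of_20 = chat_napkin(recipients_per_msg=20)
for key, value in one_to_one.items():
    print(f"{key:26} {value:,.1f}")
print(f"group fanout deliveries/s  {group_of_20['peak_deliveries_per_sec']:,.0f}")
```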

Architecture implications

The numbers drive the choices directly. 1.5M messages/sec writes means you must shard the message store — Cassandra and similar wide-column stores are natural fits here. You need a presence service to distinguish online from offline recipients, so you can queue messages for users who aren't connected. Media gets pushed to S3 + CDN just like Twitter. And at this connection count you need long-lived WebSockets, which means a dedicated connection routing layer. That's the foundation; the chat system article goes deeper.

flowchart LR
    TRAFFIC["1. Traffic: 1.5M msg/sec peak"] --> SHARD["Shard the message store<br/>Cassandra / wide-column"]
    TRAFFIC --> PRESENCE["Presence service<br/>online vs offline"]
    TRAFFIC --> WS["WebSocket layer<br/>long-lived connections"]
    SHARD --> MEDIA["10% media x 1MB = 5 PB/day<br/>S3 + CDN"]
    style TRAFFIC fill:#ff6b1a,color:#0a0a0f
    style SHARD fill:#15803d,color:#fff
    style PRESENCE fill:#0e7490,color:#fff
    style WS fill:#a855f7,color:#fff
    style MEDIA fill:#ffaa00,color:#0a0a0f

Sanity-checking your numbers

Once you finish a calculation, spend ten seconds checking the result makes physical sense. If your estimate says a single box needs to serve 500k writes/sec, that's a red flag — one Postgres tops out around 50k. If bandwidth comes out above 1 GB/s from origin, a CDN stops being optional. If your hot working set exceeds 1 TB, you'll need to partition the cache across multiple Redis nodes.

A rough monthly cost check: multiply egress GB/s by your effective $/GB rate — roughly $0.09/GB for S3 direct, or $0.01–0.08/GB for a metered CDN — to get $/s, then × 86,400 × 30 for monthly. When that number crosses seven figures, you're no longer making an engineering decision — you're making a business one, and routing through a CDN can cut the bill by 5–10×.
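The cost check is a one-line formula. In the sketch below, `monthly_egress_cost` is an illustrative name, and the two rates ($0.05/GB from origin, $0.01/GB from a CDN) are assumptions picked from the ranges quoted in this module, not quotes from any provider.

```python
SECONDS_PER_MONTH = 86_400 * 30


def monthly_egress_cost(gb_per_sec: float, usd_per_gb: float) -> float:
    """Monthly egress bill in USD for a sustained transfer rate."""
    return gb_per_sec * usd_per_gb * SECONDS_PER_MONTH


origin = monthly_egress_cost(10, 0.05)   # 10 GB/s at an assumed origin rate
cdn = monthly_egress_cost(10, 0.01)      # same traffic at an assumed CDN rate
print(f"origin: ${origin:,.0f}/month  cdn: ${cdn:,.0f}/month  saving: {origin / cdn:.0f}x")
```

Real bills depend on tiering, committed-use discounts, and cache hit ratio, so treat the output as an order-of-magnitude check only.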

flowchart TD
    RESULT[Raw calculation result] --> W{"Writes/sec > 50k?"}
    W -->|yes| SHARD2[Plan to shard]
    W -->|no| BW{"Bandwidth > 1 GB/s?"}
    BW -->|yes| CDN2[Add CDN]
    BW -->|no| CACHE{"Hot set > 1 TB?"}
    CACHE -->|yes| PART[Partition cache]
    CACHE -->|no| OK[Single-node design is fine]
    style SHARD2 fill:#ff2e88,color:#fff
    style CDN2 fill:#ff6b1a,color:#0a0a0f
    style PART fill:#ffaa00,color:#0a0a0f
    style OK fill:#15803d,color:#fff
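The flowchart translates directly into a small function. This sketch mirrors the chart's behavior of stopping at the first threshold that trips; the name `sanity_check` and its parameter names are hypothetical, and the thresholds are the rules of thumb from this section.

```python
def sanity_check(writes_per_sec: float, bandwidth_gb_per_sec: float,
                 hot_set_tb: float) -> str:
    """Return the first action the flowchart would reach."""
    if writes_per_sec > 50_000:        # beyond one Postgres
        return "Plan to shard"
    if bandwidth_gb_per_sec > 1:       # origin egress too expensive
        return "Add CDN"
    if hot_set_tb > 1:                 # beyond one cache node
        return "Partition cache"
    return "Single-node design is fine"


print(sanity_check(75_000, 10, 6))          # Twitter-like peak numbers
print(sanity_check(120, 0.001, 0.0005))     # URL-shortener-like numbers
```

In practice a large system usually trips several thresholds at once, so a variant that collects every triggered action instead of returning the first may be more useful.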

Cheat sheet

1 byte  ≈ 1 char
1 KB    ≈ 1 paragraph of text or a tiny JSON
1 MB    ≈ a small image or 1 minute of MP3
1 GB    ≈ a 2-hour SD video or ~1 hour of HD
1 TB    ≈ 1000 hours of HD video
1 PB    ≈ 250 million photos

1 day  ≈ 10^5 seconds
1 year ≈ 3 × 10^7 seconds
1 minute = 60 seconds (≈ 10^1.8)

1 Mbps  ≈ 125 KB/sec
1 Gbps  ≈ 125 MB/sec
10 Gbps ≈ 1.25 GB/sec
40 Gbps ≈ 5 GB/sec
100 Gbps ≈ 12.5 GB/sec

1 TB SSD ≈ $50    1 TB HDD ≈ $20    1 TB S3 ≈ $23/month
1 GB RAM ≈ $5
1 hour AWS cores  ≈ $0.05–0.20/core depending on type
S3 egress         ≈ $0.05–0.09/GB (tiered; $0.09 first 10 TB, $0.05 above 150 TB)
CDN egress        ≈ $0.01–0.085/GB for metered CDNs (CloudFront etc.); Cloudflare CDN charges $0 on self-serve plans

1 Postgres ≈ ~50k simple inserts/sec; ~5–20k for complex txns; ~1 TB hot
1 Redis    ≈ ~100k–1M ops/sec (no pipeline); much higher with pipelining; RAM-bound
1 Kafka broker ≈ ~100MB/sec ingest

Things you should now be able to answer

A service has 100M DAU, each user makes 20 requests/day. What's the peak QPS?
You're caching 20% hot data out of 5 TB daily reads. How big is your cache?
Why is reading a 1 MB row across a transatlantic link ~150ms minimum?
A "Like" event is 80 bytes; the system gets 200k likes/sec; storage cost over a year (with 3× replication, 30% overhead)?
A single-box Postgres handles ~50k writes/sec. At what DAU × actions/user does sharding become unavoidable?

→ Next: Networking & HTTP

// KEY TAKEAWAYS

▸Approximating 86,400 seconds/day as 10^5 is the single fastest mental shortcut for converting any per-day metric to per-second.
▸Storage estimates routinely triple once you apply a replication factor of 3 — skipping that multiplier is the most common napkin-math mistake.
▸RAM is 500x faster than NVMe SSD and 100,000x faster than an HDD seek; these ratios, not absolutes, explain every caching and colocation decision in distributed systems.
▸The 80/20 rule sets your cache size: 20% of data serves 80% of reads, so daily cache RAM needed equals 20% of daily read bytes times your replication factor.
▸Scale determines architecture — a URL shortener at 40 writes/sec needs one Postgres, while Twitter at 75,000 peak writes/sec requires sharding before you draw a single architecture box.

// FAQ

Frequently asked questions

▸What is the single most useful number to memorize for back-of-the-envelope estimation?

86,400 seconds per day, approximated as 10^5. It lets you convert any per-day figure to per-second instantly: 1B requests/day divided by 10^5 equals 10,000 requests/sec.

▸How much faster is RAM than NVMe SSD, and why does that ratio matter for architecture?

RAM is roughly 500x faster than NVMe SSD — 100 ns versus 50 microseconds. That gap is why every serious architecture diagram places a cache in front of the database.

▸What are the four steps of the capacity estimation recipe, and why does order matter?

The four steps are traffic, storage, bandwidth, and memory, in that order. Each answer feeds into the next: traffic drives storage size, storage and traffic together determine bandwidth, and bandwidth determines how large your cache working set needs to be.

▸At what write rate does a single Postgres instance become a bottleneck, and what does that mean for Twitter-scale systems?

A well-tuned Postgres handles roughly 50,000 simple inserts per second on fast hardware; complex transactional workloads are closer to 5,000-20,000 TPS. At Twitter scale, peak tweet writes hit 75,000 per second, so sharding is unavoidable before you even account for indexes and replication.

▸How do the capacity numbers for a URL shortener compare to those for Twitter, and what does that imply architecturally?

TinyURL at 100M URLs per month generates about 40 writes/sec and 400 reads/sec, with a total 5-year dataset of 3 TB and a hot working set of just 500 MB. That fits on a single replicated Postgres and a single Redis node — a completely different architecture from Twitter's 100 shards and distributed cache cluster.
