Back-of-the-Envelope Estimation
Capacity math you can do in your head — QPS, storage, bandwidth, memory — the latency numbers every engineer should know, and worked examples for Twitter, URL shorteners, and a chat system.
Capacity estimation is the single most under-practiced and over-tested skill in system design interviews. It's also the skill that, in real life, is the difference between an architecture that works and one that explodes the night before launch.
This module gives you (a) the numbers every engineer should have memorized, (b) a four-step capacity recipe, (c) three worked examples, and (d) the napkin you'll mentally pull out for the rest of your career.
Why estimate at all?
Because every architecture decision is a function of scale.
flowchart LR
A[100 RPS] --> B[1 Postgres]
B --> C[1k RPS] --> D[Postgres + Redis]
D --> E[10k RPS] --> F[Replicas + LB]
F --> G[100k RPS] --> H[Sharded + cached + queued]
H --> I[1M RPS] --> J[Multi-region<br/>polyglot persistence]
style B fill:#15803d,color:#fff
style J fill:#ff2e88,color:#fff
If you don't know whether you're at 1k or 1M RPS, you don't know which row of this picture you're in — which means every other decision is guesswork. Estimation closes the guess.
Powers of 10 and the SI prefixes
Before anything else: get fluent with order-of-magnitude.
| Notation | Value | Real-world example |
|---|---|---|
10^3 (Kilo, K) | 1,000 | A village |
10^6 (Mega, M) | 1,000,000 | A small city's population |
10^9 (Giga, G) | 1,000,000,000 | DAU of a top-10 app |
10^12 (Tera, T) | 1,000,000,000,000 | Tweets ever posted (~order) |
10^15 (Peta, P) | 1,000,000,000,000,000 | YouTube's video storage |
10^18 (Exa, E) | 10^18 | Total data on the public internet (rough) |
Memorize this single fact: 86,400 seconds/day ≈ 100,000 ≈ 10^5. It lets you turn any "per day" into "per second" instantly.
So 1B requests/day ÷ 10^5 = 10^4 = 10,000 requests/sec. You just did capacity planning.
Useful seconds-time facts:
1 minute = 60 ≈ 10^1.8
1 hour = 3,600 ≈ 10^3.5
1 day = 86,400 ≈ 10^5
1 month = 2.6M ≈ 10^6.4
1 year = 31.5M ≈ 10^7.5
Powers of 2 (for memory and storage)
Storage and memory are quoted in powers of 2:
| Power | Bytes | Familiar form |
|---|---|---|
| 2^10 | 1,024 | 1 KiB ≈ 1 KB |
| 2^20 | ~10^6 | 1 MiB ≈ 1 MB |
| 2^30 | ~10^9 | 1 GiB ≈ 1 GB |
| 2^32 | ~4 × 10^9 | Range of unsigned int |
| 2^40 | ~10^12 | 1 TiB |
| 2^50 | ~10^15 | 1 PiB |
| 2^53 | ~9 × 10^15 | Max safe JS integer |
| 2^63 | ~9 × 10^18 | Max signed long |
| 2^64 | ~1.8 × 10^19 | UUID space (64 bits half) |
Why this matters: when you're sizing arrays, hashmaps, or partition counts, you'll often want powers of 2. When you hear "16-bit hash space" you should immediately think 65,536 buckets.
Latency numbers every programmer should know
Originally compiled by Jeff Dean at Google, periodically updated. These are the numbers you should think in:
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 25 ns
Main memory reference 100 ns
Compress 1 KB with Zippy 3 μs = 3,000 ns
Send 1 KB over 1 Gbps network 10 μs
Read 4 KB from SSD (NVMe) 50 μs
Read 4 KB from SSD (SATA) 150 μs
Read 1 MB sequentially from memory 250 μs
Round trip within same datacenter 500 μs = 0.5 ms
Read 1 MB sequentially from SSD 1 ms
Disk seek (HDD) 10 ms
Read 1 MB sequentially from HDD 20 ms
Send packet CA → Netherlands → CA 150 ms
Cold fetch from object storage (S3) ~100 ms
flowchart LR
L1[L1 cache: 0.5ns] --> RAM[RAM: 100ns]
RAM --> SSD[SSD: 50μs]
SSD --> NET[Same-DC RTT: 500μs]
NET --> HDD[HDD seek: 10ms]
HDD --> WAN[Cross-continent: 150ms]
style L1 fill:#15803d,color:#fff
style RAM fill:#ffaa00,color:#0a0a0f
style SSD fill:#ff6b1a,color:#0a0a0f
style NET fill:#ff6b1a,color:#0a0a0f
style HDD fill:#ff2e88,color:#fff
style WAN fill:#a855f7,color:#fff
Read these as ratios, not absolutes:
- RAM is ~500× faster than SSD (NVMe: 100 ns vs 50 μs).
- SSD is ~200× faster than HDD seek (50 μs vs 10 ms).
- A same-datacenter round-trip is ~5,000× slower than a memory access.
- A cross-continent round-trip is ~300× slower than a same-DC round-trip.
This is why every architecture diagram has a cache in front of the database. This is why we colocate services in the same region. This is why CDNs exist.
Latency in human terms
Scaling everything up by 10^9 (turning nanoseconds into seconds) gives you intuition that sticks:
| Operation | Real | Scaled (×10^9) |
|---|---|---|
| L1 cache | 0.5 ns | 0.5 sec |
| Main memory | 100 ns | 100 sec (~2 min) |
| SSD read | 50 μs | ~14 hours |
| Cross-DC RTT | 500 μs | ~6 days |
| HDD seek | 10 ms | ~4 months |
| Cross-continent | 150 ms | ~5 years |
When your code does an HDD seek instead of staying in cache, the human-time equivalent is "instead of finishing this sentence, take a 4-month nap." That's the cost of a cache miss.
Hardware specs to keep in mind
Modern commodity server (think r6i.4xlarge or similar memory-optimized instance):
CPU: ~16 vCPUs @ 3.5 GHz
RAM: ~128 GB
Network: up to 12.5 Gbps (r6i.4xlarge; larger instances reach 25–50 Gbps)
NVMe: available on local-storage variants (r6id); EBS: ~10 Gbps, ~40k IOPS
HDD: ~150 MB/s sequential, ~150 random IOPS
A few useful "what fits on one box" rules of thumb: a single Linux box handles roughly 50k–100k QPS of simple HTTP; a well-tuned Postgres sustains around 50k writes/sec for simple inserts on fast hardware (complex transactional workloads are more like 5–20k TPS); a Redis instance can do ~100k–1M ops/sec without pipelining, and well into the millions with pipelining — plan on ~100k–200k for realistic mixed workloads; practical RAM ceiling is ~1 TB hot working set; and network tops out around ~10 GB/s on a high-tier instance. When you exceed these, you start sharding, replicating, and caching — all the topics in the rest of this course.
The four-step recipe
Given any system design problem, do these four estimates in this order: traffic → storage → bandwidth → memory. Each answer shapes the next one, so the order matters.
flowchart TD
A[1. Traffic Estimate<br/>QPS read and write] --> B[2. Storage Estimate<br/>bytes/event x events/year]
B --> C[3. Bandwidth Estimate<br/>bytes x QPS]
C --> D[4. Memory Estimate<br/>cache size for 80/20 rule]
style A fill:#ff6b1a,color:#0a0a0f
style B fill:#ffaa00,color:#0a0a0f
style C fill:#15803d,color:#fff
style D fill:#0e7490,color:#fff
Traffic. Start with DAU, multiply by actions per user per day, divide by 86,400 (or 10^5 for napkin math) to get average QPS, then multiply by 2–3× for peak. Always compute the read:write ratio separately — most systems are heavily skewed toward reads, and the write rate is what drives your database choices.
Storage. Multiply bytes-per-event by events-per-day by your retention period. Then add ~30% overhead for indexes, multiply by ~3× for replication (replication factor 3 is the norm), and tack on 20–50% for backups and historical. The replication multiplier alone often triples your naive estimate — don't skip it.
Bandwidth. Ingress is avg_event_size × write_QPS; egress is avg_response_size × read_QPS. Don't forget heavier traffic like thumbnails or video re-encoding. Bandwidth is what tells you whether you need a CDN — once egress climbs past ~1 GB/s from origin, the bill alone justifies edge caching.
Memory. Apply the 80/20 rule: 20% of your data serves 80% of requests. Daily cache size ≈ 20% × daily read bytes. Multiply by your replication factor to get total RAM needed. If this number is small (sub-GB), you can fit the hot working set on a single Redis node. If it's in the tens of terabytes, you're looking at a distributed cache cluster.
A quick math interlude
You will be doing a lot of multiplication. Two tricks to do it fast in your head:
Trick 1: align the powers of 10. When you see 2.5B × 400 bytes, write it as 2.5 × 10^9 × 4 × 10^2 = 10 × 10^11 = 10^12 = 1 TB. You don't need to multiply 2.5 × 400.
Trick 2: the "round, then correct" pattern. Round messy numbers to the nearest power of 10 first; reason about the order of magnitude; tweak at the end if needed.
500M × 5 = 2.5B (could just say "few billion")
75k × 1KB ≈ 100 MB/s
75k × 200KB ≈ 15 GB/s
If you're off by 2× and the number is 100 GB/s, the architecture is the same. If you're off by 2× and the number is 100 KB/s vs 200 KB/s, who cares. Order of magnitude is what matters.
Worked example 1: Twitter
Design Twitter for 500M DAU. Each user posts ~5 tweets/day, reads ~50 tweets/day. Tweet is ~280 chars + metadata.
Traffic
Writes (tweets):
500M × 5 = 2.5B/day
2.5 × 10^9 / 10^5 = 25,000/sec average
Peak ≈ 75,000/sec (× 3)
Reads (timeline loads):
500M × 50 = 25B/day
25 × 10^9 / 10^5 = 250,000/sec average
Peak ≈ 750,000/sec
Read:write ≈ 10:1
What it tells you: 75k tweets/sec peak. One Postgres tops out around ~50k simple writes/sec on well-tuned hardware, so even before you count indexes and replication, this needs sharding immediately. 750k reads/sec demands caching.
Storage
Tweet payload: 280 bytes text + ~120 bytes metadata ≈ 400 bytes
Plus media (avg): ~200 KB on 20% of tweets
Per day:
Text: 2.5B × 400B = 1 TB/day
Media: 0.5B × 200KB = 100 TB/day
Per year:
Text: ~365 TB
Media: ~36 PB
Replication × 3:
Text: ~1 PB/year
Media: ~110 PB/year
Media dwarfs text by 100×. The architecture is bottlenecked by media, which means S3-class object storage and an aggressive CDN.
Bandwidth
Write ingress (text only): 25k × 400B = 10 MB/sec
Read egress (text only): 250k × 400B = 100 MB/sec
Media writes: 5k × 200KB = 1 GB/sec
Media reads: 50k × 200KB = 10 GB/sec
That media read bandwidth is why Twitter has a CDN. 10 GB/s served from origin is a $millions/month bill; from CDN edges it's a fraction.
Cache
Daily reads = 25B × 400B = 10 TB/day text
20% hot = 2 TB hot timeline data
× replication factor 3 = 6 TB total cache RAM needed
Distribute over Redis cluster: 100 nodes × 60 GB each.
That's the napkin. It tells you: ~75k tweets/sec peak means a single Postgres won't cut it — you need sharding. 100 GB/year of metadata fits comfortably in one big Postgres per shard. Media is the real cost — solve with object storage (S3) + CDN (CloudFront). 6 TB hot cache fits in a Redis cluster of 100 nodes at ~60 GB each.
Worked example 2: A URL shortener
Design TinyURL for 100M new URLs per month, 10:1 read:write ratio.
Traffic
Writes (creates):
100M / 30 / 86400 ≈ 40 writes/sec average
Peak ≈ 120 writes/sec
Reads (redirects):
10× writes ≈ 400 reads/sec average
Peak ≈ 1,200 reads/sec
Storage
5 years:
100M × 60 months = 6B URLs
Each row ~500B = 3 TB total
Easy on a single replicated Postgres.
Cache
80/20 rule: 20% of short URLs serve ~80% of redirects
Daily-active URLs ≈ 40 writes/sec × 86,400 ≈ 3.5M new/day
Hot working set ~ 20% of daily-active ≈ 700k codes ≈ 1M (rounded)
1M hot codes × 500 B = 500 MB
Fits in a single Redis instance.
The shape of this system is completely different from Twitter — and the napkin tells you why before you draw a single box. This is "1 Postgres + 1 Redis," not "100 shards and Cassandra."
Worked example 3: A chat / messaging system
Design WhatsApp-style messaging for 1B DAU, average user sends 50 messages/day.
Traffic
Messages sent: 1B × 50 = 50B/day
Per second: 50B / 10^5 = 500k/sec average
Peak: ~1.5M/sec
But every message is also delivered to (typically) 1 recipient,
so write fanout = 1× → ~1.5M delivery writes/sec at peak
(For group chats with N members, multiply by N.)
Storage
Per message: 100 bytes (text) + 50 bytes metadata = 150 bytes
Per day: 50B × 150B = 7.5 TB/day text
Per year: ~2.7 PB
Plus media (let's say 10% of messages, avg 1 MB):
5B × 1 MB = 5 PB/day media
Replication × 3:
Text: ~9 PB / year
Media: similar magnitude per year, but archived aggressively
Bandwidth
Write ingress (text): 1.5M × 150B = 225 MB/sec
Push to recipients: 1.5M × 150B = 225 MB/sec (1:1 chat)
Media: dominated by uploads/downloads — at 10% rate, ~150 GB/sec peak
Architecture implications
The numbers drive the choices directly. 1.5M messages/sec writes means you must shard the message store — Cassandra and similar wide-column stores are natural fits here. You need a presence service to distinguish online from offline recipients, so you can queue messages for users who aren't connected. Media gets pushed to S3 + CDN just like Twitter. And at this connection count you need long-lived WebSockets, which means a dedicated connection routing layer. That's the foundation; the chat system article goes deeper.
flowchart LR
TRAFFIC["1. Traffic: 1.5M msg/sec peak"] --> SHARD["Shard the message store<br/>Cassandra / wide-column"]
TRAFFIC --> PRESENCE["Presence service<br/>online vs offline"]
TRAFFIC --> WS["WebSocket layer<br/>long-lived connections"]
SHARD --> MEDIA["10% media x 1MB = 5 PB/day<br/>S3 + CDN"]
style TRAFFIC fill:#ff6b1a,color:#0a0a0f
style SHARD fill:#15803d,color:#fff
style PRESENCE fill:#0e7490,color:#fff
style WS fill:#a855f7,color:#fff
style MEDIA fill:#ffaa00,color:#0a0a0f
Sanity-checking your numbers
Once you finish a calculation, spend ten seconds checking the result makes physical sense. If your estimate says a single box needs to serve 500k writes/sec, that's a red flag — one Postgres tops out around 50k. If bandwidth comes out above 1 GB/s from origin, a CDN stops being optional. If your hot working set exceeds 1 TB, you'll need to partition the cache across multiple Redis nodes.
A rough monthly cost check: multiply egress GB/s by your effective $/GB rate — roughly $0.09/GB for S3 direct, or $0.01–0.08/GB for a metered CDN — to get $/s, then × 86,400 × 30 for monthly. When that number crosses seven figures, you're no longer making an engineering decision — you're making a business one, and routing through a CDN can cut the bill by 5–10×.
flowchart TD
RESULT[Raw calculation result] --> W{"Writes/sec > 50k?"}
W -->|yes| SHARD2[Plan to shard]
W -->|no| BW{"Bandwidth > 1 GB/s?"}
BW -->|yes| CDN2[Add CDN]
BW -->|no| CACHE{"Hot set > 1 TB?"}
CACHE -->|yes| PART[Partition cache]
CACHE -->|no| OK[Single-node design is fine]
style SHARD2 fill:#ff2e88,color:#fff
style CDN2 fill:#ff6b1a,color:#0a0a0f
style PART fill:#ffaa00,color:#0a0a0f
style OK fill:#15803d,color:#fff
Cheat sheet
1 byte ≈ 1 char
1 KB ≈ 1 paragraph of text or a tiny JSON
1 MB ≈ a small image or 1 minute of MP3
1 GB ≈ a 2-hour SD video or ~1 hour of HD
1 TB ≈ 1000 hours of HD video
1 PB ≈ 250 million photos
1 day ≈ 10^5 seconds
1 year ≈ 3 × 10^7 seconds
1 minute = 60 seconds (≈ 10^1.8)
1 Mbps ≈ 125 KB/sec
1 Gbps ≈ 125 MB/sec
10 Gbps ≈ 1.25 GB/sec
40 Gbps ≈ 5 GB/sec
100 Gbps ≈ 12.5 GB/sec
1 TB SSD ≈ $50 1 TB HDD ≈ $20 1 TB S3 ≈ $23/month
1 GB RAM ≈ $5
1 hour AWS cores ≈ $0.05–0.20/core depending on type
S3 egress ≈ $0.05–0.09/GB (tiered; $0.09 first 10 TB, $0.05 above 150 TB)
CDN egress ≈ $0.01–0.085/GB for metered CDNs (CloudFront etc.); Cloudflare CDN charges $0 on self-serve plans
1 Postgres ≈ ~50k simple inserts/sec; ~5–20k for complex txns; ~1 TB hot
1 Redis ≈ ~100k–1M ops/sec (no pipeline); much higher with pipelining; RAM-bound
1 Kafka broker ≈ ~100MB/sec ingest
Things you should now be able to answer
- A service has 100M DAU, each user makes 20 requests/day. What's the peak QPS?
- You're caching 20% hot data out of 5 TB daily reads. How big is your cache?
- Why is reading a 1 MB row across a transatlantic link ~150ms minimum?
- A "Like" event is 80 bytes; the system gets 200k likes/sec; storage cost over a year (with 3× replication, 30% overhead)?
- A single-box Postgres handles ~50k writes/sec. At what DAU × actions/user does sharding become unavoidable?
→ Next: Networking & HTTP
Frequently asked questions
▸What is the single most useful number to memorize for back-of-the-envelope estimation?
86,400 seconds per day, approximated as 10^5. It lets you convert any per-day figure to per-second instantly: 1B requests/day divided by 10^5 equals 10,000 requests/sec.
▸How much faster is RAM than NVMe SSD, and why does that ratio matter for architecture?
RAM is roughly 500x faster than NVMe SSD — 100 ns versus 50 microseconds. That gap is why every serious architecture diagram places a cache in front of the database.
▸What are the four steps of the capacity estimation recipe, and why does order matter?
The four steps are traffic, storage, bandwidth, and memory, in that order. Each answer feeds into the next: traffic drives storage size, storage and traffic together determine bandwidth, and bandwidth determines how large your cache working set needs to be.
▸At what write rate does a single Postgres instance become a bottleneck, and what does that mean for Twitter-scale systems?
A well-tuned Postgres handles roughly 50,000 simple inserts per second on fast hardware; complex transactional workloads are closer to 5,000-20,000 TPS. At Twitter scale, peak tweet writes hit 75,000 per second, so sharding is unavoidable before you even account for indexes and replication.
▸How do the capacity numbers for a URL shortener compare to those for Twitter, and what does that imply architecturally?
TinyURL at 100M URLs per month generates about 40 writes/sec and 400 reads/sec, with a total 5-year dataset of 3 TB and a hot working set of just 500 MB. That fits on a single replicated Postgres and a single Redis node — a completely different architecture from Twitter's 100 shards and distributed cache cluster.