Beginner · asked at Google · asked at Amazon · asked at Microsoft

Design a URL Shortener (TinyURL / bit.ly)

A classic FAANG warmup. Generate short codes, store them, redirect fast, scale to billions of URLs.

13 min read · 2026-02-05 · Ironclad Academy

The problem

bit.ly and TinyURL turn a long, unwieldy URL into a handful of characters: https://bit.ly/3xQpR7 instead of a 200-character Google Docs link. You paste the short link into a tweet, an email, or a billboard, and when someone clicks it, they land on the original page. The redirect should take under 100ms, and the mapping has to hold up years later when someone digs a link out of their inbox.

Underneath that simplicity is a classic engineering trade-off: the create path is rare (maybe 40 writes per second), while the redirect path is relentless (thousands of reads per second, spiking to tens of thousands when a link goes viral). That 100:1 or 1000:1 read-to-write ratio means the read path must be radically cheaper than a round-trip to the database. A single lost mapping breaks a link permanently — there is no "link not found, try again later." This combination of extreme read skew, zero-loss durability, and sub-100ms global latency is what makes the URL shortener a staple of system design interviews.

The two hardest sub-problems are code generation (how do you mint millions of unique 6-character codes without coordination overhead or collision risk?) and caching (how do you keep the most popular codes available at the edge so the database never sees the load?). Get those right, and the rest of the design follows.

Functional requirements

  • POST /shorten { long_url } → returns a short code (e.g. https://sysf.ge/aZx4Q3).
  • GET /:code → redirects to the long URL (typically with a 302; see trade-offs below).
  • (Optional) Custom aliases: /shorten { long_url, alias: "promo" }.
  • (Optional) Analytics: click counts per short URL.
  • (Optional) Expiration: short URLs that expire after N days.

Non-functional requirements

  • Read-heavy: redirects vastly outnumber creations. Often 100:1 or 1000:1.
  • Low redirect latency — < 100ms p99 globally.
  • High availability — a 503 from a URL shortener breaks every link in every email.
  • Durable — losing a short→long mapping breaks the internet.

Capacity estimation

| Dimension | Estimate | How we got there |
|---|---|---|
| Write rate (avg) | ~40 writes/s | 100M/month ÷ (30 × 86,400 s) |
| Write rate (peak) | ~120 writes/s | ~3× average burst |
| Read rate (avg) | ~4,000 reads/s | 100:1 read-to-write ratio |
| Read rate (peak) | ~12,000 reads/s | ~3× average burst |
| Row size | ~500 bytes | long_url + short_code + metadata |
| URL corpus (5 yr) | 6B URLs | 5 years × 12 months × 100M/month |
| Storage | ~3 TB | 6B × 500 B |
| Read bandwidth | ~2 MB/s | 4k req/s × 500 B (tiny) |
| Hot working set | ~1 GB | ~2M hot codes × 500 B; Zipfian distribution means a few million codes serve most clicks |

Takeaway: 3 TB and 12k QPS is well within reach of a sharded Postgres or DynamoDB.
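
The estimates above can be reproduced in a few lines. This is only a back-of-the-envelope sketch: the inputs are the ones the table uses (100M new URLs per month, a 100:1 read-to-write ratio, a 3× peak factor, 500-byte rows, 2M hot codes), and the variable names are illustrative.

```python
# Back-of-the-envelope capacity estimate using the table's inputs.
NEW_URLS_PER_MONTH = 100_000_000
SECONDS_PER_MONTH = 30 * 86_400
READ_WRITE_RATIO = 100
PEAK_FACTOR = 3
ROW_BYTES = 500
YEARS = 5
HOT_CODES = 2_000_000

writes_per_sec = NEW_URLS_PER_MONTH / SECONDS_PER_MONTH   # ~38.6, rounded up to ~40
reads_per_sec = writes_per_sec * READ_WRITE_RATIO         # ~3,858, rounded up to ~4,000
total_urls = NEW_URLS_PER_MONTH * 12 * YEARS              # 6 billion
storage_tb = total_urls * ROW_BYTES / 1e12                # 3 TB
read_mb_per_sec = reads_per_sec * ROW_BYTES / 1e6         # ~1.9 MB/s
hot_set_gb = HOT_CODES * ROW_BYTES / 1e9                  # 1 GB

print(f"writes/s: avg {writes_per_sec:.1f}, peak {writes_per_sec * PEAK_FACTOR:.0f}")
print(f"reads/s:  avg {reads_per_sec:.0f}, peak {reads_per_sec * PEAK_FACTOR:.0f}")
print(f"corpus: {total_urls:,} URLs, storage: {storage_tb:.1f} TB")
print(f"read bandwidth: {read_mb_per_sec:.1f} MB/s, hot set: {hot_set_gb:.1f} GB")
```

The table rounds the average rates up (40 and 4,000) before applying the 3× peak factor, which is why it shows 120 and 12,000 rather than the slightly lower unrounded values.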

Building up to the design

Jumping straight to "sharded Postgres + Redis + CDN + Kafka" sounds confident but skips the reasoning. In an interview (and in real life), the better move is to start with the simplest thing that could possibly work and grow it as each layer breaks. Each step below is a complete, runnable system — and each one fails for a specific, nameable reason.

V1: A Python dict on one box

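import secrets
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # base62
urls = {}                              # in-memory map: code -> long_url

def random_base62(n):
    return "".join(secrets.choice(ALPHABET) for _ in range(n))

def shorten(long_url):
    code = random_base62(6)            # collisions are ignored in this demo
    urls[code] = long_url
    return code

def lookup(code):
    return urls.get(code)              # None if the code is unknown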

One Flask process, one machine, no database. Handles a few thousand requests/sec from your laptop and takes you all the way to a working demo. The problem is simple: restart the process and every URL you ever shortened is gone. Anything past "demo" needs durability.

V2: Add Postgres

Replace the dict with a urls(code, long_url) table. The code is now ~30 lines including connection setup. You get durability, multi-process scaling, and real query support — who created what, when. But every redirect is a DB query. At ~4k reads/sec, your single Postgres is fine on CPU, but a cold lookup is 5–20ms. That stacks badly when you're targeting sub-100ms p99 globally with every hop counted.

V3: Put Redis in front

GET code → Redis hit?  → return long_url
                ↓ miss
           Postgres → populate Redis → return

Redis holds ~2M hot codes. Because the working set is Zipfian, more than 99% of reads never touch Postgres — p50 drops to ~1ms and DB load falls by 100×. The new weak point is availability: still a single API box. One unlucky deploy or kernel panic, and every short URL in existence goes dark, including ones embedded in years-old emails.
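
The read-through flow in the diagram above is the cache-aside pattern. The sketch below shows its shape without a real Redis or Postgres: `TTLCache`, `CountingDB`, and `make_lookup` are hypothetical stand-ins invented for this example, and the one-day TTL is an assumption borrowed from the caching table later in the article.

```python
import time

class TTLCache:
    """Tiny in-memory stand-in for Redis GET / SET with an expiry (illustrative only)."""

    def __init__(self):
        self._data = {}  # code -> (long_url, expires_at)

    def get(self, code):
        entry = self._data.get(code)
        if entry is None:
            return None
        long_url, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[code]         # expired: treat as a miss
            return None
        return long_url

    def set(self, code, long_url, ttl_seconds):
        self._data[code] = (long_url, time.monotonic() + ttl_seconds)


class CountingDB:
    """Dict-backed stand-in for Postgres that counts how often it is read."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def get(self, code):
        self.reads += 1
        return self.rows.get(code)


def make_lookup(cache, db, ttl_seconds=86_400):
    """Return a cache-aside lookup(code) function over the given cache and db."""
    def lookup(code):
        long_url = cache.get(code)
        if long_url is not None:
            return long_url              # hit: the database is never touched
        long_url = db.get(code)          # miss: fall through to the database
        if long_url is not None:
            cache.set(code, long_url, ttl_seconds)  # populate for the next reader
        return long_url
    return lookup


db = CountingDB({"aZx4Q3": "https://example.com/some/very/long/path"})
lookup = make_lookup(TTLCache(), db)
lookup("aZx4Q3")   # miss: reads the database and fills the cache
lookup("aZx4Q3")   # hit: served from the cache
print("database reads:", db.reads)
```

Unknown codes are deliberately not cached here, so repeated lookups of a missing code keep reaching the database; whether to cache negative results is a separate design choice.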

V4: Two API boxes behind a load balancer

Same DB, same Redis, but now 2+ stateless API servers behind an ALB or HAProxy, with a read replica for DB reads. This removes the API SPOF and handles 10k+ QPS comfortably. What surfaces next is ID generation: if every server calls random_base62(6) independently, you'll eventually hit collisions and have to retry on a UNIQUE violation. Manageable, but the failure mode is "one user's POST /shorten randomly takes 200ms because you hit 3 collisions in a row."
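
To put a number on "eventually": if codes are drawn uniformly at random, the chance that a fresh 6-character code is already taken is the number of stored codes divided by the 62^6 keyspace. The sketch below works through that arithmetic; the function names are illustrative, and uniform random draws are an assumption.

```python
KEYSPACE = 62 ** 6   # 56,800,235,584 possible 6-character base62 codes

def collision_probability(existing_codes):
    """Chance that one freshly drawn random code is already taken."""
    return existing_codes / KEYSPACE

def expected_attempts(existing_codes):
    """Mean number of draws until an unused code is found (geometric distribution)."""
    return 1 / (1 - collision_probability(existing_codes))

for n in (1_000_000, 100_000_000, 6_000_000_000):
    p = collision_probability(n)
    print(f"{n:>13,} codes: p(collision)={p:.4%}, "
          f"p(3 in a row)={p ** 3:.2e}, expected attempts={expected_attempts(n):.3f}")
```

Early on the retry almost never fires, but at the 5-year corpus of 6B URLs roughly one random draw in ten collides, which is the pressure that motivates a collision-free generator.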

V5: A dedicated ID generator + sharded storage

This is the production design. Each shard pre-allocates a counter range — shard 0 owns IDs 0..1M, shard 1 owns 1M..2M, and so on — encodes to base62, and inserts locally. Zero coordination, zero collisions on the hot path. Storage is split across N Postgres instances by hash(code).

This is where the full architecture diagram picks up. Everything after this section is naming the pieces and sharpening the trade-offs.

V6 (optional, for "scale 10×" follow-ups)

Put a CDN in front of the API so most redirects are served at the edge and never reach origin. Move click analytics into Kafka so the redirect path stays 1ms regardless of how hot a code is. Add multi-region async replication — a brand-new code is fine to lag by 1–2 seconds globally, because the user shares the link before anyone else can click it.

flowchart LR
    V1["V1: dict in memory<br/>~1k QPS, no durability"] --> V2["V2: + Postgres<br/>durable, slow reads"]
    V2 --> V3["V3: + Redis<br/>1ms reads, API is a SPOF"]
    V3 --> V4["V4: + LB, replicas<br/>10k QPS, write bottleneck"]
    V4 --> V5["V5: sharded DB + ID gen<br/>100k+ QPS"]
    V5 --> V6["V6: + CDN + Kafka + multi-region<br/>global scale"]
    style V1 fill:#0e7490,color:#fff
    style V3 fill:#15803d,color:#fff
    style V5 fill:#ff6b1a,color:#0a0a0f
    style V6 fill:#a855f7,color:#fff

The rest of this article zooms in on V5 — the design you'd actually run in production.

API design

POST /api/v1/shorten HTTP/1.1
Authorization: Bearer ...
Content-Type: application/json

{ "long_url": "https://example.com/some/very/long/path",
  "alias": "promo",          // optional
  "expires_in_days": 30 }     // optional

→ 201 Created
{ "short_url": "https://sysf.ge/promo",
  "code": "promo",
  "expires_at": "2026-03-07T..." }

GET /:code  →  302 Found
Location: https://example.com/some/very/long/path
Cache-Control: private, max-age=0
Surrogate-Control: max-age=86400

Cache-Control: private, max-age=0 marks the response as not for shared caches and makes any browser copy stale immediately, so browsers re-request the redirect on every click and each click stays observable for analytics. CDNs honour Surrogate-Control (or an equivalent CDN-specific header such as CDN-Cache-Control) instead and cache the response at the edge for up to a day. That is what gives the 90%+ CDN hit rate in the caching table, without letting browsers reuse a cached redirect. (See the 301 vs 302 trade-off below for why most production shorteners prefer 302.)

Generating the short code

The question underneath code generation is: how do you produce a short, unique string without either (a) a global coordination bottleneck, or (b) relying on luck and retries? There are three realistic approaches.

Hash (e.g. base62 of MD5)

hash = md5(long_url + salt)
code = base62(hash)[:6]    # first 6 chars of the base62-encoded digest

Stateless and simple — no shared state at all. The catch is collisions: two different long URLs can hash to the same 6-char prefix, so you need a retry loop with a different salt. For small scale this is fine. At billions of URLs it becomes a slow, annoying edge case.
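
A runnable version of the hash approach might look like the sketch below. The base62 alphabet order (digits, lowercase, uppercase), the dict standing in for the database, and the use of the attempt number as the salt are all assumptions made for illustration.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62(n):
    """Encode a non-negative integer in base62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def hash_code(long_url, salt=""):
    """First 6 base62 characters of md5(long_url + salt)."""
    digest = hashlib.md5((long_url + salt).encode("utf-8")).digest()
    return base62(int.from_bytes(digest, "big"))[:6]

def shorten(long_url, table, max_retries=5):
    """Insert into `table` (a dict standing in for the DB), re-salting on collision."""
    for attempt in range(max_retries):
        code = hash_code(long_url, salt=str(attempt) if attempt else "")
        existing = table.setdefault(code, long_url)
        if existing == long_url:
            return code      # new row, or the same URL shortened again
    raise RuntimeError("too many collisions")

table = {}
print(shorten("https://example.com/some/very/long/path", table))
```

Note that this variant returns the same code when the same URL is shortened twice, which ties into the deduplication trade-off discussed later.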

Counter + base62

A central counter generates increasing IDs (1, 2, 3, …). Convert to base62.

ID 125 → code "21"     (2 chars)
ID 1M  → code "4c92"   (4 chars)
ID 56B → code "Z7Qias" (6 chars: 62^6 ≈ 56B)

No collisions ever. The problem is that a truly global counter is a write bottleneck — it's a single point of serialization. The fix is shard-local counter ranges: shard 0 claims IDs 0..1M, shard 1 claims 1M..2M, and so on. When a shard exhausts its range, it claims another block. The claim operation is a simple atomic increment on a counter service and happens at most once per million inserts — essentially free.

flowchart LR
    CS[Counter Service<br/>global atomic counter] -->|"claims range 0..1M"| S0[Shard 0<br/>inserts locally]
    CS -->|"claims range 1M..2M"| S1[Shard 1<br/>inserts locally]
    CS -->|"claims range 2M..3M"| S2[Shard 2<br/>inserts locally]
    S0 -->|"encode to base62"| C0["code: '0'...'4c91'"]
    S1 -->|"encode to base62"| C1["code: '4c92'...'8oi3'"]
    style CS fill:#ff6b1a,color:#0a0a0f
    style S0 fill:#15803d,color:#fff
    style S1 fill:#15803d,color:#fff
    style S2 fill:#15803d,color:#fff

This is the recommended approach — no collisions, no per-request coordination, and the base62 encoding keeps codes short.
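
The range-claiming scheme can be sketched as follows. `CounterService` and `ShardIdGenerator` are hypothetical names, a lock stands in for the counter service's atomic increment, and the alphabet order (digits, lowercase, uppercase) is the one implied by the examples above, where ID 1M encodes to "4c92".

```python
import threading

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BLOCK_SIZE = 1_000_000

def base62(n):
    """Encode a non-negative integer in base62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

class CounterService:
    """Stand-in for the global counter: hands out disjoint blocks of IDs."""

    def __init__(self, block_size=BLOCK_SIZE):
        self._block_size = block_size
        self._next_block = 0
        self._lock = threading.Lock()    # plays the role of an atomic increment

    def claim_block(self):
        with self._lock:
            start = self._next_block * self._block_size
            self._next_block += 1
        return start, start + self._block_size   # half-open range [start, end)

class ShardIdGenerator:
    """Mints codes locally; calls the counter service only when its block runs out.

    Not thread-safe: this sketch assumes one generator per writer.
    """

    def __init__(self, counter_service):
        self._counter_service = counter_service
        self._next_id, self._end = counter_service.claim_block()

    def next_code(self):
        if self._next_id >= self._end:
            self._next_id, self._end = self._counter_service.claim_block()
        new_id = self._next_id
        self._next_id += 1
        return base62(new_id)

service = CounterService()
shard0, shard1 = ShardIdGenerator(service), ShardIdGenerator(service)
print(shard0.next_code(), shard1.next_code())   # first code of each shard's block
```

Treating each block as a half-open range keeps the boundaries unambiguous: shard 0 ends at ID 999,999 and shard 1 starts at ID 1,000,000.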

Pre-generated pool

Generate 100M random codes ahead of time, store them in an "unused codes" table. Each POST /shorten claims one and moves it to a "used" table. No collisions, fast claims. The downside is operational overhead: you need to regenerate the pool before it empties, and the atomic claim across two tables adds a step. Workable, but counter + base62 with sharded ranges is simpler to reason about.

Schema

A single Postgres table is plenty until you're at billions of rows.

CREATE TABLE urls (
  code        VARCHAR(8)  PRIMARY KEY,
  long_url    TEXT        NOT NULL,
  user_id     BIGINT,
  created_at  TIMESTAMPTZ DEFAULT now(),
  expires_at  TIMESTAMPTZ,
  click_count BIGINT      DEFAULT 0
);

CREATE INDEX urls_user_id ON urls(user_id);
CREATE INDEX urls_expires ON urls(expires_at) WHERE expires_at IS NOT NULL;

Past 1 TB, shard by code (consistent hashing). Each shard is a Postgres instance.
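
One way to route a code to its shard is a consistent-hash ring with virtual nodes, sketched below. The shard names ("pg-0" and so on), the choice of SHA-256, and the 100 virtual nodes per shard are assumptions for illustration, not details from the design above.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes; maps a short code to a shard name."""

    def __init__(self, shards, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        # A stable hash is required: Python's built-in hash() of str varies per process.
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def shard_for(self, code):
        # First ring point clockwise from the code's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, self._hash(code)) % len(self._points)
        return self._ring[idx][1]

ring = HashRing(["pg-0", "pg-1", "pg-2", "pg-3"])
print(ring.shard_for("aZx4Q3"))
```

The benefit over a plain `hash(code) % N` is that adding a shard only moves the keys that land on the new shard, rather than reshuffling almost everything.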

Architecture

flowchart TD
    U[User] --> CDN[CDN]
    CDN --> LB[Load Balancer]
    LB --> API[Shortener API]
    API --> SVC{Operation}

    SVC -->|"shorten"| IDGEN[ID Generator<br/>per-shard ranges]
    IDGEN --> DB[(Sharded URLs DB)]

    SVC -->|"redirect"| CACHE[(Redis<br/>code → long_url)]
    CACHE -. miss .-> DB

    DB --> CACHE
    API --> KAFKA[Kafka:<br/>click events]
    KAFKA --> COUNT[Click Counter]
    COUNT --> ANA[(Analytics DB)]

    style CDN fill:#ff6b1a,color:#0a0a0f
    style CACHE fill:#15803d,color:#fff
    style DB fill:#0e7490,color:#fff
    style KAFKA fill:#a855f7,color:#fff

The redirect path (the hot path) flows like this: a browser hits /aZx4Q3, the CDN checks its own cache, and if it has the answer it returns the redirect without contacting origin. On a CDN miss, the API checks Redis. On a Redis miss, it queries the sharded DB, writes the result back to Redis, and responds. Most redirects short-circuit at the CDN or Redis layer and never touch the database.

sequenceDiagram
    participant Browser
    participant CDN
    participant API as Shortener API
    participant Redis
    participant DB as Sharded DB
    Browser->>CDN: GET /aZx4Q3
    alt CDN hit
        CDN-->>Browser: 302 (cached)
    else CDN miss
        CDN->>API: GET /aZx4Q3
        API->>Redis: GET aZx4Q3
        alt Redis hit
            Redis-->>API: long_url
            API-->>Browser: 302 Found
        else Redis miss
            API->>DB: SELECT long_url WHERE code='aZx4Q3'
            DB-->>API: long_url
            API->>Redis: SET aZx4Q3 long_url EX 86400
            API-->>Browser: 302 Found
        end
    end

The write path is quieter: POST /shorten arrives, the ID generator picks a unique ID in the shard's pre-allocated range, encodes it to base62, and inserts into the right DB shard. The response goes back to the client. No cache warming on write — reads populate Redis lazily on the first redirect.

Click analytics

We do not want to update click_count synchronously on every redirect — that turns a 1ms read into a write-contention nightmare on hot URLs. A code that gets tweeted by a celebrity might see 10,000 clicks per second; serializing that through a counter column in Postgres is a recipe for lock contention.

The fix is fire-and-forget into Kafka:

sequenceDiagram
    participant U as User
    participant API as Redirect API
    participant K as Kafka
    participant W as Counter Worker
    participant DB as Analytics DB
    U->>API: GET /aZx4Q3
    API->>U: 302 Redirect
    API->>K: emit ClickEvent
    K->>W: deliver
    W->>DB: increment counter (batched)

Workers batch updates — flush every second per code — and write to an analytics-friendly store like ClickHouse, BigQuery, or Cassandra. The redirect path itself never blocks on a write.
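
The batching step in the worker can be sketched like this. `ClickBatcher` and its `sink` callback are hypothetical names; a real worker would consume from Kafka and call `flush()` on a timer (for example once per second), which is omitted here.

```python
from collections import Counter

class ClickBatcher:
    """Accumulates click events in memory and flushes one increment per code."""

    def __init__(self, sink):
        self._sink = sink             # callable taking {code: clicks}
        self._pending = Counter()

    def on_click(self, code):
        self._pending[code] += 1      # cheap in-memory update, no I/O

    def flush(self):
        if not self._pending:
            return                    # nothing to write
        batch, self._pending = dict(self._pending), Counter()
        self._sink(batch)             # one write per code, not one per click

writes = []                           # stands in for the analytics store
batcher = ClickBatcher(writes.append)
for _ in range(10_000):
    batcher.on_click("aZx4Q3")        # a viral code
for _ in range(3):
    batcher.on_click("promo")
batcher.flush()
print(len(writes), "batched write(s):", writes)
```

Ten thousand clicks on one hot code collapse into a single increment per flush interval, which is what removes the lock contention described above.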

Caching strategy

URLs are massively read-heavy, so cache at every layer you can:

| Layer | TTL | Hit rate goal |
|---|---|---|
| CDN | 1 day | 90% (most popular) |
| Redis | 1 day, LRU | 99% of DB reads |
| DB buffer pool | (managed) | (rest) |

When a code expires, you have two options. The lazy approach: redirect anyway, check expiry on the DB hit, and evict from cache at that point. The eager approach: subscribe to a CDC stream from the DB and invalidate proactively. Lazy is simpler and usually fine — an expired link being redirected for a few extra seconds is rarely a hard business requirement.
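
A minimal sketch of the lazy approach, assuming plain dicts stand in for the cache and the database and that each DB row is a `(long_url, expires_at)` pair. The function name `resolve` and the row shape are illustrative.

```python
from datetime import datetime, timedelta, timezone

def resolve(code, cache, db, now=None):
    """Return the long URL, or None if the code is unknown or has expired."""
    now = now or datetime.now(timezone.utc)
    long_url = cache.get(code)
    if long_url is not None:
        return long_url                 # cache hit: expiry is not re-checked here
    row = db.get(code)                  # row is (long_url, expires_at or None)
    if row is None:
        return None
    long_url, expires_at = row
    if expires_at is not None and expires_at <= now:
        cache.pop(code, None)           # drop any stale copy
        return None                     # the caller answers 404 or 410
    cache[code] = long_url
    return long_url

now = datetime(2026, 2, 5, tzinfo=timezone.utc)
db = {
    "live": ("https://example.com/a", now + timedelta(days=1)),
    "dead": ("https://example.com/b", now - timedelta(days=1)),
}
cache = {}
print(resolve("live", cache, db, now), resolve("dead", cache, db, now))
```

With this shape, an expired code keeps redirecting only for as long as a cached copy survives, so the cache TTL bounds how stale an expired link can be.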

Custom aliases

For aliases ("/promo"), do an explicit INSERT ... ON CONFLICT DO NOTHING and check how many rows were inserted. If the answer is zero, another row already owns "promo", so return 409 Conflict. To prevent squatting, restrict aliases to authenticated users or charge for them.
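
The claim-or-conflict logic can be tried locally with Python's built-in sqlite3 module, which accepts the same ON CONFLICT ... DO NOTHING clause (SQLite 3.24 or newer). SQLite is only a stand-in for Postgres here, and `claim_alias` and the cut-down two-column table are illustrative.

```python
import sqlite3

def claim_alias(conn, alias, long_url):
    """Try to claim `alias`; return an HTTP-style status code (201 or 409)."""
    cur = conn.execute(
        "INSERT INTO urls (code, long_url) VALUES (?, ?) "
        "ON CONFLICT (code) DO NOTHING",
        (alias, long_url),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted, 0 when the alias already existed.
    return 201 if cur.rowcount == 1 else 409

conn = sqlite3.connect(":memory:")      # in-memory SQLite, standing in for Postgres
conn.execute("CREATE TABLE urls (code TEXT PRIMARY KEY, long_url TEXT NOT NULL)")
print(claim_alias(conn, "promo", "https://example.com/spring-sale"))
print(claim_alias(conn, "promo", "https://example.com/something-else"))
```

Because the uniqueness check and the insert happen in one statement, two concurrent requests for the same alias cannot both succeed.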

Trade-offs worth discussing

Same long URL → same short code?

There are two reasonable readings. Deduplication — always return the same code for the same long URL — saves storage but means the first user to shorten a link effectively "owns" its analytics. Always-new — each POST returns a fresh code — is simpler and lets you track different campaigns separately. Most production shorteners (bit.ly included) do always-new for exactly that reason.

What if the long URL is malicious?

Check against the Google Safe Browsing API on creation. Block known phishing and malware URLs, and re-check periodically — a URL that was clean on day one can become malicious later.

301 vs 302?

A 301 (permanent redirect) lets browsers and CDNs cache the mapping aggressively, which is faster. But once a browser has cached it, that click never touches your servers again, so analytics go dark. A 302 (temporary) is not cached by browsers unless the response headers explicitly allow it, so every click still reaches your edge or origin, which keeps analytics accurate at the cost of a small latency overhead. Most production shorteners use 302 for that reason. If you genuinely don't care about per-click data, 301 is fine.

Rate limiting

Without limits, abusers will create millions of short URLs as spam infrastructure. Enforce per-API-key and per-IP limits at the gateway. See the rate limiter article for the algorithms.

Stretch — multi-region

For global low-latency redirects, replicate the URLs DB asynchronously across regions. The CDN does most of the heavy lifting — only write traffic actually crosses regions. A brand-new code might not be visible in a far region for 1–2 seconds, and that's fine for this use case: the person sharing a link sends it before anyone else can click it.

What interviewers look for

  • Did you do the back-of-the-envelope? (Surprisingly common to skip.)
  • Did you separate read and write paths? (Most candidates don't.)
  • Did you choose collision-free code generation, and explain how to scale the ID issuer?
  • Did you address the analytics problem without killing redirect latency?
  • Did you talk about caching at multiple layers?

Bonus discussion topics

  • "How would you handle 1B URLs/day instead of 100M/month?" — sharding, multi-region, async writes.
  • "What if some URLs expire?" — TTL on cache, soft delete, background sweeper.
  • "How would you prevent abuse?" — rate limit, CAPTCHA, malicious URL detection.
  • "How would you do A/B test redirects?" — code → list of long_urls with weights, server picks at redirect time.
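
For the A/B redirect idea in the last bullet, picking a destination at redirect time could be as small as the sketch below. The `(long_url, weight)` row shape and the function name are assumptions for illustration.

```python
import random

def pick_destination(variants, rng=random):
    """variants: list of (long_url, weight) pairs; returns one long_url."""
    urls, weights = zip(*variants)
    return rng.choices(urls, weights=weights, k=1)[0]

variants = [
    ("https://example.com/landing-a", 90),   # control gets about 90% of clicks
    ("https://example.com/landing-b", 10),   # experiment gets about 10%
]
print(pick_destination(variants))
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible in tests.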

Frequently asked questions

Why use a 302 redirect instead of a 301 for URL shorteners?

A 301 (permanent redirect) allows browsers and CDNs to cache the mapping aggressively, so subsequent clicks from the same browser never reach your servers and analytics go dark. A 302 (temporary redirect) is not cached by browsers unless the response headers explicitly allow it, so every click still reaches your edge or origin, keeping per-click analytics accurate. Most production shorteners use 302 for this reason.

How does counter plus base62 encoding eliminate collisions without a global bottleneck?

Each shard pre-allocates a counter range from a central counter service — shard 0 owns IDs 0 to 1M, shard 1 owns 1M to 2M, and so on. Inserts happen locally within the shard with no cross-shard coordination. The claim operation, a simple atomic increment on the counter service, occurs at most once per million inserts, making it essentially free on the hot path.

What storage capacity does a URL shortener need for 6 billion URLs over 5 years?

At roughly 500 bytes per row (long URL plus short code plus metadata), 6 billion URLs require approximately 3 TB. That fits comfortably on a sharded Postgres or DynamoDB deployment.

How should click analytics be handled without degrading redirect latency?

Rather than updating a click_count column synchronously on each redirect — which causes lock contention on hot codes that might see 10,000 clicks per second — the API fires click events to Kafka in a fire-and-forget manner. Counter workers consume those events, batch updates per second per code, and write to an analytics store like ClickHouse or BigQuery. The redirect path itself never blocks on a write, keeping p50 latency around 1ms.

What read-to-write ratio should you design for, and how does caching exploit it?

URL shorteners are 100:1 to 1000:1 read-heavy. The hot working set follows a Zipfian distribution: roughly 2 million hot codes at 500 bytes each fit in approximately 1 GB, which sits on a single Redis node. With Redis in front of the database, more than 99% of redirects never touch Postgres, dropping p50 latency to around 1ms and DB load by 100x.
