
Intermediate · asked at Meta, YouTube, Twitter

Design a Distributed Counter (view / like counts)

Q: What is the hot-row problem in distributed counters?

A single UPDATE counter = counter + 1 WHERE id = ? forces every concurrent increment to acquire the same row-level lock, serializing all writes. Under viral load — for example a single item absorbing 100k+ increments per second — writes queue behind that lock, latency climbs to seconds, and the database melts. Sharded counters and write batching both exist specifically to eliminate this bottleneck.

Q: When should you model likes as edges instead of raw increments?

Use an idempotent edge store — a (user_id, item_id) composite primary key with INSERT ON CONFLICT DO NOTHING — any time at-most-once semantics per user matter and the count must not go backward. Raw incrementing, even with batching, cannot prevent double-counting across retries or network replays, whereas the edge table makes the like count the cardinality of a unique set, which is naturally exact.

Q: What error rate does Redis HyperLogLog give for unique-viewer counting, and how much memory does it use?

Redis's HyperLogLog implementation, which uses 2^14 = 16,384 registers, provides a standard error of about 0.81% and requires at most 12 KB of memory per counter regardless of set size. It uses a sparse representation at low cardinalities and transitions to the fixed 12 KB dense structure as cardinality grows, making it practical even for items with billions of viewers.

Q: How much does write batching reduce database writes for a viral item?

Buffering increments in Redis with INCR and flushing aggregated deltas to Cassandra every second collapses a high-write item to one DB write per flush interval (one per buffer key, if the buffer is itself sharded). A video receiving 10,000 views in one second sends one DB write for that interval instead of 10,000, a 10,000× reduction for that item. Platform-wide, the 2M raw increments per second collapse to at most 500k flushes per second (one per active item), so the overall reduction is at least 4×, while individual hot items see 100 to 1,000× or more depending on their write rate.

Q: What is the durability risk of the Redis write buffer, and how do you mitigate it?

If the Redis node crashes before the flusher runs, up to one second of buffered increments is lost. For view counts this loss is industry-acceptable, and the simplest choice is to accept it. A stronger option is Redis AOF persistence with fsync every second, which survives a Redis process crash but can still lose about a second of writes if the machine itself fails. The strongest is publishing every event to Kafka as a durable log before buffering, which closes the loss window at the cost of added complexity.

Count likes and views at millions of increments per second without a single hot row melting. Sharded counters, write batching, and approximate vs exact counts.

19 min read · 2026-05-15 · Ironclad Academy

#interview #scale #consistency #caching

// DEPTH

the full breakdown — requirements, capacity, evolution, trade-offs

The problem

YouTube reports "2.3M views" on a video within minutes of it going viral. Facebook shows "47k likes" on a post — updated in near-real-time as people react. Twitter shows retweet counts climbing by the second during a breaking news event. Behind all of these is the same deceptively simple operation: take a number, add one, store it, read it back fast.

The naive implementation is a single database row with an integer column: UPDATE items SET view_count = view_count + 1 WHERE id = ?. This works perfectly at low traffic — a moderately popular video, a few hundred concurrent viewers. The moment a video goes viral and 100,000 users hit play in the same second, that one row becomes a hot spot. Every increment tries to acquire the same row-level lock. Writes serialize. Latency climbs into seconds. The database melts.

That hot-row problem is the central engineering challenge, but it's not the only one. Likes and views have fundamentally different semantics. A like must be at-most-once per user — a user can't like the same post twice — and users notice if the count goes backward. A view count can be approximate and is best-effort: losing a few increments in a crash is acceptable; serializing every write is not. Choosing one counter architecture for both types gets you either the wrong correctness guarantees or the wrong throughput. The interesting design work is recognizing that difference and routing each type to the right mechanism.

The full solution layers three techniques: sharding the counter across N rows to eliminate the hot-row bottleneck; buffering writes in Redis and flushing aggregated deltas to the database every second, which cuts DB writes by 100–1,000× for hot items (and by at least 4× platform-wide); and storing likes as idempotent (user, item) edges rather than raw increments. This is excellent interview territory because it forces candidates to reason about write throughput, consistency levels, idempotency, and failure modes all at once.

Functional requirements

POST /counter/{item_id}/increment — record one view / reaction event for an item.
GET /counter/{item_id} → returns the current count (or approximate count) for the item.
Likes/reactions are at-most-once per user: a user cannot like the same item twice.
Views are best-effort: each play increments the count, with no per-user deduplication of the raw count (unique-viewer cardinality is tracked as a separate metric).
Count must not decrease due to a normal failure (no negative partial updates).

Non-functional requirements

Very high write throughput: platform-wide 2M+ increments/sec; single hot item up to 100k+ increments/sec.
Low read latency: p99 < 50ms for "what is the current count?"
Eventual consistency: displayed count may lag by seconds — not minutes.
Monotonicity: once a count reaches N, it must not drop below N (under normal operation).
High availability: counter reads and writes should survive single-node failures.

Capacity estimation

Dimension	Estimate	How we got there
Avg write rate (platform-wide)	~23k increments/sec	`2B increments/day ÷ 86,400 s/day`
Peak write rate (platform-wide)	~2M increments/sec	Viral spikes are ~100× average
Single viral item peak	~100k increments/sec	e.g., a live-streamed event
Active items at any moment	~500k	Out of 500M total videos/posts on the platform
Counter reads (peak)	~20M read QPS	Read-to-write ≈ 10:1; virtually all served from cache
Raw counter storage	4 GB	`500M items × 8 bytes (int64)`
Sharded counter storage (N=100)	400 GB	`500M × 100 shards × 8 bytes`; manageable in Cassandra
DB write rate after batching	≤500k writes/sec (platform)	Flush every 1s → at most 1 write per active item per second; 4× reduction at minimum (down from 2M raw events/sec), 100–1,000× for hot items where hundreds of events collapse to one flush

Takeaway: The platform write load is dominated by a relatively small number of viral items. A 1-second Redis flush window keeps Cassandra write pressure at or below 500k writes/sec, within reach of a multi-node cluster, while sharding hot items across up to 100 rows keeps any single row from becoming a hot spot. The price is at most 1 second of counter lag.
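The table's arithmetic can be re-derived in a few lines of Python. The inputs (2B increments/day, the 100× peak factor, the 10:1 read ratio, 500k active items, N = 100) are the table's own assumptions rather than measurements, and the 8-byte figure counts only the int64 payload, not Cassandra's per-row overhead.

```python
# Re-derive the capacity table from its stated assumptions.
EVENTS_PER_DAY = 2_000_000_000   # 2B increments/day (assumed traffic)
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 100                # viral spikes ~100x the average
PEAK_WRITES = 2_000_000          # ~100x average, rounded down as in the table
READ_TO_WRITE = 10               # ~10 reads per write
TOTAL_ITEMS = 500_000_000
ACTIVE_ITEMS = 500_000
SHARDS = 100
BYTES_PER_COUNTER = 8            # int64 payload only

avg_writes = EVENTS_PER_DAY / SECONDS_PER_DAY            # ~23,148 increments/sec
unrounded_peak = avg_writes * PEAK_FACTOR                # ~2.31M increments/sec
peak_reads = PEAK_WRITES * READ_TO_WRITE                 # 20M reads/sec
raw_storage_gb = TOTAL_ITEMS * BYTES_PER_COUNTER / 1e9   # 4 GB
sharded_storage_gb = raw_storage_gb * SHARDS             # 400 GB
max_flushes = ACTIVE_ITEMS                               # <= 1 flush per active item per 1 s window
min_reduction = PEAK_WRITES / max_flushes                # worst case: 4x

for label, value in [
    ("avg writes/sec", avg_writes),
    ("peak writes/sec (unrounded)", unrounded_peak),
    ("peak reads/sec", peak_reads),
    ("raw storage (GB)", raw_storage_gb),
    ("sharded storage (GB)", sharded_storage_gb),
    ("min batching reduction (x)", min_reduction),
]:
    print(f"{label:30} {value:>14,.1f}")
```

The 4× figure is a floor: it assumes every active item flushes every second, which only the hottest items actually do.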

Building up to the design

The distributed counter is deceptively simple. Every layer below breaks for a specific, nameable reason. Walking that path in an interview shows you understand the actual failure mode at each scale.

V1: One row, one increment

UPDATE items SET view_count = view_count + 1 WHERE id = ?;

Simple. Works fine at low QPS — a single Postgres row handles thousands of increments/sec. Each UPDATE acquires a row-level lock, increments, and releases.

This works at neighborhood scale. The failure mode is fundamental, though: every increment serializes on that one row lock. Under heavy contention, Postgres throughput on a single hot row typically tops out at a few thousand updates/sec in production conditions. The ceiling varies with hardware, WAL fsync settings, and index overhead, but the underlying limit is that each writer holds the row lock until its commit is durable, so commits on that row happen one at a time; adding CPU or disk bandwidth does not change that. Turning off synchronous_commit shortens how long each lock is held, because the commit no longer waits for the WAL flush, but it gives up durability of the most recent commits and still leaves every writer queued on one lock. A viral video receiving 100k+ increments/sec will queue everything behind that lock: writes stack up, latency climbs to seconds, and the database melts. This is the hot-row problem.

V2: Sharded (striped) counters

The fix is to stop fighting over one row. Instead of one counter, maintain N sub-counters — one per shard. Each increment picks a random (or hash-assigned) shard and updates only that shard's row. Reads SUM all N shards.

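-- Write: the application picks shard = a random integer in [0, N-1]
UPDATE counter_shards SET n = n + 1
WHERE item_id = ? AND shard_id = ?;   -- second ? = the chosen shard
-- (assumes all N shard rows exist up front: an UPDATE on a missing row changes nothing)

-- Read: sum every shard of the item
SELECT SUM(n) FROM counter_shards WHERE item_id = ?;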

With N = 100 shards, each shard receives ~1/100th of the write load. A 100k increment/sec item now puts ~1k writes/sec on each of 100 rows, well within a single row's contention ceiling. Write throughput scales roughly linearly with N until the database node itself saturates; the hot spot is gone.

flowchart LR
    INC[Increment request] --> RNG["random shard 0..N-1"]
    RNG --> S0["shard 0\ncounter row"]
    RNG --> S1["shard 1\ncounter row"]
    RNG --> SN["shard N-1\ncounter row"]
    S0 --> SUM["SUM all shards\n(read path)"]
    S1 --> SUM
    SN --> SUM
    SUM --> CNT["Total count"]
    style INC fill:#ff6b1a,color:#0a0a0f
    style SUM fill:#15803d,color:#fff
    style CNT fill:#a855f7,color:#fff
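To make the striping concrete, here is a minimal in-process sketch in Python. A list of integers stands in for the counter_shards rows and one lock per shard stands in for each row lock; the class and method names are hypothetical. Concurrent writers only contend when they land on the same shard, and reads sum every shard.

```python
import random
import threading


class ShardedCounter:
    """N independently locked sub-counters; the true total is their sum."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        self.counts = [0] * num_shards
        # one lock per shard plays the role of the per-row lock
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self, amount: int = 1) -> None:
        shard = random.randrange(self.num_shards)   # random(0, N-1)
        with self.locks[shard]:
            self.counts[shard] += amount

    def read(self) -> int:
        # scatter-gather, like SELECT SUM(n); not an atomic snapshot across shards
        return sum(self.counts)


counter = ShardedCounter(num_shards=100)

def worker() -> None:
    for _ in range(1_000):
        counter.increment()

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.read())  # 8000
```

Picking the shard at random spreads load evenly without coordination; hashing only item_id, by contrast, would send every write for an item to the same shard.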

The read is now a scatter-gather across N rows (or N partitions). For 100 shards that's a small SELECT SUM, still fast. The real problem is that we're still doing one DB write per increment. At 2M events/sec platform-wide, that's 2M DB writes/sec, which demands a very large cluster even for Cassandra, and Cassandra counter updates cost more than plain writes because each one performs a read before it writes.

V3: Write batching with a buffer tier

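If each application server buffered increments in its own memory, every server would still flush its own partial delta for the same item, so a hot item would cost one DB write per server per interval. A shared buffer tier lets increments from all servers collapse into one DB flush per item. The solution is to buffer in Redis and flush periodically.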

Increment:
  INCR counter:buf:{item_id}            ← atomic Redis increment, sub-millisecond

Flusher (every 1s, for each item with a non-zero buffer):
  delta = GETDEL counter:buf:{item_id}   ← atomic GET+DEL; requires Redis 6.2+
                                           (on older Redis, use a Lua script for atomicity)
  if delta is not nil and delta > 0:     ← GETDEL returns nil if nothing was buffered
      shard = random(0, N-1)             ← hash(item_id) % N would pin every flush to one shard
      UPDATE counter_shards SET n = n + delta
      WHERE item_id = ? AND shard_id = shard;

A video receiving 10,000 views in one second sends exactly 1 DB write per flush interval, not 10,000 (or one per buffer key if the buffer is itself sharded). Platform-wide, 2M events/sec collapses to at most 500k flushes/sec (one per active item per second), and in practice fewer, because an item that received no events during an interval needs no flush at all.

A Redis cluster handles millions of INCR ops/sec comfortably; the DB stays quiet. The cost is durability: if the Redis node crashes before flushing, you lose up to 1 second of increments. For view counts, losing roughly 1s of data is acceptable. Like counts (and anything financial) need stronger guarantees, exactness as well as durability, which leads to the next layer.
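The buffer-and-flush loop can be simulated end to end. This sketch replaces Redis with a hypothetical in-memory BufferTier whose getdel is atomic under a lock (mirroring what GETDEL gives you on a real server) and replaces Cassandra with a dict of shard rows. The "dirty" item set is an added assumption: the pseudocode above does not say how the flusher finds non-zero buffers.

```python
from __future__ import annotations

import random
import threading
from collections import defaultdict


class BufferTier:
    """Hypothetical stand-in for Redis: INCR and GETDEL on integer keys."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: dict[str, int] = {}
        self._dirty: set[int] = set()          # item_ids with buffered increments

    def incr(self, item_id: int) -> int:
        with self._lock:
            key = f"counter:buf:{item_id}"
            self._values[key] = self._values.get(key, 0) + 1
            self._dirty.add(item_id)
            return self._values[key]

    def getdel(self, item_id: int) -> int | None:
        with self._lock:                       # atomic read-and-reset, like GETDEL
            return self._values.pop(f"counter:buf:{item_id}", None)

    def take_dirty(self) -> set[int]:
        with self._lock:
            dirty, self._dirty = self._dirty, set()
            return dirty


def flush_once(buffer: BufferTier, shards: dict, num_shards: int) -> int:
    """One flusher pass; returns the number of DB writes issued."""
    writes = 0
    for item_id in buffer.take_dirty():
        delta = buffer.getdel(item_id)
        if delta:                                   # None or 0: nothing to write
            shard_id = random.randrange(num_shards)
            shards[(item_id, shard_id)] += delta    # UPDATE ... SET n = n + delta
            writes += 1
    return writes


buffer = BufferTier()
shards = defaultdict(int)
for _ in range(10_000):
    buffer.incr(42)
print(flush_once(buffer, shards, num_shards=100))               # 1
print(sum(v for (item, _), v in shards.items() if item == 42))  # 10000
```

The sketch shares one gap with the real design: once getdel returns, the delta lives only in the flusher's memory, so a crash before the DB write loses it. That is the same loss window described above, moved from Redis to the flusher.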

V4: Exact likes — idempotent edge storage

Likes are categorically different from views. A user should only be able to like once — this is at-most-once per (user, item) pair. The count must be exact: users notice "5 likes" going to "4 likes" or getting a like they didn't give. Raw incrementing, even with the buffer tier, doesn't enforce uniqueness.

The correct model: store the (user, item) edge, not a raw counter.

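-- Postgres (in DynamoDB, the same uniqueness comes from a table keyed on user_id + item_id)
CREATE TABLE likes (
  user_id  BIGINT NOT NULL,
  item_id  BIGINT NOT NULL,
  liked_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  PRIMARY KEY (user_id, item_id)   -- composite PK enforces uniqueness
);

-- The PK leads with user_id, so counting per item needs its own index:
CREATE INDEX likes_by_item ON likes (item_id);

-- Idempotent like: a duplicate (user_id, item_id) is silently ignored
INSERT INTO likes (user_id, item_id) VALUES (?, ?) ON CONFLICT DO NOTHING;

-- Like count = cardinality of the set:
SELECT COUNT(*) FROM likes WHERE item_id = ?;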

INSERT INTO likes ... ON CONFLICT DO NOTHING is idempotent: double-tapping "like" is safe. The count is always the true cardinality of the edge set — it cannot be artificially inflated by retries, network replays, or duplicate events.

Counting by SELECT COUNT(*) at read time is expensive for high-like items. The fix is to maintain a denormalized cached count in a separate table, updated via a trigger or event stream — but the source of truth is still the edge set. Views have a weaker uniqueness requirement: you might want to count unique viewers as a separate metric, not deduplicate every view. For that, exact set storage (one row per user per video) is too expensive at YouTube scale.
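The idempotency argument is easy to try locally. This sketch uses Python's built-in sqlite3 module as a stand-in for Postgres (SQLite accepts the same ON CONFLICT DO NOTHING clause from version 3.24, and CURRENT_TIMESTAMP replaces now()); like() and like_count() are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE likes (
      user_id  INTEGER NOT NULL,
      item_id  INTEGER NOT NULL,
      liked_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (user_id, item_id)      -- one like per (user, item)
    );
    CREATE INDEX likes_by_item ON likes (item_id);
    """
)


def like(user_id: int, item_id: int) -> bool:
    """Record a like; returns True only if this (user, item) edge is new."""
    with conn:  # commits the transaction on success
        cur = conn.execute(
            "INSERT INTO likes (user_id, item_id) VALUES (?, ?) ON CONFLICT DO NOTHING",
            (user_id, item_id),
        )
    return cur.rowcount == 1  # 0 rows changed means the edge already existed


def like_count(item_id: int) -> int:
    """Like count = cardinality of the edge set for this item."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE item_id = ?", (item_id,)
    ).fetchone()
    return count


like(1, 42)
like(1, 42)   # double tap or network retry: ignored
like(2, 42)
print(like_count(42))  # 2
```

The boolean from like() is also what a trigger or CDC consumer effectively sees: only genuinely new edges should bump the cached count.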

V5: Unique-view cardinality — HyperLogLog

If you want "distinct viewers" without storing every (user, video) pair, use HyperLogLog (HLL). HLL is a probabilistic cardinality estimator. Redis's implementation (which uses 2^14 = 16,384 registers) provides a standard error of ~0.81% and requires at most 12 KB of memory regardless of set size, using a sparse representation at low cardinalities that transitions to the fixed 12 KB dense structure as cardinality grows.

ADD user_id to HLL for item_id:
  PFADD unique_viewers:{item_id}  user_id

Count distinct viewers:
  PFCOUNT unique_viewers:{item_id}

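Redis natively supports HyperLogLog with PFADD and PFCOUNT. An HLL's size does not grow with the item, but if a very hot item's HLL is split across several keys to spread PFADD load, or kept per time window, PFMERGE combines the pieces into a single estimate.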
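To see where the 0.81% figure comes from (1.04 / √16,384 = 1.04 / 128 ≈ 0.81%), here is a compact HyperLogLog in Python with the same 2^14 registers. It is the textbook Flajolet et al. estimator with linear counting for small sets, not Redis's code: Redis hashes with MurmurHash64A and applies its own estimator refinements, while this sketch uses hashlib.blake2b as a 64-bit hash. Merging takes the register-wise maximum, which is what makes PFMERGE possible.

```python
import hashlib
import math

P = 14          # 2^14 = 16,384 registers, the register count Redis uses
M = 1 << P


def _hash64(item: str) -> int:
    # 64-bit hash; a stand-in for Redis's MurmurHash64A
    return int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")


class HyperLogLog:
    def __init__(self) -> None:
        self.registers = [0] * M

    def add(self, item: str) -> None:
        h = _hash64(item)
        idx = h >> (64 - P)                   # top P bits choose the register
        rest = h & ((1 << (64 - P)) - 1)      # remaining 50 bits
        rank = (64 - P) - rest.bit_length() + 1   # 1-based position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank        # re-adding an item never changes this

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / M)
        z = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * M * M / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * M and zeros:
            estimate = M * math.log(M / zeros)  # linear counting for small sets
        return round(estimate)

    def merge(self, other: "HyperLogLog") -> None:
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


viewers = HyperLogLog()
for user_id in range(50_000):
    viewers.add(str(user_id))
    viewers.add(str(user_id))   # a repeat view adds nothing
print(viewers.count())          # close to 50,000
```

The merged registers of two HLLs are exactly the registers an HLL fed the union would have, which is why merging hourly HLLs into a daily one loses no accuracy beyond the estimator's own error.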

Related: Bloom filters serve a similar role for membership testing ("has this user viewed this video before?") and can gate increments before they reach the counter pipeline, at the cost of occasionally dropping a genuine first view when the filter returns a false positive.

flowchart LR
    V1["V1: single UPDATE<br/>hot-row bottleneck"] --> V2["V2: sharded counter<br/>N rows, N× throughput"]
    V2 --> V3["V3: + Redis buffer<br/>100–1000× write reduction"]
    V3 --> V4["V4: likes as edges<br/>idempotent, exact"]
    V4 --> V5["V5: + HyperLogLog<br/>unique viewers, 12 KB"]
    style V1 fill:#0e7490,color:#fff
    style V2 fill:#15803d,color:#fff
    style V3 fill:#ff6b1a,color:#0a0a0f
    style V5 fill:#a855f7,color:#fff

High-level architecture

flowchart TD
    CLIENT[Client] --> GW[API Gateway / Load Balancer]

    GW -->|"view event"| VIEW[View Counter Service]
    GW -->|"like / unlike"| LIKE[Like Service]
    GW -->|"GET count"| READ[Read Service]

    VIEW --> RBUF[(Redis<br/>write buffer<br/>INCR per item)]
    RBUF -.flush every 1s.-> FLUSH[Flusher Workers]
    FLUSH --> CASS[(Cassandra<br/>sharded counter rows)]

    LIKE --> PG[(Postgres / DynamoDB<br/>likes edge table)]
    PG -.count.-> LCACHE[(Redis<br/>like count cache)]

    READ --> RCACHE[(Redis<br/>read cache)]
    RCACHE -.miss.-> CASS
    CASS -.aggregate.-> RCACHE

    style GW fill:#ff6b1a,color:#0a0a0f
    style RBUF fill:#15803d,color:#fff
    style CASS fill:#0e7490,color:#fff
    style PG fill:#0e7490,color:#fff
    style RCACHE fill:#a855f7,color:#fff
    style FLUSH fill:#ffaa00,color:#0a0a0f

Write path — views

Client fires POST /view/{item_id}. The View Counter Service runs INCR counter:buf:{item_id} in Redis — atomic, sub-millisecond, no DB touch.
A flusher worker runs every ~1 second, picks items with a non-zero buffer (for example, item IDs that the increment path also adds to a "dirty" set), reads and resets each delta atomically with GETDEL (Redis 6.2+; on older versions, use a Lua script that executes GET+DEL atomically), and writes the delta to Cassandra via UPDATE counter_shards SET n = n + ? WHERE item_id = ? AND shard_id = ?.
The Cassandra row is chosen per flush, for example shard_id = random(0, N-1) (N = 100 for high-write items, configurable); deriving it as hash(item_id) % N would send every flush for an item to the same row. The update uses Cassandra's COUNTER column type, which merges concurrent increments from different writers without application-level locking. Note: Cassandra counter tables do not support lightweight transactions (IF conditions / CAS).

Write path — likes

Client fires POST /like/{item_id} with authenticated user_id.
Like Service issues INSERT INTO likes (user_id, item_id) ON CONFLICT DO NOTHING. Returns 200 whether or not the like was new (idempotent to the caller).
A change-data-capture (CDC) stream or a DB trigger increments a cached like count in Redis. The cached count is the fast-read answer; the edge table is the source of truth.

sequenceDiagram
    participant U as User
    participant LS as Like Service
    participant PG as Postgres "likes" table
    participant CDC as CDC / trigger
    participant RC as Redis like-count cache

    U->>LS: POST /like/item_42
    LS->>PG: INSERT (user_id, item_42) ON CONFLICT DO NOTHING
    PG-->>LS: OK (new or duplicate)
    LS-->>U: 200 OK

    Note over CDC: async, near-real-time
    CDC->>RC: INCR like_count:item_42
    RC-->>CDC: new cached total

Read path

GET /counter/{item_id} hits the Read Service.
Redis cache lookup. Hit → return cached sum directly. Miss → query Cassandra with SELECT SUM(n) FROM counter_shards WHERE item_id = ?, write result back to Redis with a short TTL (5–30s), return.
For likes, same pattern but sourced from the like-count Redis key (backed by the edge table).
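The read path is a cache-aside lookup with a short TTL. In this sketch a dict of (count, expires_at) pairs stands in for Redis, where SET ... EX 10 would handle expiry, and load_total is a hypothetical callback in place of the Cassandra SUM query. The clock is injectable only so that expiry can be exercised without sleeping.

```python
from __future__ import annotations

import time
from typing import Callable


class CountReadCache:
    """Cache-aside reads of a counter total with a fixed TTL."""

    def __init__(self, load_total: Callable[[int], int], ttl_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load_total = load_total      # e.g. runs SELECT SUM(n) ... on a miss
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: dict[int, tuple[int, float]] = {}  # item_id -> (count, expires_at)

    def get(self, item_id: int) -> int:
        now = self.clock()
        entry = self._entries.get(item_id)
        if entry is not None and entry[1] > now:
            return entry[0]                               # hit: serve cached total
        total = self.load_total(item_id)                  # miss: aggregate from the DB
        self._entries[item_id] = (total, now + self.ttl_s)
        return total

    def invalidate(self, item_id: int) -> None:
        self._entries.pop(item_id, None)                  # like DEL cached_count:{item_id}


# Usage with a fake clock and a fake loader that records each DB hit
db_hits = []
fake_now = [0.0]

def load_total(item_id: int) -> int:
    db_hits.append(item_id)
    return 2_301_456

cache = CountReadCache(load_total, ttl_s=10, clock=lambda: fake_now[0])
cache.get(42)
cache.get(42)
print(len(db_hits))   # 1: the second read was a cache hit
fake_now[0] = 11.0
cache.get(42)
print(len(db_hits))   # 2: the entry expired after 10 s
```

The TTL bounds staleness even if an invalidation from the flusher is missed, which is why the read path keeps it short.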

Sharded counter deep-dive

The number of shards N is a key tuning parameter:

N (shards)	Max hot-item writes/sec	Read cost (SUM)	When to use
1	~5k–20k	1 row	Low-traffic items, default
10	~50k–200k	10 rows	Moderate popularity
100	~500k–2M	100 rows	Viral items
1000	~5M–20M	1000 rows	Extreme edge cases

In practice, counters are assigned N dynamically based on recent write rate. A lightweight "hot item detector" (see design-top-k-heavy-hitters) identifies items that exceed a threshold and upgrades their shard count. Shard count changes require a brief migration window (split existing rows, update routing).
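A hot-item detector ultimately has to map a measured write rate to a shard count. One minimal way to do that, assuming a per-row budget of 5,000 writes/sec (the conservative end of the N = 1 row above) and the N tiers from the table, is to pick the smallest tier whose combined budget covers the rate. The function name and the default budget are illustrative, not tuned values.

```python
SHARD_TIERS = (1, 10, 100, 1000)   # the N values from the table


def choose_shard_count(writes_per_sec: float, per_row_budget: float = 5_000,
                       current: int = 1) -> int:
    """Smallest tier whose total budget covers the rate; never shrinks on its own."""
    for n in SHARD_TIERS:
        if n * per_row_budget >= writes_per_sec:
            target = n
            break
    else:
        target = SHARD_TIERS[-1]   # beyond the largest tier: cap it (and alert)
    # Only grow here; shrinking needs a migration that merges rows, so plan it separately.
    return max(target, current)


print(choose_shard_count(100_000))   # 100
```

Growing eagerly but shrinking only through a planned migration avoids flapping between tiers when a viral item's traffic hovers around a threshold.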

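For the read SELECT SUM(n) FROM counter_shards WHERE item_id = ?, N = 100 means reading 100 rows. In Cassandra, those rows are in the same partition (item_id is the partition key, shard_id is the clustering key), so the full aggregate is a single-partition read: cheap for N in the hundreds, though its cost still grows linearly with N. The trade-off is that all of an item's shards live on the same replicas, so clustering-key shards relieve per-row contention but do not spread an item's writes across nodes; after batching, that per-item write rate is small.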

-- Cassandra schema (CQL)
CREATE TABLE counter_shards (
  item_id   BIGINT,
  shard_id  INT,
  n         COUNTER,
  PRIMARY KEY (item_id, shard_id)
);

-- Increment shard 37 for item 12345
UPDATE counter_shards SET n = n + 1
WHERE item_id = 12345 AND shard_id = 37;

-- Read total
SELECT SUM(n) FROM counter_shards WHERE item_id = 12345;

Cassandra's COUNTER column type is inspired by CRDT research: each replica tracks its own contribution to a counter as a separate, clock-versioned shard, and replicas reconcile by keeping the newest version of each replica's shard and summing across replicas, so no acknowledged increment is lost when replicas diverge and later reconcile. It is not a last-write-wins overwrite of the total. Be aware of practical caveats: counter tables cannot mix counter and non-counter columns outside the primary key, a deleted counter cannot be reliably incremented again, counter updates are not idempotent (a retried write can double-count), and the implementation had well-known reliability problems under high load before its redesign in Cassandra 2.1. Reads are eventually consistent across replicas: at a low consistency level, the displayed count can briefly lag.

Exact vs approximate — the trade-off table

The most common interview mistake is conflating all counter types and picking one approach for everything. The right answer is "it depends on the semantics" — and then naming the semantic clearly.

Count type	Semantics	Acceptable error	Recommended approach
Video view count	"2.3M views" display	Seconds of lag, ~1% imprecision	Write buffer → sharded counter
Like count	Social signal, user-visible	Should not go backward; near-exact	Idempotent edge store + cached count
Unique viewer count	Analytics / advertisers	~0.81% (HLL error for Redis implementation)	HyperLogLog
Story reaction count	Ephemeral, short-lived	Approximate fine	Redis counter (no DB flush needed for short TTL)
Revenue-affecting metric	Billing, ads impressions	Exact required	Transactional store with deduplication

flowchart TD
    Q{"What are you counting?"}
    Q -->|"likes / reactions\n(user-visible, at-most-once)"| EDGE["Edge store\n(user_id, item_id) PRIMARY KEY\nON CONFLICT DO NOTHING"]
    Q -->|"views / plays\n(best-effort, high volume)"| BUF["Redis buffer → sharded\nCassandra counter"]
    Q -->|"unique viewers\n(distinct cardinality)"| HLL["HyperLogLog\nPFADD / PFCOUNT\n~0.81% error, 12 KB"]
    Q -->|"billing / ad impressions\n(exact required)"| TXN["Transactional store\nwith deduplication"]
    style Q fill:#ff6b1a,color:#0a0a0f
    style EDGE fill:#15803d,color:#fff
    style BUF fill:#0e7490,color:#fff
    style HLL fill:#a855f7,color:#fff
    style TXN fill:#ffaa00,color:#0a0a0f

Sequence diagram: view increment + flush

sequenceDiagram
    participant C as Client
    participant API as View API
    participant R as Redis buffer
    participant F as Flusher
    participant DB as Cassandra
    participant RC as Read cache

    C->>API: POST /view/item_42
    API->>R: INCR counter:buf:item_42
    R-->>API: value = 1 (ack)
    API-->>C: 200 OK

    Note over F: every ~1 second
    F->>R: GETDEL counter:buf:item_42
    R-->>F: delta = 847
    F->>DB: UPDATE counter_shards SET n = n + 847 WHERE item_id=42 AND shard_id=7
    DB-->>F: OK
    F->>RC: DEL cached_count:item_42  (invalidate)

    C->>API: GET /counter/item_42
    API->>RC: GET cached_count:item_42
    RC-->>API: miss
    API->>DB: SELECT SUM(n) WHERE item_id=42
    DB-->>API: 2301456
    API->>RC: SET cached_count:item_42 = 2301456 EX 10
    API-->>C: { "count": 2301456 }

Failure modes

Hot key in Redis

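Every increment for a viral item hits the same Redis key (counter:buf:{item_id}). Redis executes commands on one main thread per instance, and in a cluster a key lives on exactly one shard, so a single key's throughput is capped by one node. A single key can handle roughly 100k–200k INCR operations/sec on modern hardware before becoming a bottleneck.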

Apply the same sharding idea to the buffer layer: maintain N buffer keys per item (counter:buf:{item_id}:{shard}), increment a random one, and SUM them during flush. For items below 100k increments/sec, a single key is fine.
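The buffer-side sharding just described can be sketched with the key scheme counter:buf:{item_id}:{shard}, where the braces are placeholders, so a real key looks like counter:buf:42:7. On Redis Cluster, avoid wrapping item_id in a literal {...} hash tag, which would pin all of an item's buffer keys to one slot and undo the sharding. A defaultdict stands in for Redis here, and both helper names and the shard count of 8 are hypothetical; on a real server each key would be written with INCR and drained with GETDEL.

```python
import random
from collections import defaultdict

BUFFER_SHARDS = 8   # assumed; items below ~100k increments/sec can stay at 1


def buffer_incr(store: dict, item_id: int, shards: int = BUFFER_SHARDS) -> str:
    """Spread INCRs for one item across several keys; returns the key written."""
    key = f"counter:buf:{item_id}:{random.randrange(shards)}"
    store[key] += 1
    return key


def drain_item(store: dict, item_id: int, shards: int = BUFFER_SHARDS) -> int:
    """Flush side: GETDEL every shard key for the item and sum the deltas."""
    return sum(store.pop(f"counter:buf:{item_id}:{s}", 0) for s in range(shards))


store = defaultdict(int)
for _ in range(1_000):
    buffer_incr(store, 42)
print(drain_item(store, 42))   # 1000
print(len(store))              # 0: every shard key was drained
```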

Buffer node crash — lost increments

If a Redis node crashes before the flusher runs, its in-memory increments are lost. The magnitude: up to one flush interval (1 second) of increments for every item buffered on that node. There are three ways to handle this, depending on how much you care:

Accept the loss (view counts): 1s of lost views on a viral item is imperceptible. This is the correct trade-off for view counts.
Redis AOF persistence (fsync every second): survives a Redis process crash, though a machine failure can still lose about 1 second of writes; adds slight write overhead.
Kafka as a durable event log: publish every view event to Kafka before buffering in Redis. The flusher reads from Kafka and commits offsets only after its DB write succeeds, which gives at-least-once delivery: nothing acknowledged is lost, but a replayed batch can double-count. More complex, but it eliminates the loss window.
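The Kafka option's at-least-once behavior is easiest to see in a simulation. Here a Python list stands in for a topic partition and an integer for the consumer's committed offset; consume_batch and the crash_before_commit flag are hypothetical. The flusher aggregates a batch into per-item deltas, applies them, and only then commits, so a crash after applying but before committing replays the batch.

```python
from __future__ import annotations

from collections import Counter


def consume_batch(log: list[int], committed: int, batch_size: int,
                  totals: Counter, crash_before_commit: bool = False) -> int:
    """Read view events (item_ids) from the committed offset, apply aggregated
    deltas, and return the offset that ends up committed."""
    batch = log[committed:committed + batch_size]
    deltas = Counter(batch)          # many view events -> one delta per item
    totals.update(deltas)            # e.g. UPDATE ... SET n = n + delta
    if crash_before_commit:
        return committed             # offset never committed: batch will be re-read
    return committed + len(batch)


log = [42] * 5 + [7] * 3             # 8 view events in the partition
totals = Counter()
offset = consume_batch(log, 0, 8, totals, crash_before_commit=True)
offset = consume_batch(log, offset, 8, totals)   # restart replays the same batch
print(totals[42], totals[7], offset)   # 10 6 8: nothing lost, but double-counted
```

Making the replay harmless needs idempotent application, for example storing the last applied offset in the same transaction as the totals. Cassandra counter updates offer no way to do that atomically, which is one reason billing-grade counts belong in a transactional store with deduplication.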

Double counting

At-least-once delivery (Kafka retries, HTTP retries from clients) can send the same increment event multiple times. For views, double counting is tolerable — a viewer hitting play twice should probably count twice anyway. The count may be slightly inflated; this is industry-accepted. For likes, the idempotent edge store (ON CONFLICT DO NOTHING) makes duplicates harmless by construction. The like count is always the cardinality of unique (user, item) pairs. For unique viewers, HyperLogLog PFADD is also idempotent — adding the same element twice has no effect on the estimate.

Monotonicity — count going backward

A count that decreases is a trust-destroying bug. Two scenarios where this can happen:

Shard migration gone wrong: if you split a shard and the new routing kicks in before the data is fully replicated, some reads will SUM a subset of shards and return a lower number. Fix: during migration, serve reads from the old schema until migration is complete, then atomically cut over.
Cache serving stale data after a count reset: if an admin resets a counter for abuse remediation and the cache is still warm, reads keep returning the old (higher) count until the entry expires, and then the number visibly drops. Fix: cache invalidation must happen as part of the admin action, not asynchronously.

Read-after-write inconsistency

A user likes a post, then immediately reloads it and still sees the old count (say, "0 likes"). This violates user expectations even if eventual consistency is acceptable for anonymous viewers.

Two fixes: write the updated like count to a read-your-own-writes (RYOW) cache key scoped to the user's session and serve that user from it, or route the user's next reads to the primary database (or a synchronously updated replica) for a short window after a mutation.
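The second option, pinning a user's reads to the source of truth for a short window after their own write, fits in a few lines. The 5-second window, the in-memory last-write map, and the "primary"/"cache" labels are assumptions for illustration; in a real deployment the timestamp would live in the user's session.

```python
from __future__ import annotations

import time
from typing import Callable


class ReadRouter:
    """Route a user's reads to the primary for window_s seconds after their own write."""

    def __init__(self, window_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = window_s
        self.clock = clock
        self._last_write: dict[int, float] = {}   # user_id -> time of last like/unlike

    def record_write(self, user_id: int) -> None:
        self._last_write[user_id] = self.clock()

    def source_for(self, user_id: int) -> str:
        wrote_at = self._last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.window_s:
            return "primary"   # read-your-own-writes: bypass the lagging cache
        return "cache"         # everyone else tolerates a few seconds of lag
```

Only the user who just wrote pays the extra load on the primary, and only briefly, so anonymous readers keep the cheap, eventually consistent path.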

Storage choices

Data	Store	Why
View increment buffer	Redis (INCR)	Sub-ms atomic increments; millions of ops/sec
Durable view counters	Cassandra counter columns	Wide-column; COUNTER type handles concurrent increments; natural partition by item_id
Like edges	Postgres or DynamoDB	ACID for uniqueness enforcement; composite PK prevents duplicate likes
Like count cache	Redis	Fast cached integer; updated by CDC or a trigger when the edge table changes
HyperLogLog (unique viewers)	Redis PFADD/PFCOUNT	Native HLL support; 12 KB per item regardless of cardinality
Analytics / time-series	ClickHouse or BigQuery (offline)	Historical view/like trends; not on hot path

Things to discuss in an interview

The single UPDATE counter = counter + 1 fails under load because the row-level lock serializes all concurrent writers — throughput is bounded by lock-release rate, not hardware capacity.

Sharded counters and write batching are complementary, not alternatives. Sharding eliminates the hot row; batching eliminates per-event DB writes. Candidates who pick one and ignore the other miss half the solution.

For likes, the right frame is that incrementing a raw counter is not idempotent, but storing an edge is. The like count becomes a derived property (cardinality of unique pairs), not a mutable field — which is why at-most-once correctness falls out naturally.

On the durability vs throughput dial: Redis buffer with AOF is a middle ground; Kafka is the durable extreme; pure in-memory is the fast-but-lossy extreme. For view counts, in-memory is right. For ad impression billing, Kafka is right.

For unique-viewer counting, the probabilistic trade-off is HyperLogLog's 0.81% standard error in at most 12 KB (Redis's implementation, 16,384 registers) vs exact counting at O(N) storage. HLLs can also be merged across time windows: daily unique viewers can be estimated by merging 24 hourly HLLs with PFMERGE, or by passing all 24 keys to PFCOUNT.

Things you should now be able to answer

Why does a single UPDATE row serialize writes, and at what QPS does it become a problem?
How do sharded counters preserve write throughput — and what does the read path look like?
Why are likes modeled as edges rather than raw increments?
What does "eventual consistency" mean for a view count — and what is an acceptable staleness window?
When would you choose HyperLogLog over an exact set for counting, and what is the error guarantee?
What happens to buffered view counts if the Redis node crashes mid-second?


You may also like

Design an LLM Observability Platform

Design an LLM Gateway (AI Gateway & Model Router)

Design an LLM Fine-Tuning Platform