Design a Flash Sale / Seckill System
Sell limited stock to a massive, spiky crowd without overselling. Atomic inventory decrement, request shedding, queues, and graceful degradation.
The problem
Alibaba reported roughly $84 billion in gross merchandise volume across its multi-day 2021 Singles' Day campaign, and a peak of 583,000 orders per second during the 2020 event. A PlayStation 5 console drop or a limited Air Jordan release hits a different but structurally identical spike: a fixed number of units (say 10,000) go on sale at a precise moment, and hundreds of thousands of users click "buy" within the same second. The inventory counter for that one SKU goes from a sleepy number to the hottest byte in your entire infrastructure in under a millisecond.
What makes this hard is an inversion of every assumption that normally makes distributed systems scalable. Sharding, replication, caching, fan-out — those techniques all assume load spreads across many keys. In a flash sale, load converges onto exactly one: the counter for one SKU. Your usual scaling tools work against you. Add replicas and you get stale reads; shard the counter and you risk phantom sold-out states; cache the count and you introduce a consistency gap where you can oversell. You have to simultaneously guarantee correctness (sold count never exceeds stock), survivability (the spike must not collapse the rest of the platform), and settlement accuracy (every successful purchase results in exactly one charge and one fulfilled order).
A flash sale sells fungible units — "one of 10,000 identical items." That is fundamentally different from Ticketmaster, where each seat is a distinct database row and row-level locks distribute the contention naturally. Here you have one counter, not 10,000 rows, and a million requestors all want to decrement it at once.
The core engineering tension is two-sided: atomicity vs throughput at the reservation step, and speed vs durability at the order-creation step. Solving the first without breaking the second is the design.
Functional requirements
- Display a limited-stock product page; show approximate remaining stock.
- When the sale opens, allow authenticated users to attempt purchase.
- Atomically decrement inventory; reject once stock is exhausted (sold count ≤ stock, always).
- Hold reserved stock for a TTL window while the user checks out; release if unpaid.
- Create a confirmed order and process payment exactly once per reservation.
- Show honest state to users: "in your cart for 4:58", "sold out", "you're in queue".
Non-functional requirements
- Correctness: never oversell. Selling 10,001 of 10,000 is the primary failure mode. It triggers refunds, customer complaints, and potentially legal exposure.
- No double-charge. Client retries, page refreshes, and network timeouts are guaranteed at this load; idempotency is mandatory.
- Survive 1M QPS for ~10 seconds. The spike is real; the system must not cascade-fail the rest of the platform.
- Fast rejection. A request that cannot possibly succeed (stock already zero, or user already bought) must fail in milliseconds, at the edge, before consuming expensive resources.
- Graceful degradation. If internal components struggle, the user experience degrades to "sold out" — not to HTTP 500s and blank pages.
Why the naive design fails
-- Two threads execute this concurrently for the last unit:
SELECT stock FROM items WHERE id = 42; -- both read stock = 1
-- both see stock > 0, both proceed
UPDATE items SET stock = stock - 1 WHERE id = 42;
-- stock = -1 — oversold
The classic read-modify-write race. Two standard SQL fixes exist; both fail at flash-sale scale:
Fix A: Conditional UPDATE
UPDATE items SET stock = stock - 1
WHERE id = 42 AND stock > 0;
-- rows_affected = 1 → success; = 0 → sold out
Correct — atomic check-and-decrement. But at 1M concurrent QPS, every one of those requests contends on the same row lock in the DB. They serialize. The DB connection pool exhausts. p99 latency goes to seconds. The database node falls over under lock wait pressure. Correct logic, wrong physical location.
Fix B: SELECT … FOR UPDATE
Same problem with an explicit lock. Now you have a queue of 1M lock waiters. Even correct behavior produces timeouts at this concurrency.
The row is the wrong place for this counter. The fix is to move the contended counter to a store built for atomic, single-threaded, in-memory operations.
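To make the race concrete, here is a minimal Python sketch that replays the interleaving deterministically against an in-memory stand-in for the row. The `Row` class and the function names are hypothetical, invented for illustration; no real database or locking is involved, so it shows the logic of the two approaches rather than their performance.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """In-memory stand-in for the `items` row (hypothetical, not a real DB)."""
    stock: int


def naive_two_buyers(row: Row) -> int:
    """Replay the racy interleaving: both buyers read before either writes."""
    read_a = row.stock          # buyer A: SELECT stock
    read_b = row.stock          # buyer B: SELECT stock (same value as A saw)
    sold = 0
    for seen in (read_a, read_b):
        if seen > 0:            # each buyer trusts its own earlier read
            row.stock -= 1      # UPDATE ... SET stock = stock - 1
            sold += 1
    return sold


def conditional_update(row: Row) -> bool:
    """Model of `UPDATE ... WHERE stock > 0`: check and write are one step."""
    if row.stock > 0:
        row.stock -= 1
        return True             # rows_affected = 1
    return False                # rows_affected = 0, sold out


def conditional_two_buyers(row: Row) -> int:
    """Two buyers each issue the conditional update, one after the other."""
    return sum(conditional_update(row) for _ in range(2))
```

With one unit left, the naive replay sells two units and leaves the stock at -1, while the conditional version sells one and stops at 0.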
Capacity estimation
| Dimension | Estimate | How we got there |
|---|---|---|
| Stock (representative sale) | 10,000 units | Given baseline |
| Peak buy-attempt QPS | ~1,000,000 writes/sec on one key | 1M users click in ~1 s |
| Success rate | 1% (10,000 of 1,000,000) | 99% rejected |
| Product page reads (peak) | ~17,000 reads/sec | 1M users × 10 page loads ÷ 600 s |
| Confirmed order write rate | ~17 orders/sec | 10,000 orders ÷ 600 s |
| Redis Lua throughput (one node) | ~100,000+ ops/sec | Single-threaded atomic command; one decrement = one op |
Takeaway: The DB write load (17 orders/sec) is trivial — the entire challenge is the admission spike of 1M QPS on a single key. Product pages must be CDN-served; the waiting room must reduce what actually reaches Redis to a survivable rate well below the 100k+/sec ceiling.
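The arithmetic behind the table can be reproduced in a few lines. This is only a sketch of the estimates above; the function name and parameter names are made up for illustration, and the inputs are the table's own assumptions (10,000 units, 1M users, 10 page loads each, a 600-second window).

```python
def capacity_estimates(stock=10_000, users=1_000_000,
                       page_loads_per_user=10, sale_window_s=600):
    """Reproduce the capacity table's arithmetic from its stated inputs."""
    return {
        # Fraction of buy attempts that can possibly succeed.
        "success_rate": stock / users,
        # Product page reads per second, averaged over the sale window.
        "page_reads_per_s": users * page_loads_per_user / sale_window_s,
        # Confirmed order writes per second, averaged over the sale window.
        "orders_per_s": stock / sale_window_s,
    }
```

Calling `capacity_estimates()` gives a 1% success rate, about 16,667 page reads per second, and about 16.7 orders per second, which the table rounds to 17,000 and 17.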
Building up to the design
Start where everyone starts: one SQL row
UPDATE items SET stock = stock - 1
WHERE id = 42 AND stock > 0
RETURNING stock;
This is actually correct. The WHERE stock > 0 check and the decrement happen atomically inside the database's row-lock cycle — you cannot sell more than you have. At small scale (a few hundred concurrent users) it works fine and takes about five minutes to write.
The failure is physical, not logical. At tens of thousands of concurrent requests all hammering the same row, every request waits in a queue to acquire that row lock. The DB connection pool fills up, connection attempts start timing out, and the database begins to drop requests entirely. Even correct SQL logic can't save you when the physical contention overwhelms the hardware. This is an important distinction to land in an interview: the conditional UPDATE is logically sound; it fails because the wrong component is absorbing the contention.
Move the counter to Redis
Once you understand that the problem is contention on a shared mutable counter, the next step is obvious: put it in a store designed for exactly this. Redis is single-threaded for command execution, holds everything in memory, and can process on the order of 100,000 operations per second for simple commands.
Pre-load the inventory count before the sale opens:
SET stock:42 10000
On each purchase attempt, run an atomic Lua script (detailed below). Redis executes Lua scripts without interleaving other commands — it is, for the duration of the script, doing exactly one thing. No two requests can race on the decrement.
This gets you atomic check-and-decrement at 100k+ ops/sec, instant rejection the moment stock hits zero, and the database is no longer in the contention path.
But a decrement is not a sale. If the user abandons checkout, that unit is gone permanently — you "sell out" while units sit unpaid. And the flood still hits your API layer: a million simultaneous connections will saturate your load balancers and connection pools well before Redis is even reached.
Add reservations with TTL
A successful decrement creates a reservation key with a TTL (say, 5 minutes):
SET hold:{sku_id}:user:{user_id} 1 EX 300
If the user pays, you convert the hold to a confirmed order in the durable DB and delete the hold key. If the TTL expires before payment, a sweeper detects it and INCRs the counter, returning the unit to the pool.
Now the invariant holds at all times: available + held + sold = total. Abandoned carts don't leak stock.
The problem that remains is the flood itself. Even rejecting a request — returning a 429 — costs a network connection, an API server thread, and CPU. A million simultaneous connections will saturate your infrastructure before Redis is reached.
Put a waiting room in front
A virtual waiting room absorbs the herd before it reaches anything expensive. Users click "buy" and receive a signed queue position. The waiting room admits them into the actual purchase path at a controlled rate. Everyone else waits, cheaply, from a lightweight service or CDN rule.
The core system (Redis, your API cluster, the DB) only ever sees survivable load. The waiting room's job is simple enough that it can be scaled independently: its only state is a pair of Redis counters.
Decouple reservation from order creation
The last piece: calling a payment processor, writing a multi-table order record, and sending a confirmation email are all slow operations. Blocking the reservation path on any of them adds latency and failure surface area.
Decouple them with a queue. The reservation service does the fast work (check-and-decrement, write hold key, enqueue intent) in under 5ms and tells the user "you're in." A worker pool drains the queue, handles payment, and writes the durable order record asynchronously. If Redis is lost, rebuild the counter from confirmed orders in the DB.
This is the production design.
flowchart LR
V1["V1: SQL conditional UPDATE<br/>correct, melts at scale"] --> V2["V2: + Redis atomic DECR<br/>100k+ ops/sec"]
V2 --> V3["V3: + reservation TTL<br/>abandoned stock returns"]
V3 --> V4["V4: + waiting room<br/>shed the herd"]
V4 --> V5["V5: + async queue<br/>+ durable reconciliation"]
style V1 fill:#0e7490,color:#fff
style V2 fill:#15803d,color:#fff
style V4 fill:#ff6b1a,color:#0a0a0f
style V5 fill:#a855f7,color:#fff
Architecture
flowchart TD
U["Users"] --> CDN["CDN<br/>static product page<br/>+ sold-out state"]
U --> WR["Waiting Room Service<br/>issue signed admission tokens"]
WR -->|"admit at rate R"| GW["API Gateway<br/>+ per-user rate limit"]
GW --> RS["Reservation Service"]
RS -->|"Lua: check + DECR + set hold"| REDIS[("Redis<br/>stock counter<br/>+ hold keys")]
REDIS -->|"reserved"| RS
RS -->|"enqueue purchase intent"| MQ["Message Queue<br/>(Kafka / SQS)"]
RS -.->|"stock = 0"| REJ["sold-out response"]
MQ --> WRK["Order Worker pool"]
WRK --> PAY["Payment Processor"]
WRK --> ORD[("Orders DB<br/>durable source of truth")]
SWEEP["Hold-expiry sweeper"] -.->|"INCR expired holds"| REDIS
RECON["Reconciler"] -.->|"counter vs confirmed orders"| ORD
style REDIS fill:#ff6b1a,color:#0a0a0f
style WR fill:#a855f7,color:#fff
style ORD fill:#15803d,color:#fff
style MQ fill:#0e7490,color:#fff
style SWEEP fill:#ffaa00,color:#0a0a0f
The atomic decrement, precisely
A bare DECR goes negative — it'll happily return −5. You need check-and-decrement as one atomic unit. Redis executes Lua scripts atomically (single-threaded; no other command or script interleaves while a script is in flight).
One production caveat worth knowing: if Redis is running at its maxmemory limit, the first write command inside a script that would consume additional memory is refused and the script aborts with an out-of-memory error. Nothing has been written at that point, so no reservation is made, and a caller that ignores the error drops the request silently. Because of Redis's atomicity guarantee, other clients never observe a half-executed state; the risk is a dropped request, not data corruption. Keep the inventory Redis instance well below its memory limits and monitor headroom. Redis 7.0+ introduced Redis Functions as a cleaner production replacement for EVAL-based Lua scripts; the atomicity semantics are identical, but Functions persist across restarts (stored in RDB/AOF and replicated to replicas) and are easier to manage in production.
-- KEYS[1] = stock key   e.g. "stock:42"
-- KEYS[2] = hold key    e.g. "hold:42:user:99"
-- ARGV[1] = hold TTL in seconds (e.g. 300 for 5 minutes)
local stock = tonumber(redis.call('GET', KEYS[1]))
if not stock or stock <= 0 then
  return -1  -- sold out (or counter missing); caller returns fast rejection
end
redis.call('DECR', KEYS[1])
redis.call('SET', KEYS[2], 1, 'EX', ARGV[1])
return stock - 1  -- remaining stock after this decrement
Notice what this script does not do: it does not acquire a lock, it does not queue waiting threads, and a rejected request returns immediately rather than blocking. The script is O(1) regardless of how many requests are competing. Every request either decrements and gets a hold, or reads zero and exits. There is no middle state.
This is the heart of the design. Everything else is scaffolding to protect it and make the outcome durable.
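The script's behavior under concurrency can be illustrated without a Redis server. The sketch below is an in-memory Python model, not a Redis client: `InventoryStore` and `run_sale` are hypothetical names, and a `threading.Lock` stands in for Redis's single-threaded script execution.

```python
import threading


class InventoryStore:
    """In-memory stand-in for the Redis instance (hypothetical, for illustration)."""

    def __init__(self, stock: int):
        self._stock = stock
        self._holds = set()
        # The lock plays the role of Redis running one script at a time.
        self._lock = threading.Lock()

    def check_and_decrement(self, hold_key: str) -> int:
        """Mirror of the Lua script: remaining stock on success, -1 if sold out."""
        with self._lock:
            if self._stock <= 0:
                return -1
            self._stock -= 1
            self._holds.add(hold_key)
            return self._stock

    @property
    def stock(self) -> int:
        return self._stock


def run_sale(stock: int, buyers: int):
    """Fire `buyers` concurrent attempts; return (winners, remaining stock)."""
    store = InventoryStore(stock)
    results = []

    def attempt(user_id: int) -> None:
        results.append(store.check_and_decrement(f"hold:42:user:{user_id}"))

    threads = [threading.Thread(target=attempt, args=(i,)) for i in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    winners = sum(1 for r in results if r >= 0)
    return winners, store.stock
```

For example, `run_sale(100, 200)` should always report exactly 100 winners and 0 remaining, regardless of thread scheduling, because the check and the decrement happen under the same lock.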
Scaling a single hot key beyond one Redis node
One Redis node running Lua check-and-decrements can process on the order of 100,000 such operations per second (the exact figure depends on instance size and script complexity; treat this as an order-of-magnitude estimate). For a sale where 10K units sell in ~10 seconds, admission control (the waiting room) ensures the actual decrement rate stays well below that ceiling.
If you need to go higher — or if you want Redis HA without a single point of failure — partition the counter:
stock:42:shard:0 → 1,000 units
stock:42:shard:1 → 1,000 units
...
stock:42:shard:9 → 1,000 units
Each request hashes to a shard (e.g., hash(user_id) mod 10) and decrements that shard's counter. The peak write load spreads across 10 independent keys (and potentially 10 nodes). The cost: a request can be rejected by an empty shard even if another shard still has stock — a "phantom sold out" for a few units. For a fungible-goods flash sale, selling 9,992 of 10,000 and showing "sold out" is almost always acceptable. The same shard-the-hot-key pattern appears in rate limiters and distributed counters throughout the industry.
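A small in-memory sketch of the sharded counter shows both the routing and the phantom sold-out. The `ShardedStock` class is hypothetical, and `zlib.crc32` is used here only as a convenient stable hash; a real deployment might choose a different one.

```python
import zlib


class ShardedStock:
    """In-memory model of a stock counter split across N shards (illustrative)."""

    def __init__(self, total: int, shards: int):
        base, extra = divmod(total, shards)
        # Spread units evenly; the first `extra` shards get one more unit.
        self.shards = [base + (1 if i < extra else 0) for i in range(shards)]

    def shard_for(self, user_id: str) -> int:
        # crc32 is stable across processes, unlike Python's built-in hash().
        return zlib.crc32(user_id.encode()) % len(self.shards)

    def try_reserve(self, user_id: str) -> bool:
        i = self.shard_for(user_id)
        if self.shards[i] <= 0:
            # Phantom sold-out: other shards may still hold stock.
            return False
        self.shards[i] -= 1
        return True

    def remaining(self) -> int:
        return sum(self.shards)
```

With 10 units over 10 shards, two requests that hash to the same shard produce one success and one rejection while 9 units remain elsewhere. That is the undersell the text describes; the total sold can never exceed the total stock.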
Pre-loading inventory into Redis
The Redis counter must be populated before the sale opens — not lazily on the first request. The pre-load step:
# During sale setup (minutes before the start time):
SET stock:42 10000
# Optionally set an expiry longer than the sale window so the key auto-cleans:
EXPIREAT stock:42 <unix_timestamp_of_sale_end_plus_buffer>
Why does this matter? If the counter isn't in Redis when the first request hits, a cache miss falls back to the DB — and that miss happens for every request in the first millisecond, producing a thundering herd on the DB at exactly the wrong moment (the sale's opening second).
The pre-load job also runs a sanity check: confirm that redis_stock == db_stock before the sale opens. Any discrepancy from a previous sale's reconciliation still running is resolved before the gate opens.
Admission control: the virtual waiting room
The waiting room's job is simple: ensure the purchase core never sees more concurrent users than it can handle. Here's how it works in practice.
When a user navigates to the sale page (or clicks "buy"), they receive a signed queue token — a JWT with position, issue timestamp, and expiry. A controller tracks the admitted_cursor, the queue position up to which users have been let through, and advances it at a fixed rate (say, 2,000 admissions per second). Once a user's position falls at or below the cursor, they get a short-lived admission token (another JWT, ~5 minute TTL) that the reservation API requires. Any call without a valid admission token gets a clean HTTP 429.
sequenceDiagram
participant U as User browser
participant WR as Waiting Room
participant RS as Reservation Service
participant RE as Redis
U->>WR: GET /sale/42 (at T=0)
WR->>U: "You are position 4,000 — est. wait 2 s"
loop every 1 second
U->>WR: GET /queue/status
WR->>U: "Position 4,000 — now serving 2,000"
end
WR->>U: "Your turn — admission token (5 min TTL)"
U->>RS: POST /purchase { sku: 42, idempotency_key: "abc-123" }
RS->>RE: EVAL lua_check_and_decr KEYS stock:42 hold:42:user:7
RE-->>RS: remaining = 6,500
RS->>U: "Reserved — 5:00 to complete payment"
The waiting room itself stores only two Redis counters (queue:42:position and queue:42:admitted). It is lightweight and horizontally scalable. If it restarts, users rejoin the queue. That is acceptable, because the alternative (durable queue state across restarts) is substantially more complex without meaningfully improving user experience.
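The cursor-and-token mechanism can be sketched in memory as follows. This is an illustration under assumptions, not the production protocol: it uses a plain HMAC-signed string in place of a JWT, the `WaitingRoom` class, `SECRET`, and the token layout are all hypothetical, and the two integer fields stand in for the queue:42:position and queue:42:admitted counters.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # assumption: a real system loads this from a secret store


def sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


class WaitingRoom:
    """In-memory model of the waiting room's two counters (illustrative)."""

    def __init__(self, admit_rate_per_s: int):
        self.admit_rate_per_s = admit_rate_per_s
        self.next_position = 0      # models queue:42:position
        self.admitted_cursor = 0    # models queue:42:admitted

    def join(self) -> int:
        """Hand out the next queue position."""
        self.next_position += 1
        return self.next_position

    def tick(self, seconds: float = 1.0) -> None:
        """Advance the cursor at the fixed admission rate."""
        self.admitted_cursor += int(self.admit_rate_per_s * seconds)

    def admission_token(self, user_id: str, position: int, now: int,
                        ttl_s: int = 300):
        """Return a signed token once the cursor has reached this position."""
        if position > self.admitted_cursor:
            return None
        payload = f"{user_id}:{now + ttl_s}"
        return f"{payload}:{sign(payload)}"


def verify_token(token: str, now: int) -> bool:
    """Check the signature first, then the expiry embedded in the payload."""
    payload, _, signature = token.rpartition(":")
    if not hmac.compare_digest(sign(payload).encode(), signature.encode()):
        return False
    expires_at = int(payload.rpartition(":")[2])
    return now < expires_at
```

At 2,000 admissions per second, the user at position 4,000 gets no token after one tick and a valid token after two, which matches the two-second estimate in the diagram.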
The waiting room is also where bot and abuse defenses live: a CAPTCHA or proof-of-work challenge before a queue token is issued, device fingerprinting, and per-account purchase caps. Pure throughput limiting is not enough — determined scalper bots arrive first and stay within per-IP rate limits. Defenses here are a product and trust-and-safety problem as much as an engineering one.
Reservation lifecycle and hold sweeper
A reservation is a promise of stock, not a completed sale. The hold key has a TTL:
stateDiagram-v2
[*] --> Available : sale initialized
Available --> Reserved : atomic Lua DECR + hold key set
Reserved --> Available : TTL expires (sweeper INCR) OR user cancels
Reserved --> Purchased : payment confirmed → order written to DB
Purchased --> [*]
The hold sweeper is a background process that periodically scans an auxiliary sorted set of hold keys (scored by expiry timestamp) — this is the reliable production approach. Redis keyspace notifications for expired keys are a tempting alternative but have two critical caveats: expired events are emitted when Redis actually deletes the key (via lazy deletion on access or background sampling), which can lag the TTL deadline by a significant and unbounded amount; and keyspace notifications use Pub/Sub, which is fire-and-forget — a disconnected subscriber loses all events with no replay. Don't rely on keyspace notifications alone for returning stock.
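For each confirmed-expired hold, the sweeper calls INCR stock:{sku_id} to return the unit. The counter never goes above the initial stock as long as each hold is returned exactly once: the sweeper must remove the hold from the sorted set in the same atomic step as the INCR, and must skip holds that the payment path has already released or converted.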
The invariant you want to state in any interview, and verify in code:
available(t) + held(t) + sold(t) = total ∀ t
If this ever fails, you have either oversold (sold > total − available − held) or leaked stock (stock returned more than once for one hold). Periodic reconciliation against the orders DB catches both.
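The lifecycle and the invariant can be checked with a small in-memory model. This is a sketch with hypothetical names (`Inventory`, `reserve`, `confirm`, `sweep`); a dict keyed by hold stands in for the sorted set of expiry timestamps, and the duplicate-hold guard in `reserve` is an assumption added for the example rather than something the Lua script above does.

```python
class Inventory:
    """In-memory model of available / held / sold for one SKU (illustrative)."""

    def __init__(self, total: int):
        self.total = total
        self.available = total
        self.holds = {}   # hold_key -> expiry timestamp (sorted-set stand-in)
        self.sold = 0

    def reserve(self, hold_key: str, now: int, ttl_s: int = 300) -> bool:
        """Available -> Reserved. Rejects when sold out or already held."""
        if self.available <= 0 or hold_key in self.holds:
            return False
        self.available -= 1
        self.holds[hold_key] = now + ttl_s
        return True

    def confirm(self, hold_key: str, now: int) -> bool:
        """Reserved -> Purchased, only while the hold is still live."""
        expiry = self.holds.get(hold_key)
        if expiry is None or expiry <= now:
            return False
        del self.holds[hold_key]
        self.sold += 1
        return True

    def sweep(self, now: int) -> int:
        """Reserved -> Available for every expired hold; returns units released."""
        expired = [k for k, exp in self.holds.items() if exp <= now]
        for key in expired:
            # Removing the hold together with the increment is what makes
            # each unit come back exactly once.
            del self.holds[key]
            self.available += 1
        return len(expired)

    def invariant_holds(self) -> bool:
        return self.available + len(self.holds) + self.sold == self.total
```

Running the sweeper twice over the same instant releases nothing the second time, and `invariant_holds()` stays true after every transition.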
Decoupling reservation from order creation
The fast path must not be blocked by slow operations: calling a payment processor, writing a multi-table order record, sending a confirmation email. The design decouples them with a queue.
Fast path (< 5ms, synchronous):
- Validate admission token.
- Run Lua check-and-decrement.
- Write hold key with TTL.
- Enqueue purchase intent to Kafka/SQS: { user_id, sku_id, hold_key, idempotency_key, timestamp }.
- Return "reserved" to the user.
Slow path (seconds, async):
- Worker dequeues purchase intent.
- Validate hold still active (hold key still exists in Redis).
- Call payment processor with idempotency key.
- On payment success: write order record to DB, delete hold key, emit OrderConfirmed event.
- On payment failure: call INCR stock:{sku_id} to return unit, delete hold key, notify user.
The user sees "you're in" within milliseconds. The actual money movement happens behind the scenes, and the user is shown a "processing payment" state while the worker runs.
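The slow-path steps can be sketched as a single handler. Everything here is an in-memory stand-in with hypothetical names: `holds` is a set, `stock` and `orders` are dicts, and `charge` is any callable returning True or False. The early "already confirmed" check is an assumption added so that a replayed queue message is harmless; the original steps do not spell it out.

```python
def process_intent(intent: dict, holds: set, stock: dict, orders: dict, charge) -> str:
    """Handle one purchase intent and return the outcome as a string."""
    hold_key = intent["hold_key"]
    order_key = intent["idempotency_key"]

    if order_key in orders:
        return "already_confirmed"      # replayed message: nothing left to do
    if hold_key not in holds:
        return "hold_expired"           # the unit was already returned to stock

    if charge(intent):
        # Payment succeeded: write the durable order, then release the hold.
        orders[order_key] = {"user_id": intent["user_id"], "sku_id": intent["sku_id"]}
        holds.discard(hold_key)
        return "confirmed"

    # Payment failed: release the hold and return the unit (INCR stock:{sku_id}).
    holds.discard(hold_key)
    stock[intent["sku_id"]] += 1
    return "payment_failed"
```

A worker loop would call this once per dequeued message. Because a replay of a confirmed intent returns early, an at-least-once queue does not produce a second order in this model.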
sequenceDiagram
participant U as User
participant RS as Reservation Service
participant RE as Redis
participant MQ as Kafka / SQS
participant WK as Order Worker
participant PP as Payment Processor
participant DB as Orders DB
U->>RS: POST /purchase (with admission token + idempotency key)
RS->>RE: EVAL check-and-decrement Lua
RE-->>RS: reserved (remaining = 6,500)
RS->>MQ: enqueue { user_id, sku_id, idempotency_key }
RS->>U: "Reserved — 5:00 to pay" (< 5ms)
MQ->>WK: deliver purchase intent
WK->>RE: check hold still active
WK->>PP: charge card (idempotency_key)
PP-->>WK: success
WK->>DB: INSERT order record
WK->>RE: DEL hold key
WK->>U: "Order confirmed" (push / email)
Idempotency: never double-charge
Under 1M QPS and mobile clients, retries are guaranteed. The payment must be idempotent.
Each purchase attempt carries a client-generated idempotency key (a UUID). The payment service stores (idempotency_key → result) with a 24-hour TTL. If a retry arrives with the same key, it returns the original result without re-charging. This is the same mechanism described in Design a Payment System; for flash sales it is non-negotiable because the retry storm is predictable and large.
The idempotency key also protects the reservation step: a user who double-taps "buy" gets back their existing reservation rather than creating a second one and consuming two units.
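A minimal sketch of the idempotency store, assuming an in-memory dict in place of the real key-value store and omitting the 24-hour TTL. `PaymentService`, `charges_made`, and the result shape are hypothetical; the key is namespaced by user ID so that two users who happen to send the same UUID do not collide.

```python
import uuid


class PaymentService:
    """In-memory model of an idempotent charge endpoint (illustrative)."""

    def __init__(self):
        self._results = {}      # namespaced idempotency key -> stored result
        self.charges_made = 0   # counts real charges, for demonstration

    def charge(self, user_id: str, amount_cents: int, idempotency_key: str) -> dict:
        key = f"{user_id}:{idempotency_key}"
        if key in self._results:
            # Retry: return the original result without charging again.
            return self._results[key]
        self.charges_made += 1
        result = {
            "status": "charged",
            "amount_cents": amount_cents,
            "charge_id": str(uuid.uuid4()),
        }
        self._results[key] = result
        return result
```

Two calls with the same user and key return the same `charge_id` and increment `charges_made` once. This single-threaded model skips one thing a real store must provide: the lookup and the insert have to be a single atomic operation, otherwise two concurrent retries can both miss the key and both charge.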
Caching the product page
The product page itself — descriptions, images, pricing — is entirely static during the sale. Serve it from CDN with aggressive caching:
Cache-Control: public, max-age=300, stale-while-revalidate=60
The "remaining stock" display is the only dynamic element. Do not put the live counter on the CDN-cached page. Instead, show a cached approximate count (updated every 1–5 seconds) via a separate lightweight endpoint served from an in-memory cache — not the Redis hot counter. Once stock hits zero, flip the page to "sold out" and cache that state aggressively on the CDN. The sold-out page should serve from CDN edges without ever reaching your origin.
This way, millions of browsers watching the "remaining: 4,821" counter never touch the real inventory system. Only users admitted by the waiting room (about 2,000 per second at the example admission rate) reach the reservation path.
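One way to serve the approximate count is a small time-based cache in front of whatever reads the real number. The sketch below is illustrative: `ApproxStockCache` and `fetch_stock` are hypothetical names, the 2-second interval is just one value from the 1 to 5 second range mentioned above, and the clock is injectable so the behavior can be exercised without sleeping.

```python
import time


class ApproxStockCache:
    """Serve a cached stock count, refreshing at most once per interval."""

    def __init__(self, fetch_stock, refresh_interval_s: float = 2.0,
                 clock=time.monotonic):
        self._fetch = fetch_stock           # callable that reads the real count
        self._interval = refresh_interval_s
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = (self._fetched_at is None
                 or now - self._fetched_at >= self._interval)
        if stale:
            # Only this branch touches the backing store.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

However many times `get()` is called within one interval, the backing store is read once; the displayed number may lag by up to the refresh interval, which is the accepted cost of keeping readers off the hot counter.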
Failure modes
| Failure | Symptom | Mitigation |
|---|---|---|
| SQL oversell race | Two threads both read stock=1 and both decrement | Correct conditional SQL works; fails under 1M QPS due to lock contention — use Redis Lua |
| Redis DECR goes negative | Stock counter at −3 | Lua script: check before decrement; reject if ≤ 0 |
| Redis node failure mid-sale | Counter lost; can't accept new reservations | Run Redis with a primary + replica (or Redis Sentinel / Cluster); on failover, rebuild counter from total − confirmed_orders in the DB |
| Reservation sweeper lag | Expired holds not returned; phantom sold-out | Sweeper catches up; brief undersell is acceptable; never oversells |
| Payment-after-reserve abandonment | User reserves but never pays; unit locked for TTL | TTL on hold key; sweeper returns units; tune TTL to cart completion rate |
| Idempotency key collision | Two distinct users generate same UUID | Use a namespaced key: {user_id}:{UUID} — probability of collision is negligible |
| Worker dead during payment | Charge attempt lost; hold expires | Kafka consumer restart replays message; idempotency key prevents re-charge |
| DB write fails after payment success | User charged, no order record | Reconciler compares payment processor records vs DB orders; creates missing order records; see payment saga |
| Bot stampede | 99% of queue positions taken by bots | CAPTCHA / proof-of-work at waiting-room entry; per-account purchase caps; device fingerprinting |
| CDN cache stale "sold out" | Page shows sold-out after restock (unlikely but possible) | Short TTL on sold-out state (30–60s), or purge CDN on restock event |
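The counter rebuild in the Redis-failure row is simple arithmetic, sketched here. The function name is hypothetical, and the `pending_intents` parameter is an assumption beyond the table: it lets you also subtract purchase intents still waiting in the queue, if you choose to honor them after a failover.

```python
def rebuild_available(total: int, confirmed_orders: int,
                      pending_intents: int = 0) -> int:
    """Recompute the stock counter from durable records after losing Redis."""
    available = total - confirmed_orders - pending_intents
    # Clamp at zero: a negative value would mean the DB already shows an oversell.
    return max(available, 0)
```

For example, with 10,000 total units and 6,200 confirmed orders the rebuilt counter is 3,800; treating 300 queued intents as still valid lowers it to 3,500.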
The overselling vs underselling trade-off
Bucketed counters and expiry timing can leave a handful of units unsold: a shard hits zero while another still has stock, or a sweeper races with a decrement. For fungible goods, a few unsold units is categorically better than a few oversold units. Overselling means you've promised goods you can't deliver — refunds, complaints, legal exposure. Underselling means you leave a small amount of revenue on the table. Make this trade-off explicit when you discuss it. If every last unit must sell, add a post-sale single-pass sweep across all buckets.
This is the fundamental difference from Design Ticketmaster: for assigned seating, even a single double-booking is unacceptable (two people, one seat). For a flash sale with fungible units, a small undersell is an acceptable engineering trade-off for substantially higher throughput.
Storage choices
| Data | Store | Rationale |
|---|---|---|
| Live inventory counter | Redis (in-memory, Lua-atomic) | 100k+ ops/sec; single-threaded atomicity; instant rejection |
| Active hold keys | Redis (with TTL) | Same store; TTL-driven expiry is native |
| Waiting room queue position | Redis counters | INCR/GET only; ephemeral is fine |
| Confirmed orders | Postgres / DynamoDB | ACID, durable, the real source of truth |
| Purchase intent queue | Kafka or SQS | Durable, replayable; decouples reservation from payment |
| Product catalog / page | CDN / object storage | Read-once-cache-forever; never hits origin at sale time |
| Analytics (click stream) | Kafka → S3 / ClickHouse | Fire-and-forget; does not block any purchase path |
Things to discuss in an interview
- The single hot key: identify it immediately — the SKU's stock count. Explain why it's different from typical scaling problems where you can shard by key.
- Why SQL fails here, even when correct: the conditional UPDATE is logically correct; the failure is physical (lock contention, connection pool exhaustion). Distinguish logic from physical capacity.
- Pre-loading Redis: why it matters, how you validate it, what happens on a cold start during a sale.
- Waiting room as a load-shedding product feature: not just a technical guard, but a UX feature that turns "503 error" into "you're in queue position 12,000." Fairness as a design goal.
- The available + held + sold = total invariant: state it, describe how you check it, describe how you recover if it's violated.
- Oversell vs undersell: for fungible goods, undersell is the right trade-off. For specific-seat inventory (Ticketmaster), it's not.
- Reconciliation: what happens when Redis is lost? Rebuild from the DB. How often do you reconcile? What discrepancies do you look for?
- Async order creation: why decouple? What are the failure modes at the queue boundary? How does the user know their order succeeded?
Things you should now be able to answer
- Why does a correct SQL conditional UPDATE still fail at 1M QPS?
- How does a Redis Lua script prevent overselling without a lock queue?
- What is the invariant that must hold at all times, and how do you verify it?
- How does the virtual waiting room protect the system, and what does it cost?
- Why is a reservation different from a sale, and how does stock return when a reservation expires?
- Where is the durable source of truth, and how do you recover the Redis counter if it's lost?
- What's the difference between the flash-sale design and the Ticketmaster design, and why does it matter?
Further reading
- Alibaba / Taobao engineering writeups on the "seckill" (秒杀) system — widely referenced in distributed systems discussions; covers pre-filtering, bucket-based inventory, and the multi-layer architecture used at extreme scale.
- Redis documentation: Lua scripting, atomic operations, keyspace notifications — redis.io.
- "Designing Data-Intensive Applications" (Kleppmann) — Chapter 7 on transactions and the semantics of atomic operations.
- Design a Rate Limiter — the same shard-the-hot-counter technique applied to per-user request quotas.
- Design a Payment System — idempotency keys, payment sagas, and reconciliation workers in depth.
- Design Ticketmaster — the assigned-seating cousin: row-level locks instead of a single counter; strong consistency as the non-negotiable.
Frequently asked questions
Why does a correct SQL conditional UPDATE fail at flash-sale scale?
The conditional UPDATE is logically sound — the WHERE stock > 0 check and decrement are atomic inside the row-lock cycle and cannot oversell. The failure is physical: at 1M concurrent QPS all contending on the same row, every request must queue for that row lock, the DB connection pool exhausts, and the database node falls over under lock-wait pressure. The solution is to move the counter to Redis, where a single-threaded Lua script performs the same check-and-decrement at 100,000+ ops/sec without a lock queue.
How does a Redis Lua script prevent overselling?
Redis executes Lua scripts atomically: no other command or script interleaves while a script is in flight. The script reads the current stock count, returns -1 immediately if it is zero or less, and only then issues DECR and sets a hold key with a TTL. Because the check and the decrement are one indivisible operation, two concurrent requests can never both read a positive count and both decrement — one will always see the already-decremented value.
What is the inventory invariant for a flash sale, and what breaks it?
The invariant is: available(t) + held(t) + sold(t) = total, at all times. It breaks in two ways: overselling, where sold exceeds total minus available minus held, typically from a race condition; and stock leakage, where a hold is returned more than once by the sweeper. Periodic reconciliation of the Redis counter against confirmed orders in the durable database is the recovery mechanism.
When should you use counter sharding for flash-sale inventory, and what does it cost?
Shard the Redis counter when a single node's ~100,000 ops/sec ceiling is insufficient, or when you need Redis high availability without a single point of failure. Partitioning 10,000 units across 10 shards spreads write load across 10 independent keys. The cost is phantom sold-outs: a request may be rejected by an empty shard even if another shard still holds stock, potentially leaving a few units unsold. For fungible goods this undersell is an acceptable trade-off; it is not acceptable for assigned-seat inventory.
Why should you not rely on Redis keyspace notifications to return expired reservation stock?
Keyspace notifications are emitted when Redis actually deletes the key via lazy deletion or background sampling, which can lag the TTL deadline by an unbounded amount. They also use Pub/Sub, which is fire-and-forget — a disconnected subscriber loses all events with no replay. The reliable production approach is a background sweeper that scans an auxiliary sorted set of hold keys scored by expiry timestamp and calls INCR on confirmed-expired holds.