
◆◆◆ Advanced · asked at Amazon · asked at Shopify · asked at eBay

Design a Shopping Cart & Checkout System

Keep a cart consistent across devices, then check out without overselling or double-charging. The available-cart vs consistent-checkout split, inventory holds, and the order saga.

20 min read · 2026-06-12 · Ironclad Academy

#interview #e-commerce #consistency #inventory


The problem

Amazon's cart holds billions of active line items at any moment. Shopify powers millions of storefronts, each running its own cart-and-checkout pipeline. At their core, both do the same thing: let a shopper collect items over time, then exchange money for those items in a single atomic step. The cart half is casual and forgiving. The checkout half is unforgiving and exact.

The cart looks simple — it is just a list of SKUs and quantities. But the moment you add multi-device sync (phone adds a shoe, laptop removes it, tablet reads the result) and anonymous-to-logged-in transitions (guest session merges into a real account on login), a plain SQL row is no longer the right tool. The cart must stay available even during partial outages; a shopper who can't add an item is a lost sale.

Checkout is the opposite problem. Once a shopper hits "Place Order," two shoppers cannot both buy the last unit, and a payment that succeeds cannot silently produce a failed order. The system must span three separate services — inventory, order creation, and a payment processor — and if any step fails, it must undo the preceding ones cleanly. This is where exactly-once semantics, distributed sagas, and idempotency keys matter.

The core engineering tension is that these two halves demand fundamentally different storage strategies. The cart wants high availability and eventual consistency; checkout wants strong consistency and atomic multi-step commits across services that don't share a database. Getting both right in one user flow, and getting the handoff between them right, is what makes this a compelling interview problem.

Functional requirements

PUT /cart/items — add or update a line item (user or guest session).
GET /cart — fetch current cart; works on any device, any session.
DELETE /cart/items/{sku} — remove a line item.
On login: merge guest cart into user cart.
POST /orders — place an order: validate prices + promotions, reserve inventory, charge payment, create order record.
Order history and status via GET /orders.

Non-functional requirements

Cart availability over consistency — a user must be able to add to cart even when downstream services are degraded. Stale cart data is acceptable; lost carts are not.
Checkout must be strongly consistent — no overselling, no double-charges, no "payment succeeded but order failed" ghosts.
Idempotency — "place order" must be safe to retry (network drops, double-clicks).
Low checkout latency — p99 under 3 seconds including payment authorization.
Abandoned cart handling — inventory held at checkout must be released if the user doesn't complete payment.
Scale — 50M DAU, read/write ratio on carts ~10:1, checkout rate ~1–2% of cart events.

Capacity estimation

Dimension	Estimate	How we got there
DAU	50M users	baseline assumption
Add-to-cart rate (avg)	~2,800/sec	`50M × 1 / (5 × 3,600)` — one add-to-cart per user per 5 hours
Add-to-cart rate (peak)	~10,000/sec	2–4× evening spike
Checkout rate (peak)	~150/sec	`10,000 × 1.5%` — ~1.5% of add-to-cart events complete as orders
Cart size	800 B per cart	avg 4 line items × 200 B each
Active cart data	8 GB raw · ~24 GB with replication	`10M carts × 800 B = 8 GB`; 20% of DAU have a live cart; 3× replication
Inventory reads	~600/sec	`150 checkouts/sec × 4 items` — negligible for sharded Postgres; cache aggressively
Inventory writes (reservations)	~600/sec typical · 10,000+/sec flash sale	same as inventory reads; hot SKUs spike dramatically — see failure modes
Orders write throughput	~300 KB/sec	`150 orders/sec × 2 KB` — trivial
Orders volume	~13M/day · ~5B/year	`150 × 86,400 ≈ 13M/day`, an upper bound that assumes the peak rate is sustained all day; shard by `user_id` or `order_id`

Takeaway: Cart storage (8 GB raw, 24 GB replicated) fits comfortably in a mid-size Redis cluster; checkout throughput is modest at ~150/sec — the dominant scaling pressure is hot-SKU lock contention during flash sales, not raw write volume.
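
The table's arithmetic can be reproduced in a few lines, which makes it easy to change one input and see what moves. This is a sketch in Python; every constant is one of the table's assumptions rather than a measured value, and the variable names are illustrative.

```python
# Back-of-envelope capacity model. All inputs are the table's assumptions.
DAU = 50_000_000
ADD_INTERVAL_SECONDS = 5 * 3_600      # one add-to-cart per user per 5 hours
PEAK_ADDS_PER_SEC = 10_000            # rounded peak from the table
CHECKOUT_PER_MILLE = 15               # 1.5% of add-to-cart events become orders
ITEMS_PER_CART = 4
BYTES_PER_ITEM = 200
REPLICATION_FACTOR = 3

avg_adds_per_sec = DAU / ADD_INTERVAL_SECONDS
peak_checkouts_per_sec = PEAK_ADDS_PER_SEC * CHECKOUT_PER_MILLE // 1000

active_carts = DAU * 20 // 100        # 20% of DAU hold a live cart
cart_bytes = ITEMS_PER_CART * BYTES_PER_ITEM
raw_cart_gb = active_carts * cart_bytes / 10**9
replicated_cart_gb = raw_cart_gb * REPLICATION_FACTOR

# One reservation write per line item at checkout.
reservation_writes_per_sec = peak_checkouts_per_sec * ITEMS_PER_CART
# Upper bound: assumes the peak checkout rate holds for a full day.
orders_per_day_at_peak = peak_checkouts_per_sec * 86_400

print(f"avg add-to-cart/sec   ~{avg_adds_per_sec:,.0f}")
print(f"peak checkouts/sec     {peak_checkouts_per_sec}")
print(f"cart data (GB)         {raw_cart_gb:.0f} raw, {replicated_cart_gb:.0f} replicated")
print(f"reservation writes/sec {reservation_writes_per_sec}")
print(f"orders/day at peak     {orders_per_day_at_peak:,}")
```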

Building up to the design

The interesting thing about this problem is that it's really two problems wearing the same clothes. The cart and the checkout look like a single flow to the user, but they need completely different storage strategies. Walking the evolution makes that split obvious.

V1: One database, one cart table

CREATE TABLE cart_items (
  user_id BIGINT,
  sku     VARCHAR(64),
  qty     INT,
  PRIMARY KEY (user_id, sku)
);

On POST /orders, read cart, check inventory in code, debit inventory, insert order, clear cart — all in one transaction. This is correct, atomic, and handles thousands of users without breaking a sweat.

The problem shows up when you need the cart to be available during partial outages. Postgres replication lags, or a primary failover happens, and suddenly your "simple list of items" is unavailable to users who just want to browse and add things. There's also no clean story for multi-device sync or conflict resolution built into SQL transactions.

V2: Move the cart to Redis

Cart items go into a Redis hash: HSET cart:{user_id} {sku} {qty}. Reads and writes are in-memory and sub-millisecond. If Redis has problems, you fall back to a cookie or a degraded mode: the user might lose some in-flight state, but they can keep shopping. In V1, a Postgres outage would have taken the whole cart experience down.

Multi-device sync also becomes trivial: every device reads the same Redis key.

The new problem is the anonymous user. A guest fills a cart keyed by session_id. When they log in, you now have two carts — cart:guest:{session_id} and cart:user:{user_id} — and you need a merge strategy.

V3: Guest cart merge

The merge is a per-line-item union. For each SKU in the guest cart, if the user cart already has that SKU, you need a reconciliation rule; if it doesn't, you copy the guest item over.

The right rule is max-register merge on quantity — not "add quantities." A shopper who added 2 pairs of shoes on their phone and already has 1 pair in their saved cart probably wants 2, not 3. They changed their mind on quantity, not added a second purchase. So:

merged_qty(sku) = max(guest_qty, user_qty)   # higher quantity wins

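This is structurally the same as a CRDT max-register merge: max is commutative, associative, and idempotent, so the result does not depend on the order in which the two carts are merged, and a merge can never silently lower a quantity. The flip side is that max cannot express a deliberate decrease or removal; a shopper who lowered a quantity on one device gets the higher value back after the merge. Product teams sometimes surface the conflict as a UI prompt ("Your guest cart has item X (qty 2), your saved cart has qty 1. Keep 2?"), but as a background merge, taking the higher quantity is the conservative choice.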
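
The rule is small enough to sketch directly. The following Python models each cart as a plain dict of SKU to quantity; the function name and the dict representation are illustrative assumptions, not the Cart Service's actual interface.

```python
from typing import Dict


def merge_carts(guest: Dict[str, int], user: Dict[str, int]) -> Dict[str, int]:
    """Union of SKUs; on a conflict keep the larger quantity (max-register)."""
    merged = dict(user)  # start from the saved cart, leave the input untouched
    for sku, qty in guest.items():
        merged[sku] = max(qty, merged.get(sku, 0))
    return merged


guest_cart = {"SKU-SHOE-RED-10": 2, "SKU-BELT-BLK-M": 1}
user_cart = {"SKU-SHOE-RED-10": 1, "SKU-SOCK-WHT-10": 3}
merged = merge_carts(guest_cart, user_cart)
print(merged)
```

Because max is commutative and idempotent, merging in the opposite direction, or merging the same guest cart twice after a retried login, produces the same cart.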

The remaining gap: checkout. The single-transaction approach from V1 can't span Redis, Postgres, and a payment processor.

V4: The checkout saga

Once you fan out across services, a single database transaction is no longer available to you. Checkout becomes a sequence of local transactions — reserve inventory, create order, charge payment — and any one of them can fail. If payment fails after inventory was reserved, you must release the reservation. This is the saga pattern: a sequence of forward steps with compensating transactions that undo each one if something goes wrong later.

flowchart LR
    V1["V1: SQL cart + checkout<br/>Simple, single DB"] --> V2["V2: Redis cart<br/>Available, fast reads"]
    V2 --> V3["V3: + guest merge<br/>max-register per-item reconciliation"]
    V3 --> V4["V4: Checkout saga<br/>Reserve → charge → confirm"]
    V4 --> V5["V5: + idempotency keys<br/>+ abandoned-cart TTL<br/>+ promotions engine"]
    style V1 fill:#0e7490,color:#fff
    style V3 fill:#15803d,color:#fff
    style V4 fill:#ff6b1a,color:#0a0a0f
    style V5 fill:#a855f7,color:#fff

High-level architecture

flowchart TD
    C[Client: browser / mobile] --> GW[API Gateway + Auth]
    GW --> CART[Cart Service]
    GW --> CHK[Checkout Service]

    CART --> REDIS[(Redis<br/>cart by user_id → items)]
    CART -.guest merge.-> REDIS

    CHK --> PRICE[Pricing + Promotions Engine]
    CHK --> INV[Inventory Service]
    CHK --> ORD[Order Service]
    CHK --> PAY[Payment Service]

    INV --> IDB[(Inventory DB<br/>Postgres — sharded by SKU)]
    ORD --> ODB[(Orders DB<br/>Postgres — sharded by user_id)]
    PAY --> PSP[Payment Processor<br/>Stripe / Adyen]

    CHK --> SAGA[Saga Coordinator]
    SAGA -.compensate.-> INV
    SAGA -.compensate.-> PAY
    SAGA --> KAFKA[Kafka: order events]
    KAFKA --> NOTIFY[Notification Service]
    KAFKA --> ANA[Analytics]

    style CART fill:#0e7490,color:#fff
    style REDIS fill:#15803d,color:#fff
    style CHK fill:#ff6b1a,color:#0a0a0f
    style SAGA fill:#a855f7,color:#fff
    style INV fill:#ffaa00,color:#0a0a0f
    style PAY fill:#ff2e88,color:#fff

The cart: available by design

Storage schema (Redis)

Each user's cart is a Redis hash. Line items are fields, quantities are values:

HSET cart:user:9182736 "SKU-SHOE-RED-10" "2"
HSET cart:user:9182736 "SKU-BELT-BLK-M"  "1"
HSET cart:user:9182736 "SKU-SOCK-WHT-10" "3"
EXPIRE cart:user:9182736 2592000   # 30-day TTL; refresh on activity

For guest sessions:

HSET cart:guest:sess_abc123 "SKU-SHOE-RED-10" "1"
EXPIRE cart:guest:sess_abc123 86400    # 1-day TTL for anonymous sessions

Why Redis over Postgres for the cart? The cart is read many times for every checkout — every page view, every ad re-targeting event, every "you left something in your cart" nudge. Reads outnumber checkouts by 50–100:1. Redis gives sub-millisecond reads, and its native hash operations make per-item updates atomic at the field level without locking.

Eventual consistency is acceptable here because the cart is not money. If two devices add the same item concurrently and one update is delayed, the worst outcome is a transient inconsistency in quantity that is resolved on the next read. The authoritative quantity check happens at checkout, not in the cart. Losing an entire cart (data loss) is unacceptable; losing one concurrent update is tolerable.

Guest-to-user cart merge

sequenceDiagram
    participant App
    participant CartSvc as Cart Service
    participant Redis
    App->>CartSvc: POST /login (user logs in)
    CartSvc->>Redis: HGETALL cart:guest:{session_id}
    Redis-->>CartSvc: guest cart items
    CartSvc->>Redis: HGETALL cart:user:{user_id}
    Redis-->>CartSvc: user cart items
    CartSvc->>CartSvc: merge: union of SKUs, max-register qty on conflict
    CartSvc->>Redis: HSET cart:user:{user_id} merged items
    CartSvc->>Redis: DEL cart:guest:{session_id}
    CartSvc-->>App: merged cart

The merge runs as a short Lua script on Redis to keep it atomic, so no partial merge is visible to other readers mid-flight. Redis executes a Lua script atomically on a single node, but in Redis Cluster every key the script touches must hash to the same slot. Hash tags control this: when a key contains a {...} segment, only the text inside the braces is hashed. Note that cart:guest:{sess_abc} and cart:user:{9182736} carry different tags and will usually map to different slots, so the script cannot run against that pair as written. The guest and user cart keys must share a tag; the alternatives are to run the merge on the node owning the user cart or to use a proxy that routes both keys to the same shard:

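-- KEYS[1] = guest cart key, KEYS[2] = user cart key
-- On Redis Cluster both keys must hash to the same slot.
local guest = redis.call('HGETALL', KEYS[1])   -- flat array: sku1, qty1, sku2, qty2, ...
for i = 1, #guest, 2 do
  local sku = guest[i]
  local qty = tonumber(guest[i+1])
  -- HGET returns false for a missing field, so "or 0" treats it as quantity 0
  local existing = tonumber(redis.call('HGET', KEYS[2], sku) or 0)
  if qty > existing then
    redis.call('HSET', KEYS[2], sku, qty)      -- max-register: keep the larger quantity
  end
end
redis.call('DEL', KEYS[1])                     -- the guest cart is consumed by the merge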
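
To see why the tag matters, the slot calculation can be reproduced offline. The sketch below follows the Redis Cluster rule (CRC16/XMODEM of the key, or of the first non-empty {...} segment, modulo 16384) and uses Python's binascii.crc_hqx with an initial value of 0, which computes that same CRC. The key names with a shared {9182736} tag are a hypothetical naming scheme, not something the article prescribes.

```python
import binascii


def key_slot(key: str) -> int:
    """Redis Cluster hash slot for a key, honoring {hash tags}."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


# Hypothetical scheme: both carts carry the user id as the hash tag.
guest_key = "cart:guest:{9182736}"
user_key = "cart:user:{9182736}"
print(key_slot(guest_key), key_slot(user_key))
```

With a shared tag both keys resolve to the same slot, so a multi-key Lua script is allowed to touch them together.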

Checkout: strongly consistent by requirement

Pricing and promotions at checkout

Never trust a price from the client. At checkout:

Fetch current prices for all SKUs from the Pricing Service.
Apply promotions (coupon codes, buy-2-get-1, category discounts) in a deterministic order.
Compute tax by shipping jurisdiction (typically a call to a tax service like TaxJar or Avalara).
Present the final total to the user for confirmation before capturing payment.

Many systems hold the computed price for a short window (for example, 15 minutes) so the user doesn't see it change mid-checkout if a sale ends while they're entering their card details.
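
The steps above can be sketched as a small server-side quote function. Everything here is hypothetical: the in-memory CATALOG stands in for the Pricing Service, the two promotion helpers and the flat tax rate stand in for a promotions engine and a tax service, and the priority numbers are made up. The point it illustrates is that promotions are sorted before they are applied, so the same inputs always produce the same total.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical server-side prices; the client never supplies these.
CATALOG = {
    "SKU-SHOE-RED-10": Decimal("80.00"),
    "SKU-BELT-BLK-M": Decimal("20.00"),
}


def percent_off(pct):
    return lambda subtotal: subtotal * (Decimal(100) - Decimal(pct)) / Decimal(100)


def amount_off(amount):
    return lambda subtotal: max(subtotal - Decimal(amount), Decimal("0"))


def quote(cart, promotions, tax_rate):
    """cart: {sku: qty}; promotions: list of (priority, name, fn)."""
    subtotal = sum(CATALOG[sku] * qty for sku, qty in cart.items())
    # Sort by (priority, name) so the order of application is deterministic.
    for _priority, _name, apply in sorted(promotions, key=lambda p: (p[0], p[1])):
        subtotal = apply(subtotal)
    tax = subtotal * Decimal(tax_rate)
    return (subtotal + tax).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


cart = {"SKU-SHOE-RED-10": 2, "SKU-BELT-BLK-M": 1}
promos = [(2, "FIVEOFF", amount_off("5")), (1, "SPRING10", percent_off(10))]
total = quote(cart, promos, "0.08")
print(total)
```

Order matters: taking 10% off first and then 5.00 gives 169.56 here, while the reverse order would give 170.10, which is why the sort key is fixed rather than left to list order.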

Inventory reservation: atomic conditional decrement

The core operation that prevents oversell:

UPDATE inventory
SET    reserved = reserved + :qty,
       available = stock - reserved - :qty
WHERE  sku = :sku
AND    (stock - reserved) >= :qty   -- only succeeds if sufficient stock remains
RETURNING available;

If the UPDATE affects 0 rows, stock was insufficient — return an out-of-stock error. If it succeeds, the reservation is live. This single statement is atomic in Postgres (row-level locking on the sku row), so two concurrent checkouts for the last unit compete at the database level: exactly one wins.
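
A minimal way to see the guard in action is to run the same conditional UPDATE against an in-memory SQLite database and inspect the affected-row count. SQLite is only a stand-in for Postgres here, the table is reduced to stock and reserved columns, and the sequential calls do not exercise real concurrency; the sketch shows the "0 rows means out of stock" contract, not the locking behavior.

```python
import sqlite3


def reserve(conn, sku, qty):
    """Return True only if the conditional UPDATE actually changed a row."""
    cur = conn.execute(
        "UPDATE inventory SET reserved = reserved + ? "
        "WHERE sku = ? AND (stock - reserved) >= ?",
        (qty, sku, qty),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER, reserved INTEGER)"
)
conn.execute("INSERT INTO inventory VALUES ('SKU-SHOE-RED-10', 1, 0)")
conn.commit()

first = reserve(conn, "SKU-SHOE-RED-10", 1)   # takes the last unit
second = reserve(conn, "SKU-SHOE-RED-10", 1)  # guard fails, nothing changes
print(first, second)
```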

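The reservation carries an expiry timestamp (reserved_until in the diagrams, expires_at in the SQL below). A background job (for example, one scheduled with pg_cron) sweeps expired reservations and returns the stock to available:

WITH released AS (
  UPDATE reservations
  SET    status = 'RELEASED'
  WHERE  expires_at < now()
  AND    status = 'PENDING'
  RETURNING sku, qty
),
per_sku AS (
  -- UPDATE ... FROM applies at most one joined row per target row, so
  -- several expired reservations for the same SKU are summed first.
  SELECT sku, SUM(qty) AS qty
  FROM   released
  GROUP  BY sku
)
UPDATE inventory
SET    reserved  = inventory.reserved - per_sku.qty,
       available = inventory.available + per_sku.qty   -- keep the stored column in sync
FROM   per_sku
WHERE  inventory.sku = per_sku.sku;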

flowchart TD
    CHK[Checkout Service] -->|"UPDATE ... WHERE (stock - reserved) >= qty"| INVDB[(Inventory DB)]
    INVDB -->|0 rows updated| OOS[Return OUT_OF_STOCK]
    INVDB -->|1 row updated| RES[Reservation created<br/>reserved_until = now + TTL]
    RES --> SAGA[Continue saga]
    SWEEP[Background sweeper<br/>pg_cron every minute] -->|expired.expires_at < now| INVDB2[(Inventory DB)]
    INVDB2 -->|"reserved -= qty"| FREE[Stock returned to available]
    style CHK fill:#ff6b1a,color:#0a0a0f
    style INVDB fill:#15803d,color:#fff
    style INVDB2 fill:#15803d,color:#fff
    style OOS fill:#ff2e88,color:#fff
    style SWEEP fill:#0e7490,color:#fff

This is the same inventory-hold pattern used in flash sale systems — the "reserve now, release if not purchased" model that prevents oversell without holding stock permanently.
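
The release half of the pattern can be sketched the same way. The Python below uses in-memory SQLite as a stand-in for Postgres, integer timestamps in place of real ones, and a hypothetical reservations table with an id column. It releases each expired PENDING reservation once and returns the quantity to the inventory row in a single per-SKU update.

```python
import sqlite3
from collections import Counter


def sweep_expired(conn, now):
    """Release expired PENDING reservations; return {sku: qty released}."""
    rows = conn.execute(
        "SELECT id, sku, qty FROM reservations "
        "WHERE status = 'PENDING' AND expires_at < ?",
        (now,),
    ).fetchall()
    per_sku = Counter()
    for res_id, sku, qty in rows:
        conn.execute("UPDATE reservations SET status = 'RELEASED' WHERE id = ?", (res_id,))
        per_sku[sku] += qty
    for sku, qty in per_sku.items():
        conn.execute("UPDATE inventory SET reserved = reserved - ? WHERE sku = ?", (qty, sku))
    conn.commit()  # reservation status and stock change commit together
    return dict(per_sku)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER, reserved INTEGER);
    CREATE TABLE reservations (
        id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER, status TEXT, expires_at INTEGER
    );
    INSERT INTO inventory VALUES ('SKU-SHOE-RED-10', 10, 4);
    INSERT INTO reservations VALUES (1, 'SKU-SHOE-RED-10', 1, 'PENDING', 100);
    INSERT INTO reservations VALUES (2, 'SKU-SHOE-RED-10', 2, 'PENDING', 150);
    INSERT INTO reservations VALUES (3, 'SKU-SHOE-RED-10', 1, 'PENDING', 900);
""")
released = sweep_expired(conn, now=200)
print(released)
```

Running the sweep a second time releases nothing, because the status filter skips rows that are already RELEASED; that is what makes the sweeper safe to run on a fixed schedule.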

The checkout saga

The saga runs these steps in order, with compensations for each:

sequenceDiagram
    participant CHK as Checkout Service
    participant INV as Inventory Service
    participant ORD as Order Service
    participant PAY as Payment Service
    participant PSP as Payment Processor

    CHK->>INV: Reserve stock (conditional decrement, TTL ~10–15 min)
    INV-->>CHK: reservation_id or OUT_OF_STOCK
    CHK->>ORD: Create order (status=PENDING, reservation_id)
    ORD-->>CHK: order_id
    CHK->>PAY: Authorize payment (order_id, amount, idempotency_key)
    PAY->>PSP: Charge card
    PSP-->>PAY: auth_code or DECLINED
    PAY-->>CHK: auth_code or DECLINED

    alt Payment declined
        CHK->>ORD: Cancel order (status=CANCELLED)
        CHK->>INV: Release reservation (reservation_id)
    else Payment authorized
        CHK->>ORD: Confirm order (status=CONFIRMED)
        CHK->>INV: Commit reservation (convert reserved → sold)
        CHK->>PAY: Capture payment (auth_code)
        CHK-->>Client: order_id, confirmation
    end

Step	Forward action	Compensation on failure
Reserve inventory	Conditional decrement + TTL	Release reservation (reserved -= qty)
Create order record	INSERT order (PENDING)	UPDATE order status = CANCELLED
Authorize payment	PSP authorization call	Void authorization if already issued
Confirm order	UPDATE order status = CONFIRMED	(terminal; payment capture follows)
Commit reservation	UPDATE reserved → sold in inventory	Reverse commit (sold -= qty, reserved += qty)
Capture payment	PSP capture call	Refund if capture already processed

The saga can be implemented as orchestration (the Checkout Service drives each step synchronously and calls compensations on failure) or choreography (each service listens to events and publishes results). For checkout, orchestration is almost always cleaner. The Checkout Service has to decide which compensations to run after a partial failure, and with choreography that decision logic is scattered across event handlers in several services, which makes it hard to reason about.
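
An orchestrated saga reduces to a loop that remembers which steps succeeded and unwinds them in reverse on failure. The sketch below is a deliberately small in-process model: the step functions, the StepFailed exception, and the state dict are all hypothetical, and a production orchestrator would also persist saga progress so that it can resume or compensate after a crash.

```python
class StepFailed(Exception):
    """Raised by a forward action that could not complete."""


def run_saga(steps):
    """steps: list of (name, action, compensation). Returns (ok, log)."""
    done, log = [], []
    for name, action, compensate in steps:
        try:
            action()
        except StepFailed:
            log.append(f"fail:{name}")
            # Undo completed steps, most recent first.
            for prev_name, prev_compensate in reversed(done):
                prev_compensate()
                log.append(f"undo:{prev_name}")
            return False, log
        log.append(f"do:{name}")
        done.append((name, compensate))
    return True, log


state = {"reserved": 0, "order_status": None}


def reserve():
    state["reserved"] += 1


def release():
    state["reserved"] -= 1


def create_order():
    state["order_status"] = "PENDING"


def cancel_order():
    state["order_status"] = "CANCELLED"


def authorize():
    raise StepFailed("card declined")  # simulate a declined payment


def void_authorization():
    pass  # nothing to void in this run; authorization never succeeded


ok, log = run_saga([
    ("reserve_inventory", reserve, release),
    ("create_order", create_order, cancel_order),
    ("authorize_payment", authorize, void_authorization),
])
print(ok, log)
```

In this run the decline triggers the two compensations from the table in reverse order: the order is cancelled first, then the reservation is released.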

For deeper background on the pattern, see the saga pattern article and the payment system design.

Idempotency: protecting against double-clicks and retries

The client generates a UUID before submitting the checkout form — the idempotency key — and includes it in the request header:

POST /orders HTTP/1.1
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
Content-Type: application/json

{ "cart_id": "...", "payment_method_id": "..." }

The Checkout Service records the (idempotency_key, result) in a durable table before returning:

CREATE TABLE idempotency_keys (
  key        UUID PRIMARY KEY,
  user_id    BIGINT NOT NULL,
  result     JSONB,
  created_at TIMESTAMPTZ DEFAULT now(),
  expires_at TIMESTAMPTZ DEFAULT now() + INTERVAL '24 hours'
);

On a duplicate request with the same key, the service returns the cached result immediately — no second charge, no second order. Here's what that lookup flow looks like in practice:

flowchart LR
    REQ[POST /orders<br/>Idempotency-Key: abc123] --> LOOKUP{Key in<br/>idempotency_keys?}
    LOOKUP -->|Yes| CACHED[Return cached result<br/>no charge, no order created]
    LOOKUP -->|No| SAGA[Run checkout saga]
    SAGA --> STORE[Store result in<br/>idempotency_keys]
    STORE --> RESP[Return result to client]
    style LOOKUP fill:#ff6b1a,color:#0a0a0f
    style CACHED fill:#15803d,color:#fff
    style SAGA fill:#a855f7,color:#fff

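This covers three failure modes that would otherwise bite you. A double-click submits a second request before the first response arrives; both carry the same key, so the second is recognized as a duplicate. A network timeout causes the client to retry: same key, same cached response, no second charge. A mobile app reconnects after a brief disconnect and resubmits: again, same key. For the double-click case to hold, the key must be claimed atomically (for example, inserted under a unique constraint) before the saga starts; if the row is only written after the saga finishes, two concurrent requests can both miss the lookup and both run.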
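
A minimal sketch of a claim-before-run variant of this flow, in which the key is recorded as in progress before the saga executes so that a concurrent duplicate cannot slip past the lookup. The IdempotencyStore class is an in-memory stand-in for the idempotency_keys table, its claim method mimics an insert that does nothing on conflict, and the IN_PROGRESS status and the error payload are assumptions for illustration.

```python
import threading


class IdempotencyStore:
    """In-memory stand-in for the idempotency_keys table (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}  # key -> {"status": ..., "result": ...}

    def claim(self, key):
        """Atomically insert the key. Returns (claimed, existing_row)."""
        with self._lock:
            existing = self._rows.get(key)
            if existing is not None:
                return False, dict(existing)
            self._rows[key] = {"status": "IN_PROGRESS", "result": None}
            return True, None

    def complete(self, key, result):
        with self._lock:
            self._rows[key] = {"status": "DONE", "result": result}


def place_order(store, key, run_checkout):
    claimed, row = store.claim(key)
    if not claimed:
        if row["status"] == "DONE":
            return row["result"]  # replay the stored response
        return {"error": "request already in progress"}
    result = run_checkout()
    store.complete(key, result)
    return result


charges = []


def run_checkout():
    charges.append("charge")  # stands in for the whole checkout saga
    return {"order_id": "ord_1", "status": "CONFIRMED"}


store = IdempotencyStore()
first = place_order(store, "550e8400", run_checkout)
retry = place_order(store, "550e8400", run_checkout)
print(first == retry, len(charges))
```

One design question this sketch leaves open is what to do with a key that stays IN_PROGRESS because the service crashed mid-checkout; a timeout on the claim or a reconciliation job is needed in practice.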

The order state machine

stateDiagram-v2
    [*] --> PENDING : create order
    PENDING --> CONFIRMED : payment authorized
    PENDING --> CANCELLED : payment declined or timeout
    CONFIRMED --> PROCESSING : warehouse picks order
    PROCESSING --> SHIPPED : fulfillment ships
    SHIPPED --> DELIVERED : delivery confirmed
    DELIVERED --> RETURN_REQUESTED : customer initiates return
    RETURN_REQUESTED --> REFUNDED : return approved + refund issued
    CONFIRMED --> CANCELLED : customer cancels before processing
    CANCELLED --> [*]
    REFUNDED --> [*]
    DELIVERED --> [*]

Every state transition emits a Kafka event. Downstream services (notifications, analytics, fulfillment, returns) consume those events rather than polling the Orders DB. The Order Service is the single source of truth for order status.
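
The diagram translates directly into a transition table that the Order Service can enforce before it writes a status change. In the sketch below the order is a plain dict and publish is any callable that accepts an event; in the article's architecture it would be a Kafka producer, which is not modeled here. The event field names are illustrative.

```python
# Allowed transitions, copied from the state diagram.
TRANSITIONS = {
    "PENDING": {"CONFIRMED", "CANCELLED"},
    "CONFIRMED": {"PROCESSING", "CANCELLED"},
    "PROCESSING": {"SHIPPED"},
    "SHIPPED": {"DELIVERED"},
    "DELIVERED": {"RETURN_REQUESTED"},
    "RETURN_REQUESTED": {"REFUNDED"},
    "CANCELLED": set(),
    "REFUNDED": set(),
}


def transition(order, new_status, publish):
    """Apply a legal transition and emit one event; reject anything else."""
    old_status = order["status"]
    if new_status not in TRANSITIONS[old_status]:
        raise ValueError(f"illegal transition {old_status} -> {new_status}")
    order["status"] = new_status
    publish({"order_id": order["id"], "from": old_status, "to": new_status})


events = []
order = {"id": "ord_1", "status": "PENDING"}
transition(order, "CONFIRMED", events.append)
transition(order, "PROCESSING", events.append)

try:
    transition(order, "CANCELLED", events.append)  # too late to cancel
    rejected = False
except ValueError:
    rejected = True
print(order["status"], len(events), rejected)
```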

Storage choices

Data	Store	Reason
Cart items	Redis (AP)	Available, fast, trivially sharded by `user_id`
Inventory (available / reserved / sold counts)	Postgres (sharded by SKU range)	Needs strong consistency; conditional-write support; row-level locking
Reservations	Postgres (collocated with inventory)	Same transaction scope as inventory update
Orders	Postgres (sharded by `user_id` or `order_id`)	Strong consistency, rich queries, foreign keys
Idempotency keys	Postgres or Redis (with TTL)	Short-lived; Redis TTL is operationally simpler
Promotions / pricing	Postgres + read-through cache	Changes infrequently; cache with 60-second TTL
Order events	Kafka → S3 / data warehouse	Fan-out to notifications, analytics, fulfillment
Session / auth tokens	Redis	Short-lived, high-read

Failure modes

Oversell on hot SKUs

A limited-edition product launches and 5,000 users click "Buy Now" simultaneously. All 5,000 hit the inventory service with UPDATE ... WHERE (stock - reserved) >= 1. Postgres row-level lock on that SKU row serializes them — the first N succeed (N = stock), the rest get 0 rows updated and see "out of stock." The database is doing exactly the right thing.

The risk at extreme concurrency is lock contention: wait queues build up and checkout latency spikes. You can attack this from several directions. Shard inventory by SKU range so the load from a hot item stays on one shard instead of slowing checkouts for unrelated stock. Read the available count from cache first and return "out of stock" early for obvious cases before touching the DB. Rate-limit checkout attempts per SKU at the API gateway. For true flash sales (think limited-edition sneakers with 10,000+ concurrent buyers), see design-flash-sale, which uses a Redis DECR counter as a first-pass gate before writing to Postgres.

Double-charge (payment succeeded, response lost)

The payment processor returns success, but the network drops before the Checkout Service receives the response. The service retries, and the customer gets charged twice.

The fix: pass the idempotency key to the payment processor's API. Stripe and Adyen both support this natively. The second call with the same key returns the first charge's result without issuing a new charge — the deduplication happens on their side, not just yours.

Payment authorized but order not confirmed (orphan payment)

The Confirm Order step fails after payment authorization, say because Postgres is temporarily unavailable. Now money is authorized but the order never leaves PENDING.

This is why authorization and capture are separate steps. The payment is only authorized at step 3 and is not captured until step 6, after the order is confirmed and the reservation committed. An authorization that is never captured eventually expires, and the hold on the cardholder's funds is released, after a window that varies by card network and acquirer (commonly quoted figures are Visa card-not-present transactions: 10 calendar days; Visa card-present: 5 calendar days; Mastercard final authorizations: 7 calendar days; specialized merchant preauthorizations such as lodging or vehicle rental: up to 30 days; check the current network rules before relying on them). If the system detects the failure in real time, the compensation step explicitly voids the authorization immediately rather than waiting for it to expire.

Reservation leak (inventory held forever)

Checkout fails after reserving inventory, but the compensation message is lost because the saga coordinator crashes. Stock is permanently held.

This is why the reservation has a reserved_until TTL. The background sweeper unconditionally releases reservations past their TTL, regardless of whether the saga fired its compensation. The compensation path is the fast release; the sweeper is the safety net. Neither one alone is sufficient — together they give you defense in depth.

Cart data loss on Redis failure

When Redis is down, cart reads fail. The options, roughly in order of complexity:

Option 1: Redis Sentinel or Cluster with replicas. A replica is promoted when the primary fails. Replication is asynchronous, so the last few writes before the failure can be lost; for a cart that is usually acceptable. This is what most teams run.

Option 2: Write-through to Postgres as a durable fallback. Write to both on update; on a Redis miss, read from Postgres. More complex to operate, but a Redis failure no longer loses carts.

Option 3: Cookie fallback. Embed a truncated cart in a signed cookie. For small carts, this is enough for a degraded mode while Redis recovers.

A common combination is option 1 with Postgres write-through used as a recovery path on reconnect, not as a live read path.

Abandoned cart reservation

A user reaches the payment step with inventory reserved, then abandons the browser tab. The reservation TTL — 15 minutes is a common choice — must be long enough for the user to complete payment but short enough that stock isn't tied up indefinitely. After TTL expiry, the sweeper releases the hold. If the user returns and tries to complete checkout after the TTL, they must re-reserve — which may fail if stock ran out while they were away.

Things to discuss in an interview

The available/consistent split: the cart is AP (Redis); checkout is CP (Postgres with conditional writes). Name this explicitly and justify the choice.
Max-register cart merge vs. additive merge: why you take the max quantity rather than summing, and the CRDT analogy.
The idempotency key flow: where it's generated, where it's stored, what it prevents — especially the double-charge scenario.
Saga orchestration vs. choreography: for checkout, orchestration is simpler to reason about; name the trade-off.
Inventory reservation TTL: why you use TTL rather than relying on compensations alone — defense in depth against saga coordinator failures.
Hot-SKU contention: how Postgres row locks serialize access, and when you need to escalate to a Redis-based pre-gate (flash sale pattern).
Price recomputation at checkout: why you can't trust client-sent prices, and the price-locking window.

Things you should now be able to answer

Why is the cart stored in Redis rather than Postgres, and what are the trade-offs?
What merge strategy do you use when a guest cart and a user cart have the same SKU?
How does the conditional UPDATE ... WHERE (stock - reserved) >= qty prevent overselling without a distributed lock?
What is the idempotency key, where does it come from, and what failure modes does it prevent?
Walk me through the compensation steps if payment authorization fails after inventory is reserved.
How does the reservation TTL protect against inventory leaks when the saga coordinator crashes?
What happens to a payment authorization if the order confirmation step fails?

Frequently asked questions

▸Why is the cart stored in Redis instead of Postgres?

Cart reads outnumber checkouts by 50-100:1, so sub-millisecond in-memory reads matter far more than strong consistency. Redis also keeps the cart available during partial outages where a Postgres primary failover would take the entire cart experience down. Eventual consistency is acceptable for the cart because the authoritative stock check happens at checkout, not at add-to-cart time.

▸How does the conditional inventory UPDATE prevent overselling without a distributed lock?

The Checkout Service runs a single Postgres statement: UPDATE inventory SET reserved = reserved + qty WHERE sku = :sku AND (stock - reserved) >= qty. If the UPDATE affects 0 rows, stock was insufficient and the caller gets an out-of-stock error. Because Postgres uses row-level locking on the SKU row, two concurrent checkouts for the last unit race at the database level and exactly one wins — no distributed lock required.

▸What merge strategy resolves a conflict when a guest cart and a user cart contain the same SKU?

The system applies a max-register merge on quantity using max(guest_qty, user_qty), not an additive sum. A shopper who set qty to 2 on their phone and has qty 1 in their saved cart most likely changed their mind rather than intending a combined order of 3. This is structurally equivalent to a CRDT max-register merge, which is monotonically non-decreasing and safe under concurrent edits.

▸What failure modes does the idempotency key on POST /orders prevent?

It prevents double-charges from three concrete scenarios: a double-click that submits before the first response arrives, a network timeout that causes the client to retry, and a mobile app that resubmits after a brief disconnect. The client generates a UUID before submitting, the Checkout Service stores the (key, result) pair in a durable table, and any duplicate request with the same key returns the cached result with no second order or charge created. The idempotency key is also passed to Stripe or Adyen so deduplication happens on the payment processor side as well.

▸Why does checkout use saga orchestration rather than choreography?

For the linear reserve-create-authorize-capture flow, the Checkout Service needs to make precise decisions about which compensation to call after a partial failure, and choreography scatters that decision logic across event handlers in several services. Orchestration keeps all the control logic in one place: the Checkout Service drives each step synchronously and explicitly calls compensations (release reservation, void authorization) on failure.
