~/articles/design-digital-wallet

◆◆◆Advancedasked at PayPalasked at Blockasked at Paytmasked at Stripe

Design a Digital Wallet (PayPal / Venmo / Paytm)

Hold balances, transfer money between users instantly, and never lose a cent. Double-entry ledgers, idempotent transfers, and strong consistency.

21 min read2026-05-28Ironclad Academy

#interview #payments #consistency #ledger

// DEPTH

the full breakdown — requirements, capacity, evolution, trade-offs

The problem

PayPal moved $1.36 trillion in payment volume in 2023. Venmo settles peer-to-peer payments between friends in seconds. Paytm handles hundreds of millions of wallet transactions across India. At the core of each product is the same deceptively simple idea: a user holds a stored-value balance inside your system, and can move it to another user instantly. The engineering challenge is making "instantly" actually mean "correctly."

Financial systems are uniquely unforgiving. A bug in a URL shortener loses a click. A bug in a wallet creates or destroys real money — and the person whose money disappeared has a transaction statement, a lawyer, and a regulator ready to act on it. That asymmetry is why the wallet problem is the most consistency-critical design in the interview catalog.

Two forces make it hard. First, correctness under concurrency: when two transfers touch the same account simultaneously, a naïve read-modify-write will silently corrupt the balance. Second, exactly-once semantics: networks drop responses after the server has already committed. A client that retries a timed-out request should get the same outcome — not a second debit. Both problems must be solved at every layer: application, database, and external rail.

The design sits at the intersection of distributed-systems theory and decades-old double-entry accounting. A transfer is not one UPDATE to a balance column — it is two immutable ledger rows (debit and credit) inside one ACID transaction. The balance is derived, never mutated in isolation, so the ledger is always the source of truth and every cent is permanently auditable.

Functional requirements

POST /accounts — create a wallet account.
POST /topups — load funds from a card or bank (external rail).
POST /transfers — move money from one wallet account to another (intra-wallet, instant).
POST /withdrawals — push funds to a bank account (external rail, async).
GET /accounts/{id}/balance — return current balance.
GET /accounts/{id}/transactions — paginated ledger history.

Non-functional requirements

Correctness above all: no double-spend, no overdraft, no partial transfer, no lost update.
Strong consistency: balance reads must reflect all committed transfers.
Idempotency: any operation retried with the same key returns the same result.
Auditability: every cent must be traceable; ledger rows are immutable (append-only).
Availability: 99.99% for intra-wallet; external rails can degrade gracefully.
Latency: intra-wallet transfer p99 < 500ms; external rail acknowledgement < 2s (settlement is async).

Capacity estimation

Dimension	Estimate	How we got there
Accounts	200 M total; ~10 M active daily	Given
Transfer write TPS (avg)	~58 TPS	`5,000,000 ÷ 86,400 ≈ 58`
Transfer write TPS (peak)	~290 TPS	`58 × 5× peak factor`
Ledger rows per day	10 M rows/day	`5 M transfers × 2 rows` (one debit, one credit)
Ledger rows over 5 years	18.25 B rows	`10 M × 365 × 5`
Ledger storage over 5 years	~3.7 TB	`18.25 B × 200 B per row` (ids, amount, timestamps, metadata)
Balance reads (avg)	~580 reads/s	`~10× transfer rate`; served from materialized balance cache
External rail volume (avg)	~12 TPS	`~20% of transfer volume = 1 M/day ÷ 86,400`
External rail volume (peak)	~60 TPS	`12 × 5× peak factor`

Takeaway: 290 TPS writes and 3.7 TB over five years is solidly within reach of a small sharded Postgres cluster — money volumes are not Twitter scale.

Building up to the design

Start with the simplest thing that could work, then break it deliberately. Each version below teaches you one invariant that money systems require.

V1: A balance column per account

The naïve approach: add a balance column to the accounts table. A transfer is two UPDATE statements:

UPDATE accounts SET balance = balance - 100 WHERE id = :sender;
UPDATE accounts SET balance = balance + 100 WHERE id = :receiver;

This works for a demo. The immediate problem: if the process crashes between the two UPDATEs, money vanishes into thin air. You also have no history — if a user disputes a charge, you cannot reconstruct what happened. And a retried request runs both UPDATEs a second time, silently doubling the deduction.

V2: Wrap in a transaction + add a ledger table

Put both UPDATEs inside a single DB transaction and add a ledger table that records every entry as an immutable, append-only row. Now a crash inside the transaction rolls back both UPDATEs atomically, and you have an audit trail.

The balance column is still a problem, though. Under concurrent transfers from one account — say, a burst of outbound payments — multiple transactions race to read-modify-write the same row. Two concurrent transfers each read balance = 1000, both deduct 900, and both commit, leaving the balance at 100 instead of rejecting the second transfer for insufficient funds. The correct outcome is that the first transfer succeeds (leaving 100) and the second is refused — but both succeeded, so 1800 was sent while only 900 was debited, effectively creating money. This is the lost-update bug, and it cannot be solved at the application layer without help from the database.

V3: Row lock on sender + overdraft check inside the transaction

BEGIN;
SELECT balance FROM accounts WHERE id = :sender FOR UPDATE;
-- check: balance >= amount
UPDATE accounts SET balance = balance - :amount WHERE id = :sender;
INSERT INTO ledger ...;
COMMIT;

SELECT ... FOR UPDATE acquires a row-level exclusive lock before the read. Any concurrent transaction that attempts the same will block until the first commits, then re-read the updated balance. Overdraft is now impossible, and the lost-update race is gone.

The next gap is retries. If the client's network drops after the server commits but before the response arrives, the client retries — and the transfer runs twice.

V4: Idempotency keys

Clients generate a unique idempotency_key (a UUID or ULID) per intent. The server stores (idempotency_key → result) before returning. On retry:

if idempotency_store.get(key):
    return stored result   ← no second debit
else:
    execute transfer
    idempotency_store.set(key, result, ttl=7d)
    return result

The check-and-set must be atomic with the transfer, or another race emerges. In practice: write the idempotency record inside the same transaction as the ledger rows (both in Postgres), or use a database-level unique constraint on the key and catch the duplicate-key error on retry.

Safe retries are now possible under any network partition between client and server. But the balance column still diverges from the ledger under bugs, schema migrations, or direct DB repairs. We need a single source of truth.

V5: Remove the mutable balance column — derive balance from ledger

SELECT COALESCE(SUM(amount), 0) AS balance
FROM ledger
WHERE account_id = :id;

amount is signed: debits are negative, credits are positive. Balance is always re-derivable, forever auditable, and cannot silently drift. For performance, maintain a materialized running balance (a cached aggregate updated inside each transfer transaction), but treat the ledger as the authoritative source for reconciliation.

flowchart LR
    V1["V1: mutable balance column<br/>no history, no atomicity"] --> V2["V2: + transaction + ledger<br/>atomic, auditable"]
    V2 --> V3["V3: + SELECT FOR UPDATE<br/>no overdraft, no lost-update"]
    V3 --> V4["V4: + idempotency keys<br/>safe retries"]
    V4 --> V5["V5: ledger as source of truth<br/>balance derived, immutable history"]
    V5 --> V6["V6: + external rails + reconciliation<br/>bank top-ups, withdrawals, settlement"]
    style V1 fill:#0e7490,color:#fff
    style V3 fill:#15803d,color:#fff
    style V5 fill:#ff6b1a,color:#0a0a0f
    style V6 fill:#a855f7,color:#fff

V6: External rails and reconciliation

Intra-wallet transfers are simple — both accounts live in our database. Top-ups and withdrawals cross into external systems (card networks, ACH, SWIFT) that have their own latency (standard ACH: 1–2 business days; same-day ACH available; card settlement: T+1 to T+2), failure modes, and settlement semantics. These require an async, idempotent, reconciled flow — covered in the external rails section below.

The double-entry ledger

Every monetary event produces exactly two ledger rows: one debit and one credit. The sum of all rows for any account is its balance. The sum of all rows in the entire system is always zero — money is conserved — but only if every external funding event (top-up, withdrawal) is also recorded against a corresponding system-side liability or float account. A top-up from a bank credits the user's account and debits a "funds-in-transit" or "external liability" account; once the bank confirms settlement, the liability account is cleared. Without this complete chart of accounts, intra-wallet transfers sum to zero but external cash flows do not.

CREATE TABLE ledger (
    id            BIGSERIAL     PRIMARY KEY,
    transfer_id   UUID          NOT NULL,       -- groups the debit+credit pair
    account_id    BIGINT        NOT NULL,
    amount        NUMERIC(18,2) NOT NULL,        -- negative = debit, positive = credit
    currency      CHAR(3)       NOT NULL,
    entry_type    TEXT          NOT NULL,        -- 'DEBIT' | 'CREDIT'
    created_at    TIMESTAMPTZ   NOT NULL DEFAULT now(),
    metadata      JSONB
) PARTITION BY RANGE (created_at);

CREATE INDEX ledger_account_time ON ledger(account_id, created_at DESC);
CREATE UNIQUE INDEX ledger_transfer_type ON ledger(transfer_id, entry_type);

The UNIQUE INDEX on (transfer_id, entry_type) is a natural idempotency guard: inserting the same transfer twice raises a duplicate-key error, which the application catches and handles by returning the original result.

A transfer for $50 from Alice (account 1) to Bob (account 2):

id	transfer_id	account_id	amount	entry_type
101	txn-abc	1	-50.00	DEBIT
102	txn-abc	2	+50.00	CREDIT

Both rows or neither. The ACID transaction guarantees it.

Transfers — the happy path

sequenceDiagram
    participant C as Client
    participant WS as Wallet Service
    participant IDEM as Idempotency Store
    participant DB as Ledger DB

    C->>WS: POST /transfers {from, to, amount, idempotency_key}
    WS->>IDEM: get(idempotency_key)
    alt already exists
        IDEM-->>WS: cached result
        WS-->>C: 200 OK (same result, no debit)
    else new request
        WS->>DB: BEGIN TRANSACTION
        WS->>DB: SELECT balance FROM accounts WHERE id=:from FOR UPDATE
        DB-->>WS: balance = 1000
        WS->>WS: check amount (50) <= balance (1000)
        WS->>DB: INSERT INTO ledger (debit row for sender)
        WS->>DB: INSERT INTO ledger (credit row for receiver)
        WS->>DB: INSERT INTO idempotency_records (key, result)
        WS->>DB: COMMIT
        WS-->>C: 200 OK {transfer_id, ...}
    end

A few things worth noticing in this flow. The FOR UPDATE lock is held only for the duration of the transaction — milliseconds — so throughput stays high. The idempotency record goes into the same transaction as the ledger rows: if the transaction rolls back (overdraft, DB error), the record rolls back too, meaning a retry after a legitimate failure is correctly re-executed rather than being swallowed as a duplicate. After commit, the service publishes an event to the notification pipeline so both users get a push notification — fire-and-forget, so a notification failure never rolls back a committed transfer.

Transfer state machine

External transfers (top-ups and withdrawals) go through a multi-step lifecycle because the external rail is async and can fail independently.

stateDiagram-v2
    [*] --> PENDING: request accepted
    PENDING --> PROCESSING: sent to external rail
    PROCESSING --> COMPLETED: rail confirms settlement
    PROCESSING --> FAILED: rail rejects / timeout
    FAILED --> PENDING: retry (idempotent re-send)
    COMPLETED --> [*]
    FAILED --> REVERSED: manual or auto reversal
    REVERSED --> [*]
    PENDING --> CANCELLED: user cancels before processing
    CANCELLED --> [*]

For intra-wallet transfers, the state machine collapses to PENDING → COMPLETED (or FAILED) inside a single DB transaction. There is no PROCESSING state because there is no external rail.

Concurrency deep-dive

Pessimistic locking (default)

SELECT ... FOR UPDATE is straightforward and correct. The tradeoff: under high concurrency on a single account — say, a merchant account receiving hundreds of payments per second — lock contention becomes the bottleneck. Transactions queue up waiting for the lock, and tail latency climbs.

Optimistic concurrency (alternative)

Add a version column to the accounts table. A transfer reads the version, does its work, then updates with a WHERE version = :read_version condition. If another transaction committed between the read and the write, the UPDATE affects 0 rows — the application detects this and retries.

-- Read phase
SELECT balance, version FROM accounts WHERE id = :id;
-- balance = 1000, version = 7

-- Write phase (inside transaction)
UPDATE accounts SET balance = 900, version = 8
WHERE id = :id AND version = 7;
-- affected rows = 0 → conflict, retry

Optimistic concurrency reduces lock contention under low-to-medium concurrency, but retry storms under high contention waste more work than they save. Pessimistic locking is the safer default for financial systems where conflicts are common.

Serialized command queue (hot accounts)

For truly hot accounts — a platform escrow account receiving thousands of payments per second — neither locking strategy scales well. Route all commands for a given account through a single-writer queue (a Kafka partition keyed by account_id). A single consumer processes commands for that account sequentially, eliminating all concurrency. This caps throughput at the consumer's single-thread speed but eliminates all lock contention and, when combined with idempotent command handling, is effectively exactly-once.

flowchart LR
    T1[Transfer 1] --> Q[("Kafka partition<br/>key=account_id")]
    T2[Transfer 2] --> Q
    T3[Transfer 3] --> Q
    Q --> CONS[Single consumer<br/>sequential writes]
    CONS --> DB[(Ledger DB)]
    style Q fill:#a855f7,color:#fff
    style CONS fill:#ff6b1a,color:#0a0a0f
    style DB fill:#0e7490,color:#fff

External rails: top-up and withdrawal

Intra-wallet transfers live entirely inside our database and can be ACID. Top-ups and withdrawals cross into the external financial system, which is async (standard ACH takes 1–2 business days; approximately 80% of volume settles in one day or less per Nacha; same-day ACH is available for eligible transfers), fallible, and idempotent by convention on most rails.

The pattern:

sequenceDiagram
    participant C as Client
    participant WS as Wallet Service
    participant DB as Ledger DB
    participant RAIL as External Rail (ACH/Card)
    participant RECON as Reconciliation Service

    C->>WS: POST /topups {amount, source, idempotency_key}
    WS->>DB: create top-up record (status=PENDING)
    WS-->>C: 202 Accepted {topup_id}
    WS->>RAIL: initiate debit (idempotent, with external_key)
    RAIL-->>WS: initiated
    note over RAIL: hours to days later...
    RAIL->>RECON: settlement file / webhook
    RECON->>DB: mark top-up COMPLETED, insert credit ledger row
    RECON->>WS: trigger balance update notification

The wallet does not credit the user's balance until the external rail confirms settlement. Optimistic crediting (credit immediately, reverse on failure) is a product decision with fraud risk; most wallets credit on authorization for UX but hold the funds until settlement for risk. Every call to the external rail carries an idempotency key (typically a UUID derived from the internal topup_id) so a timed-out retry deduplicates cleanly on the rail side. The reconciliation job compares the wallet's ledger against the bank's settlement file daily — any discrepancy is flagged for investigation.

Architecture

flowchart TD
    C[Client app] --> GW[API Gateway<br/>auth + rate limit]
    GW --> WS[Wallet Service]

    WS --> IDEM[(Idempotency Store<br/>Postgres unique key)]
    WS --> LDB[(Ledger DB<br/>Postgres, sharded by account_id)]
    WS --> BAL[(Balance Cache<br/>Redis, account_id → balance+version)]
    WS --> KAFKA[Kafka<br/>transfer events]

    KAFKA --> NOTIF[Notification Service<br/>push / email]
    KAFKA --> AUDIT[Audit Log Service<br/>append-only store]
    KAFKA --> FRAUD[Fraud Service<br/>async scoring]

    WS --> EXTQ[External Rail Queue<br/>Kafka / SQS]
    EXTQ --> RAILSVC[Rail Service<br/>ACH / card processor]
    RAILSVC --> BANK[Bank / Card Network]
    BANK -.settlement.-> RECON[Reconciliation Service]
    RECON --> LDB

    style WS fill:#ff6b1a,color:#0a0a0f
    style LDB fill:#0e7490,color:#fff
    style BAL fill:#15803d,color:#fff
    style KAFKA fill:#a855f7,color:#fff
    style RECON fill:#ffaa00,color:#0a0a0f

Storage choices

Data	Store	Rationale
Ledger rows	Postgres (range-partitioned by date)	ACID, foreign keys, SQL aggregates; partitioning keeps hot partitions small
Account metadata	Postgres	Consistent reads alongside ledger
Idempotency records	Postgres (same DB, unique constraint)	Shares the transfer transaction; avoids distributed 2PC
Balance cache	Redis (account_id → {balance, version})	Sub-ms reads; invalidated on any committed transfer touching that account
Transfer events	Kafka	Fan-out to notifications, fraud, audit without coupling services
Audit log	Append-only store (S3 + Athena, or immutable Postgres partition)	Regulatory compliance; never deleted
External rail state	Postgres (transfers + status)	Reconciliation joins against ledger

Preventing overdraft — the complete picture

There are three layers, each catching a different failure mode:

flowchart TD
    REQ[Transfer Request] --> APP[1. Application layer<br/>read balance cache, reject fast]
    APP -->|"balance looks sufficient"| DB[2. Database layer<br/>SELECT FOR UPDATE, authoritative check]
    DB -->|"balance confirmed"| CON[3. Constraint layer<br/>CHECK balance >= 0]
    CON -->|"all clear"| COMMIT[Commit ledger rows]
    APP -->|"obviously insufficient"| REJ1[Reject 400]
    DB -->|"insufficient after lock"| REJ2[Reject 400 — rollback]
    CON -->|"bug caught"| REJ3[Reject — rollback]
    style APP fill:#ffaa00,color:#0a0a0f
    style DB fill:#ff6b1a,color:#0a0a0f
    style CON fill:#0e7490,color:#fff
    style COMMIT fill:#15803d,color:#fff
    style REJ1 fill:#ff2e88,color:#fff
    style REJ2 fill:#ff2e88,color:#fff
    style REJ3 fill:#ff2e88,color:#fff

The application layer reads the balance from cache and rejects obviously insufficient requests before touching the database. The database layer is the authoritative check — inside the transaction, SELECT balance FOR UPDATE, compare with the amount, raise an error if insufficient. The cache can be stale, so this is the only layer that counts. The constraint layer is the backstop: a CHECK (balance >= 0) constraint on the accounts table, or a trigger that rejects any ledger insert that would make the running total negative. This catches bugs in the application layer. All three must agree; the DB constraint is the final defense.

Balance derivation vs. materialized balance

The fully-derived approach queries the ledger on every balance read:

SELECT SUM(amount) FROM ledger WHERE account_id = :id;

This is the cleanest design — single source of truth, no cache invalidation bugs. The downside is speed: for accounts with millions of ledger rows, this query grows expensive. You can bound it with a starting checkpoint, but it adds complexity.

The materialized approach keeps a balance column in the accounts table, updated atomically inside each transfer transaction. On any discrepancy, the ledger wins — reconciliation re-derives and corrects. Most production wallets use the materialized approach for read performance and reserve the fully-derived query for reconciliation, dispute resolution, and periodic audits. The ledger is the ground truth; the column is just a fast approximation of it.

Sharding the ledger

At 290 TPS and 18 B rows over 5 years, a single Postgres instance works but a shard boundary at ~500 GB is prudent.

Strategy	Pros	Cons
Shard by `account_id`	All rows for one account on one shard; balance queries are local	Hot accounts (merchants) create hot shards
Shard by `transfer_id`	Even distribution	A single transfer's debit and credit may be on different shards; cross-shard reads for account history
Range partition by date (single shard)	Simple; old partitions can be archived to cold storage	All writes go to the current partition; partition becomes hot
Hybrid: shard by account_id bucket, range partition within shard	Good balance; cross-shard balance aggregation only for reconciliation	More operational complexity

The recommendation for most wallets at this scale: shard by hashed account_id (consistent hashing to ~16 shards), with range partitioning by date within each shard. Each shard handles ~18 TPS peak writes — trivial.

See consistent hashing for the sharding mechanics.

Failure modes and mitigations

Failure	Symptom	Mitigation
Double-spend on retry	Transfer debits sender twice	Idempotency key + unique constraint on `(transfer_id, entry_type)` in ledger
Lost update under concurrency	Two transfers each read old balance; both succeed; sender's balance goes negative	`SELECT ... FOR UPDATE` inside transaction
Partial transfer (crash between debit and credit)	Money disappears	ACID transaction: both rows or neither
Balance drift (cache out of sync with ledger)	Stale balance shown; overdraft possible	Invalidate cache on every committed write; reconciliation re-derives nightly
External rail timeout (top-up neither confirmed nor rejected)	Wallet neither credits nor debits	Idempotent retry to rail; reconciliation catches gaps; pending state with TTL
Hot account contention	p99 transfer latency spikes on popular merchant accounts	Serialized command queue per account (Kafka partition by account_id)
Fraudulent burst transfers	Account drained	Fraud service consumes Kafka events; async velocity rules; hard limit in application layer

Idempotency — the full contract

An idempotency key has a contract worth spelling out clearly. One key equals one intent — clients must generate a fresh key for a genuinely new transfer. Keys expire after a window (typically 7–30 days); after expiry, the key is recycled and a new transfer with the same key starts fresh. The server stores the full response (status code, transfer_id, timestamp) keyed by the idempotency key, so retries receive the exact same response. If two different clients send the same key with different parameters (amount, recipient), the server rejects the second with 409 Conflict — the key must be globally unique per client (UUID v4 is sufficient). And critically, the idempotency record is written inside the transfer transaction: if the transaction rolls back, the record rolls back too, so a retry after a legitimate failure is correctly re-attempted rather than silently short-circuited.

Reconciliation

Reconciliation is the financial system's immune system. It runs periodically (daily is standard, hourly for high-value systems) and answers: "Does the sum of all ledger rows for all accounts equal the sum of all external settlements?"

For each account:
    ledger_balance = SUM(ledger.amount) WHERE account_id = X
    expected_balance = last_reconciled_balance
                     + SUM(confirmed_external_credits)
                     - SUM(confirmed_external_debits)

    if ledger_balance != expected_balance:
        flag for manual review

Discrepancies reveal bugs (double-credit, missing debit), external rail errors (bank credited wrong amount), or fraud. Reconciliation is the only safe way to detect silent data corruption.

Intra-wallet vs. payment processing

This article covers a stored-value wallet: users hold a balance that lives entirely inside your system. Intra-wallet transfers are cheap, instant, and ACID because you control both sides.

The design-payment-system article covers card processing: the user has no balance in your system, money moves via external card rails (Visa/Mastercard), and the hard problems are authorization, clearing, settlement, and chargeback handling. The mechanics differ significantly — card processing requires online authorization against the issuing bank in real time; wallet transfers are a local DB transaction.

Many real products combine both: a wallet balance funded by card top-ups, with intra-wallet transfers for peer-to-peer and card rails for merchant payments.

Things to discuss in an interview

Why double-entry? It is self-auditing: the sum of all entries for any closed set of accounts is always zero. Intra-wallet transfers trivially sum to zero. External cash flows (top-ups, withdrawals) must be recorded against system-side liability or float accounts to preserve the invariant across the full chart of accounts. Any single-entry mutation can silently drift; double-entry makes errors visible at reconciliation.
Idempotency placement: why the idempotency record must live in the same transaction as the ledger rows, not in a separate Redis write.
Pessimistic vs. optimistic locking: when does each break down, and why is pessimistic safer for financial systems?
Derived vs. materialized balance: performance vs. correctness trade-off; how reconciliation catches drift.
External rail async flow: pending → completed → settled, and what happens on timeout.
Sharding money: cross-shard transfers require distributed transactions or a saga pattern — why you prefer to keep related accounts on the same shard.
Currency handling: always use NUMERIC(18,2) (or BIGINT in minor units such as cents/paise if you prefer integer storage) — never FLOAT (floating-point arithmetic is not safe for money). This schema uses NUMERIC(18,2) with signed amounts; if you switch to integer minor units, change the column to BIGINT and update sample values accordingly.

Things you should now be able to answer

Why does a digital wallet use a ledger table instead of a balance column?
What is the double-entry accounting invariant, and how does it enable reconciliation?
How does SELECT ... FOR UPDATE prevent overdraft under concurrent transfers?
Why must the idempotency record be written in the same transaction as the ledger rows?
What happens when an ACH top-up times out, and how does the system recover?
What is the difference between an intra-wallet transfer and a card payment, architecturally?
How would you handle a merchant account receiving 5000 transfers per second?

Frequently asked questions

▸Why does a digital wallet use a double-entry ledger instead of a mutable balance column?

A mutable balance column has no history, cannot be audited, and silently drifts under bugs or direct DB repairs. A double-entry ledger records every transfer as two immutable rows — one debit, one credit — inside a single ACID transaction, so the balance is always re-derivable from history and every cent is permanently traceable. The sum of all ledger rows for a closed set of accounts is always zero, which makes discrepancies visible at reconciliation.

▸How does SELECT FOR UPDATE prevent overdraft under concurrent transfers?

Without a lock, two concurrent transfers can each read the same stale balance, both pass the sufficiency check, and both commit — silently creating money. SELECT FOR UPDATE acquires a row-level exclusive lock on the sender's account row before the balance check; any competing transaction blocks until the first commits and then re-reads the updated balance, so the check and the ledger write are one atomic unit and an overdraft is impossible.

▸Why must an idempotency record be written inside the same transaction as the ledger rows, not in a separate Redis write?

If the idempotency record is written separately — say, to Redis after the DB transaction commits — a crash between commit and the Redis write leaves no record, and the next retry executes a second transfer. Storing the idempotency record inside the same Postgres transaction means it rolls back with the transfer on any failure, so a retry after a legitimate failure is correctly re-executed rather than silently short-circuited as a duplicate.

▸How long does ACH settlement take, and how does the wallet handle the gap?

Standard ACH settles in 1 to 2 business days; same-day ACH is available for eligible transfers. The wallet creates a top-up record in PENDING state and returns 202 Accepted immediately. It credits the user's balance only after the reconciliation service receives settlement confirmation from the bank, converting the record to COMPLETED and inserting the credit ledger row. Any timeout triggers an idempotent retry to the rail; the reconciliation job catches gaps daily.

▸When should you use a serialized command queue instead of pessimistic locking for a wallet account?

For ordinary accounts, SELECT FOR UPDATE is the correct default — the lock is held for milliseconds and throughput stays high. For truly hot accounts such as a platform escrow receiving thousands of payments per second, lock contention causes tail latency spikes under both pessimistic and optimistic strategies. Routing all commands for that account through a single Kafka partition keyed by account_id and processing them with a single consumer eliminates lock contention entirely, though it caps throughput to single-thread speed.

← previous

Design a Globally-Distributed SQL Database (Spanner / CockroachDB)

Design an Object Storage Service (S3)

// RELATED