Design a Digital Wallet (PayPal / Venmo / Paytm)
Hold balances, transfer money between users instantly, and never lose a cent. Double-entry ledgers, idempotent transfers, and strong consistency.
The problem
PayPal moved $1.36 trillion in payment volume in 2023. Venmo settles peer-to-peer payments between friends in seconds. Paytm handles hundreds of millions of wallet transactions across India. At the core of each product is the same deceptively simple idea: a user holds a stored-value balance inside your system, and can move it to another user instantly. The engineering challenge is making "instantly" actually mean "correctly."
Financial systems are uniquely unforgiving. A bug in a URL shortener loses a click. A bug in a wallet creates or destroys real money — and the person whose money disappeared has a transaction statement, a lawyer, and a regulator ready to act on it. That asymmetry is why the wallet problem is the most consistency-critical design in the interview catalog.
Two forces make it hard. First, correctness under concurrency: when two transfers touch the same account simultaneously, a naïve read-modify-write will silently corrupt the balance. Second, exactly-once semantics: networks drop responses after the server has already committed. A client that retries a timed-out request should get the same outcome — not a second debit. Both problems must be solved at every layer: application, database, and external rail.
The design sits at the intersection of distributed-systems theory and decades-old double-entry accounting. A transfer is not one UPDATE to a balance column — it is two immutable ledger rows (debit and credit) inside one ACID transaction. The balance is derived, never mutated in isolation, so the ledger is always the source of truth and every cent is permanently auditable.
Functional requirements
POST /accounts— create a wallet account.POST /topups— load funds from a card or bank (external rail).POST /transfers— move money from one wallet account to another (intra-wallet, instant).POST /withdrawals— push funds to a bank account (external rail, async).GET /accounts/{id}/balance— return current balance.GET /accounts/{id}/transactions— paginated ledger history.
Non-functional requirements
- Correctness above all: no double-spend, no overdraft, no partial transfer, no lost update.
- Strong consistency: balance reads must reflect all committed transfers.
- Idempotency: any operation retried with the same key returns the same result.
- Auditability: every cent must be traceable; ledger rows are immutable (append-only).
- Availability: 99.99% for intra-wallet; external rails can degrade gracefully.
- Latency: intra-wallet transfer p99 < 500ms; external rail acknowledgement < 2s (settlement is async).
Capacity estimation
| Dimension | Estimate | How we got there |
|---|---|---|
| Accounts | 200 M total; ~10 M active daily | Given |
| Transfer write TPS (avg) | ~58 TPS | 5,000,000 ÷ 86,400 ≈ 58 |
| Transfer write TPS (peak) | ~290 TPS | 58 × 5× peak factor |
| Ledger rows per day | 10 M rows/day | 5 M transfers × 2 rows (one debit, one credit) |
| Ledger rows over 5 years | 18.25 B rows | 10 M × 365 × 5 |
| Ledger storage over 5 years | ~3.7 TB | 18.25 B × 200 B per row (ids, amount, timestamps, metadata) |
| Balance reads (avg) | ~580 reads/s | ~10× transfer rate; served from materialized balance cache |
| External rail volume (avg) | ~12 TPS | ~20% of transfer volume = 1 M/day ÷ 86,400 |
| External rail volume (peak) | ~60 TPS | 12 × 5× peak factor |
Takeaway: 290 TPS writes and 3.7 TB over five years is solidly within reach of a small sharded Postgres cluster — money volumes are not Twitter scale.
Building up to the design
Start with the simplest thing that could work, then break it deliberately. Each version below teaches you one invariant that money systems require.
V1: A balance column per account
The naïve approach: add a balance column to the accounts table. A transfer is two UPDATE statements:
UPDATE accounts SET balance = balance - 100 WHERE id = :sender;
UPDATE accounts SET balance = balance + 100 WHERE id = :receiver;
This works for a demo. The immediate problem: if the process crashes between the two UPDATEs, money vanishes into thin air. You also have no history — if a user disputes a charge, you cannot reconstruct what happened. And a retried request runs both UPDATEs a second time, silently doubling the deduction.
V2: Wrap in a transaction + add a ledger table
Put both UPDATEs inside a single DB transaction and add a ledger table that records every entry as an immutable, append-only row. Now a crash inside the transaction rolls back both UPDATEs atomically, and you have an audit trail.
The balance column is still a problem, though. Under concurrent transfers from one account — say, a burst of outbound payments — multiple transactions race to read-modify-write the same row. Two concurrent transfers each read balance = 1000, both deduct 900, and both commit, leaving the balance at 100 instead of rejecting the second transfer for insufficient funds. The correct outcome is that the first transfer succeeds (leaving 100) and the second is refused — but both succeeded, so 1800 was sent while only 900 was debited, effectively creating money. This is the lost-update bug, and it cannot be solved at the application layer without help from the database.
V3: Row lock on sender + overdraft check inside the transaction
BEGIN;
SELECT balance FROM accounts WHERE id = :sender FOR UPDATE;
-- check: balance >= amount
UPDATE accounts SET balance = balance - :amount WHERE id = :sender;
INSERT INTO ledger ...;
COMMIT;
SELECT ... FOR UPDATE acquires a row-level exclusive lock before the read. Any concurrent transaction that attempts the same will block until the first commits, then re-read the updated balance. Overdraft is now impossible, and the lost-update race is gone.
The next gap is retries. If the client's network drops after the server commits but before the response arrives, the client retries — and the transfer runs twice.
V4: Idempotency keys
Clients generate a unique idempotency_key (a UUID or ULID) per intent. The server stores (idempotency_key → result) before returning. On retry:
if idempotency_store.get(key):
return stored result ← no second debit
else:
execute transfer
idempotency_store.set(key, result, ttl=7d)
return result
The check-and-set must be atomic with the transfer, or another race emerges. In practice: write the idempotency record inside the same transaction as the ledger rows (both in Postgres), or use a database-level unique constraint on the key and catch the duplicate-key error on retry.
Safe retries are now possible under any network partition between client and server. But the balance column still diverges from the ledger under bugs, schema migrations, or direct DB repairs. We need a single source of truth.
V5: Remove the mutable balance column — derive balance from ledger
SELECT COALESCE(SUM(amount), 0) AS balance
FROM ledger
WHERE account_id = :id;
amount is signed: debits are negative, credits are positive. Balance is always re-derivable, forever auditable, and cannot silently drift. For performance, maintain a materialized running balance (a cached aggregate updated inside each transfer transaction), but treat the ledger as the authoritative source for reconciliation.
flowchart LR
V1["V1: mutable balance column<br/>no history, no atomicity"] --> V2["V2: + transaction + ledger<br/>atomic, auditable"]
V2 --> V3["V3: + SELECT FOR UPDATE<br/>no overdraft, no lost-update"]
V3 --> V4["V4: + idempotency keys<br/>safe retries"]
V4 --> V5["V5: ledger as source of truth<br/>balance derived, immutable history"]
V5 --> V6["V6: + external rails + reconciliation<br/>bank top-ups, withdrawals, settlement"]
style V1 fill:#0e7490,color:#fff
style V3 fill:#15803d,color:#fff
style V5 fill:#ff6b1a,color:#0a0a0f
style V6 fill:#a855f7,color:#fff
V6: External rails and reconciliation
Intra-wallet transfers are simple — both accounts live in our database. Top-ups and withdrawals cross into external systems (card networks, ACH, SWIFT) that have their own latency (standard ACH: 1–2 business days; same-day ACH available; card settlement: T+1 to T+2), failure modes, and settlement semantics. These require an async, idempotent, reconciled flow — covered in the external rails section below.
The double-entry ledger
Every monetary event produces exactly two ledger rows: one debit and one credit. The sum of all rows for any account is its balance. The sum of all rows in the entire system is always zero — money is conserved — but only if every external funding event (top-up, withdrawal) is also recorded against a corresponding system-side liability or float account. A top-up from a bank credits the user's account and debits a "funds-in-transit" or "external liability" account; once the bank confirms settlement, the liability account is cleared. Without this complete chart of accounts, intra-wallet transfers sum to zero but external cash flows do not.
CREATE TABLE ledger (
id BIGSERIAL PRIMARY KEY,
transfer_id UUID NOT NULL, -- groups the debit+credit pair
account_id BIGINT NOT NULL,
amount NUMERIC(18,2) NOT NULL, -- negative = debit, positive = credit
currency CHAR(3) NOT NULL,
entry_type TEXT NOT NULL, -- 'DEBIT' | 'CREDIT'
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
metadata JSONB
) PARTITION BY RANGE (created_at);
CREATE INDEX ledger_account_time ON ledger(account_id, created_at DESC);
CREATE UNIQUE INDEX ledger_transfer_type ON ledger(transfer_id, entry_type);
The UNIQUE INDEX on (transfer_id, entry_type) is a natural idempotency guard: inserting the same transfer twice raises a duplicate-key error, which the application catches and handles by returning the original result.
A transfer for $50 from Alice (account 1) to Bob (account 2):
| id | transfer_id | account_id | amount | entry_type |
|---|---|---|---|---|
| 101 | txn-abc | 1 | -50.00 | DEBIT |
| 102 | txn-abc | 2 | +50.00 | CREDIT |
Both rows or neither. The ACID transaction guarantees it.
Transfers — the happy path
sequenceDiagram
participant C as Client
participant WS as Wallet Service
participant IDEM as Idempotency Store
participant DB as Ledger DB
C->>WS: POST /transfers {from, to, amount, idempotency_key}
WS->>IDEM: get(idempotency_key)
alt already exists
IDEM-->>WS: cached result
WS-->>C: 200 OK (same result, no debit)
else new request
WS->>DB: BEGIN TRANSACTION
WS->>DB: SELECT balance FROM accounts WHERE id=:from FOR UPDATE
DB-->>WS: balance = 1000
WS->>WS: check amount (50) <= balance (1000)
WS->>DB: INSERT INTO ledger (debit row for sender)
WS->>DB: INSERT INTO ledger (credit row for receiver)
WS->>DB: INSERT INTO idempotency_records (key, result)
WS->>DB: COMMIT
WS-->>C: 200 OK {transfer_id, ...}
end
A few things worth noticing in this flow. The FOR UPDATE lock is held only for the duration of the transaction — milliseconds — so throughput stays high. The idempotency record goes into the same transaction as the ledger rows: if the transaction rolls back (overdraft, DB error), the record rolls back too, meaning a retry after a legitimate failure is correctly re-executed rather than being swallowed as a duplicate. After commit, the service publishes an event to the notification pipeline so both users get a push notification — fire-and-forget, so a notification failure never rolls back a committed transfer.
Transfer state machine
External transfers (top-ups and withdrawals) go through a multi-step lifecycle because the external rail is async and can fail independently.
stateDiagram-v2
[*] --> PENDING: request accepted
PENDING --> PROCESSING: sent to external rail
PROCESSING --> COMPLETED: rail confirms settlement
PROCESSING --> FAILED: rail rejects / timeout
FAILED --> PENDING: retry (idempotent re-send)
COMPLETED --> [*]
FAILED --> REVERSED: manual or auto reversal
REVERSED --> [*]
PENDING --> CANCELLED: user cancels before processing
CANCELLED --> [*]
For intra-wallet transfers, the state machine collapses to PENDING → COMPLETED (or FAILED) inside a single DB transaction. There is no PROCESSING state because there is no external rail.
Concurrency deep-dive
Pessimistic locking (default)
SELECT ... FOR UPDATE is straightforward and correct. The tradeoff: under high concurrency on a single account — say, a merchant account receiving hundreds of payments per second — lock contention becomes the bottleneck. Transactions queue up waiting for the lock, and tail latency climbs.
Optimistic concurrency (alternative)
Add a version column to the accounts table. A transfer reads the version, does its work, then updates with a WHERE version = :read_version condition. If another transaction committed between the read and the write, the UPDATE affects 0 rows — the application detects this and retries.
-- Read phase
SELECT balance, version FROM accounts WHERE id = :id;
-- balance = 1000, version = 7
-- Write phase (inside transaction)
UPDATE accounts SET balance = 900, version = 8
WHERE id = :id AND version = 7;
-- affected rows = 0 → conflict, retry
Optimistic concurrency reduces lock contention under low-to-medium concurrency, but retry storms under high contention waste more work than they save. Pessimistic locking is the safer default for financial systems where conflicts are common.
Serialized command queue (hot accounts)
For truly hot accounts — a platform escrow account receiving thousands of payments per second — neither locking strategy scales well. Route all commands for a given account through a single-writer queue (a Kafka partition keyed by account_id). A single consumer processes commands for that account sequentially, eliminating all concurrency. This caps throughput at the consumer's single-thread speed but eliminates all lock contention and, when combined with idempotent command handling, is effectively exactly-once.
flowchart LR
T1[Transfer 1] --> Q[("Kafka partition<br/>key=account_id")]
T2[Transfer 2] --> Q
T3[Transfer 3] --> Q
Q --> CONS[Single consumer<br/>sequential writes]
CONS --> DB[(Ledger DB)]
style Q fill:#a855f7,color:#fff
style CONS fill:#ff6b1a,color:#0a0a0f
style DB fill:#0e7490,color:#fff
External rails: top-up and withdrawal
Intra-wallet transfers live entirely inside our database and can be ACID. Top-ups and withdrawals cross into the external financial system, which is async (standard ACH takes 1–2 business days; approximately 80% of volume settles in one day or less per Nacha; same-day ACH is available for eligible transfers), fallible, and idempotent by convention on most rails.
The pattern:
sequenceDiagram
participant C as Client
participant WS as Wallet Service
participant DB as Ledger DB
participant RAIL as External Rail (ACH/Card)
participant RECON as Reconciliation Service
C->>WS: POST /topups {amount, source, idempotency_key}
WS->>DB: create top-up record (status=PENDING)
WS-->>C: 202 Accepted {topup_id}
WS->>RAIL: initiate debit (idempotent, with external_key)
RAIL-->>WS: initiated
note over RAIL: hours to days later...
RAIL->>RECON: settlement file / webhook
RECON->>DB: mark top-up COMPLETED, insert credit ledger row
RECON->>WS: trigger balance update notification
The wallet does not credit the user's balance until the external rail confirms settlement. Optimistic crediting (credit immediately, reverse on failure) is a product decision with fraud risk; most wallets credit on authorization for UX but hold the funds until settlement for risk. Every call to the external rail carries an idempotency key (typically a UUID derived from the internal topup_id) so a timed-out retry deduplicates cleanly on the rail side. The reconciliation job compares the wallet's ledger against the bank's settlement file daily — any discrepancy is flagged for investigation.
Architecture
flowchart TD
C[Client app] --> GW[API Gateway<br/>auth + rate limit]
GW --> WS[Wallet Service]
WS --> IDEM[(Idempotency Store<br/>Postgres unique key)]
WS --> LDB[(Ledger DB<br/>Postgres, sharded by account_id)]
WS --> BAL[(Balance Cache<br/>Redis, account_id → balance+version)]
WS --> KAFKA[Kafka<br/>transfer events]
KAFKA --> NOTIF[Notification Service<br/>push / email]
KAFKA --> AUDIT[Audit Log Service<br/>append-only store]
KAFKA --> FRAUD[Fraud Service<br/>async scoring]
WS --> EXTQ[External Rail Queue<br/>Kafka / SQS]
EXTQ --> RAILSVC[Rail Service<br/>ACH / card processor]
RAILSVC --> BANK[Bank / Card Network]
BANK -.settlement.-> RECON[Reconciliation Service]
RECON --> LDB
style WS fill:#ff6b1a,color:#0a0a0f
style LDB fill:#0e7490,color:#fff
style BAL fill:#15803d,color:#fff
style KAFKA fill:#a855f7,color:#fff
style RECON fill:#ffaa00,color:#0a0a0f
Storage choices
| Data | Store | Rationale |
|---|---|---|
| Ledger rows | Postgres (range-partitioned by date) | ACID, foreign keys, SQL aggregates; partitioning keeps hot partitions small |
| Account metadata | Postgres | Consistent reads alongside ledger |
| Idempotency records | Postgres (same DB, unique constraint) | Shares the transfer transaction; avoids distributed 2PC |
| Balance cache | Redis (account_id → {balance, version}) | Sub-ms reads; invalidated on any committed transfer touching that account |
| Transfer events | Kafka | Fan-out to notifications, fraud, audit without coupling services |
| Audit log | Append-only store (S3 + Athena, or immutable Postgres partition) | Regulatory compliance; never deleted |
| External rail state | Postgres (transfers + status) | Reconciliation joins against ledger |
Preventing overdraft — the complete picture
There are three layers, each catching a different failure mode:
flowchart TD
REQ[Transfer Request] --> APP[1. Application layer<br/>read balance cache, reject fast]
APP -->|"balance looks sufficient"| DB[2. Database layer<br/>SELECT FOR UPDATE, authoritative check]
DB -->|"balance confirmed"| CON[3. Constraint layer<br/>CHECK balance >= 0]
CON -->|"all clear"| COMMIT[Commit ledger rows]
APP -->|"obviously insufficient"| REJ1[Reject 400]
DB -->|"insufficient after lock"| REJ2[Reject 400 — rollback]
CON -->|"bug caught"| REJ3[Reject — rollback]
style APP fill:#ffaa00,color:#0a0a0f
style DB fill:#ff6b1a,color:#0a0a0f
style CON fill:#0e7490,color:#fff
style COMMIT fill:#15803d,color:#fff
style REJ1 fill:#ff2e88,color:#fff
style REJ2 fill:#ff2e88,color:#fff
style REJ3 fill:#ff2e88,color:#fff
The application layer reads the balance from cache and rejects obviously insufficient requests before touching the database. The database layer is the authoritative check — inside the transaction, SELECT balance FOR UPDATE, compare with the amount, raise an error if insufficient. The cache can be stale, so this is the only layer that counts. The constraint layer is the backstop: a CHECK (balance >= 0) constraint on the accounts table, or a trigger that rejects any ledger insert that would make the running total negative. This catches bugs in the application layer. All three must agree; the DB constraint is the final defense.
Balance derivation vs. materialized balance
The fully-derived approach queries the ledger on every balance read:
SELECT SUM(amount) FROM ledger WHERE account_id = :id;
This is the cleanest design — single source of truth, no cache invalidation bugs. The downside is speed: for accounts with millions of ledger rows, this query grows expensive. You can bound it with a starting checkpoint, but it adds complexity.
The materialized approach keeps a balance column in the accounts table, updated atomically inside each transfer transaction. On any discrepancy, the ledger wins — reconciliation re-derives and corrects. Most production wallets use the materialized approach for read performance and reserve the fully-derived query for reconciliation, dispute resolution, and periodic audits. The ledger is the ground truth; the column is just a fast approximation of it.
Sharding the ledger
At 290 TPS and 18 B rows over 5 years, a single Postgres instance works but a shard boundary at ~500 GB is prudent.
| Strategy | Pros | Cons |
|---|---|---|
Shard by account_id | All rows for one account on one shard; balance queries are local | Hot accounts (merchants) create hot shards |
Shard by transfer_id | Even distribution | A single transfer's debit and credit may be on different shards; cross-shard reads for account history |
| Range partition by date (single shard) | Simple; old partitions can be archived to cold storage | All writes go to the current partition; partition becomes hot |
| Hybrid: shard by account_id bucket, range partition within shard | Good balance; cross-shard balance aggregation only for reconciliation | More operational complexity |
The recommendation for most wallets at this scale: shard by hashed account_id (consistent hashing to ~16 shards), with range partitioning by date within each shard. Each shard handles ~18 TPS peak writes — trivial.
See consistent hashing for the sharding mechanics.
Failure modes and mitigations
| Failure | Symptom | Mitigation |
|---|---|---|
| Double-spend on retry | Transfer debits sender twice | Idempotency key + unique constraint on (transfer_id, entry_type) in ledger |
| Lost update under concurrency | Two transfers each read old balance; both succeed; sender's balance goes negative | SELECT ... FOR UPDATE inside transaction |
| Partial transfer (crash between debit and credit) | Money disappears | ACID transaction: both rows or neither |
| Balance drift (cache out of sync with ledger) | Stale balance shown; overdraft possible | Invalidate cache on every committed write; reconciliation re-derives nightly |
| External rail timeout (top-up neither confirmed nor rejected) | Wallet neither credits nor debits | Idempotent retry to rail; reconciliation catches gaps; pending state with TTL |
| Hot account contention | p99 transfer latency spikes on popular merchant accounts | Serialized command queue per account (Kafka partition by account_id) |
| Fraudulent burst transfers | Account drained | Fraud service consumes Kafka events; async velocity rules; hard limit in application layer |
Idempotency — the full contract
An idempotency key has a contract worth spelling out clearly. One key equals one intent — clients must generate a fresh key for a genuinely new transfer. Keys expire after a window (typically 7–30 days); after expiry, the key is recycled and a new transfer with the same key starts fresh. The server stores the full response (status code, transfer_id, timestamp) keyed by the idempotency key, so retries receive the exact same response. If two different clients send the same key with different parameters (amount, recipient), the server rejects the second with 409 Conflict — the key must be globally unique per client (UUID v4 is sufficient). And critically, the idempotency record is written inside the transfer transaction: if the transaction rolls back, the record rolls back too, so a retry after a legitimate failure is correctly re-attempted rather than silently short-circuited.
Reconciliation
Reconciliation is the financial system's immune system. It runs periodically (daily is standard, hourly for high-value systems) and answers: "Does the sum of all ledger rows for all accounts equal the sum of all external settlements?"
For each account:
ledger_balance = SUM(ledger.amount) WHERE account_id = X
expected_balance = last_reconciled_balance
+ SUM(confirmed_external_credits)
- SUM(confirmed_external_debits)
if ledger_balance != expected_balance:
flag for manual review
Discrepancies reveal bugs (double-credit, missing debit), external rail errors (bank credited wrong amount), or fraud. Reconciliation is the only safe way to detect silent data corruption.
Intra-wallet vs. payment processing
This article covers a stored-value wallet: users hold a balance that lives entirely inside your system. Intra-wallet transfers are cheap, instant, and ACID because you control both sides.
The design-payment-system article covers card processing: the user has no balance in your system, money moves via external card rails (Visa/Mastercard), and the hard problems are authorization, clearing, settlement, and chargeback handling. The mechanics differ significantly — card processing requires online authorization against the issuing bank in real time; wallet transfers are a local DB transaction.
Many real products combine both: a wallet balance funded by card top-ups, with intra-wallet transfers for peer-to-peer and card rails for merchant payments.
Things to discuss in an interview
- Why double-entry? It is self-auditing: the sum of all entries for any closed set of accounts is always zero. Intra-wallet transfers trivially sum to zero. External cash flows (top-ups, withdrawals) must be recorded against system-side liability or float accounts to preserve the invariant across the full chart of accounts. Any single-entry mutation can silently drift; double-entry makes errors visible at reconciliation.
- Idempotency placement: why the idempotency record must live in the same transaction as the ledger rows, not in a separate Redis write.
- Pessimistic vs. optimistic locking: when does each break down, and why is pessimistic safer for financial systems?
- Derived vs. materialized balance: performance vs. correctness trade-off; how reconciliation catches drift.
- External rail async flow: pending → completed → settled, and what happens on timeout.
- Sharding money: cross-shard transfers require distributed transactions or a saga pattern — why you prefer to keep related accounts on the same shard.
- Currency handling: always use
NUMERIC(18,2)(orBIGINTin minor units such as cents/paise if you prefer integer storage) — neverFLOAT(floating-point arithmetic is not safe for money). This schema usesNUMERIC(18,2)with signed amounts; if you switch to integer minor units, change the column toBIGINTand update sample values accordingly.
Things you should now be able to answer
- Why does a digital wallet use a ledger table instead of a balance column?
- What is the double-entry accounting invariant, and how does it enable reconciliation?
- How does
SELECT ... FOR UPDATEprevent overdraft under concurrent transfers? - Why must the idempotency record be written in the same transaction as the ledger rows?
- What happens when an ACH top-up times out, and how does the system recover?
- What is the difference between an intra-wallet transfer and a card payment, architecturally?
- How would you handle a merchant account receiving 5000 transfers per second?
Further reading
- "Ledger: Double-Entry Accounting at Scale" — engineering blogs from Stripe, Square, and Coinbase have publicly discussed variants of this design
- "An Introduction to Double-Entry Bookkeeping" — any accounting textbook; the financial concept predates computers by 500 years
- Designing Data-Intensive Applications, Chapter 7 (Transactions) — Kleppmann; essential background on isolation levels, lost updates, and write skew
- Payment System Design — card authorization, clearing, and settlement
- Consistent Hashing — sharding mechanics for the ledger database
Frequently asked questions
▸Why does a digital wallet use a double-entry ledger instead of a mutable balance column?
A mutable balance column has no history, cannot be audited, and silently drifts under bugs or direct DB repairs. A double-entry ledger records every transfer as two immutable rows — one debit, one credit — inside a single ACID transaction, so the balance is always re-derivable from history and every cent is permanently traceable. The sum of all ledger rows for a closed set of accounts is always zero, which makes discrepancies visible at reconciliation.
▸How does SELECT FOR UPDATE prevent overdraft under concurrent transfers?
Without a lock, two concurrent transfers can each read the same stale balance, both pass the sufficiency check, and both commit — silently creating money. SELECT FOR UPDATE acquires a row-level exclusive lock on the sender's account row before the balance check; any competing transaction blocks until the first commits and then re-reads the updated balance, so the check and the ledger write are one atomic unit and an overdraft is impossible.
▸Why must an idempotency record be written inside the same transaction as the ledger rows, not in a separate Redis write?
If the idempotency record is written separately — say, to Redis after the DB transaction commits — a crash between commit and the Redis write leaves no record, and the next retry executes a second transfer. Storing the idempotency record inside the same Postgres transaction means it rolls back with the transfer on any failure, so a retry after a legitimate failure is correctly re-executed rather than silently short-circuited as a duplicate.
▸How long does ACH settlement take, and how does the wallet handle the gap?
Standard ACH settles in 1 to 2 business days; same-day ACH is available for eligible transfers. The wallet creates a top-up record in PENDING state and returns 202 Accepted immediately. It credits the user's balance only after the reconciliation service receives settlement confirmation from the bank, converting the record to COMPLETED and inserting the credit ledger row. Any timeout triggers an idempotent retry to the rail; the reconciliation job catches gaps daily.
▸When should you use a serialized command queue instead of pessimistic locking for a wallet account?
For ordinary accounts, SELECT FOR UPDATE is the correct default — the lock is held for milliseconds and throughput stays high. For truly hot accounts such as a platform escrow receiving thousands of payments per second, lock contention causes tail latency spikes under both pessimistic and optimistic strategies. Routing all commands for that account through a single Kafka partition keyed by account_id and processing them with a single consumer eliminates lock contention entirely, though it caps throughput to single-thread speed.
You may also like
Design an LLM Observability Platform
Build the distributed tracing backbone for non-deterministic, multi-step LLM applications — capturing every prompt, completion, token count, and dollar cost across chains, retrievals, and tool calls so you can debug a failed agent run and account for every cent.
Design an LLM Gateway (AI Gateway & Model Router)
A single proxy control plane in front of OpenAI, Anthropic, Google, and open models — routing ~65 trillion tokens a month with automatic failover, semantic caching, per-team budget enforcement, and streaming SSE passthrough, all under 50 ms of added latency.
Design an LLM Fine-Tuning Platform
Turn a base model and a dataset into a deployed fine-tuned adapter at scale — the end-to-end platform covering dataset ingestion, LoRA/QLoRA/DPO training, fault-tolerant distributed GPU scheduling, eval gating, and multi-LoRA serving for hundreds of concurrent fine-tunes.