Design a Payment System (Stripe-style)

Move money correctly. Double-entry ledgers, idempotency keys, the authorize/capture/settle lifecycle, reconciliation, and why money never gets eventual consistency.

20 min read · 2026-05-22 · Ironclad Academy

The problem

Stripe processed $1 trillion in payment volume in 2023. Behind every one of those transactions is a deceptively simple question: did this charge actually go through? The answer matters enormously — "yes" twice means a customer gets billed twice; "no" when the bank actually said yes means lost revenue and a confused user. Getting that single bit right, at scale, across a network that drops requests and times out without explanation, is the core engineering problem.

A payment system is a payment service provider (PSP) — the layer between a merchant's checkout page and the global card networks (Visa, Mastercard) and banks. When you swipe your card at a grocery store or pay on a shopping site, Stripe, Adyen, Braintree, or PayPal is doing three distinct jobs in sequence: authorization (asking the cardholder's bank to reserve funds), capture (confirming the merchant intends to collect), and settlement (the actual inter-bank money transfer, which happens in overnight batch files, not in real time). Your code touches all three stages, but they run on timelines that span seconds to two business days.

Payment systems are the most unforgiving distributed-systems problem in the catalog. Every other design problem tolerates some degree of eventual consistency or occasional data loss. Payments do not. A lost message in a chat app is a bad user experience. A lost debit is fraud. A double credit is money out of your pocket.

The engineering tension is twofold. First, exactly-once execution: the card network can be unreachable or time out, so you may never learn whether a call landed, yet retrying naively double-charges the customer. Second, immutable correctness: every cent must be traceable forever, which rules out mutable balance columns and demands an append-only double-entry ledger. Solve those two problems and the rest of the design follows naturally.

Functional requirements

  • Accept card payments on behalf of a merchant: authorization, capture, void.
  • Settle funds to the merchant's bank account (payout).
  • Process refunds (full or partial) and handle disputes/chargebacks.
  • Issue webhooks to merchants on every status change.
  • Support multiple currencies.
  • Provide a ledger/reporting API for merchant reconciliation.

Non-functional requirements

  • Exactly-once money movement: retries, timeouts, and network blips must not result in double charges.
  • Strong consistency: balances are always correct, never "eventually" correct.
  • High availability: 99.99% uptime target; payment downtime costs merchants directly.
  • PCI-DSS compliance: raw card data (PAN, CVV) must never be stored or logged on application servers.
  • Auditability: every money movement must be traceable back to the originating event, forever.
  • p99 charge latency < 3 seconds (dominated by card network round-trip, typically 200–800ms).

Capacity estimation

| Dimension | Estimate | How we got there |
| --- | --- | --- |
| Charge throughput | 1,000 charges/sec peak | Given; roughly the scale of a large PSP |
| Charges per day | 86.4M charges/day | 1,000/s × 86,400 s/day |
| Ledger writes | 5,000 writes/sec peak | Each charge creates ~5 entries (see double-entry section) |
| Ledger entry size | ~200 bytes | Amounts, currency, account IDs, metadata |
| Ledger write bandwidth | 1 MB/sec | 5,000 writes/s × 200 B |
| Ledger storage (10 years) | ~315 TB | 1 MB/s × 86,400 s/day × 365 × 10 years |
| Idempotency key lookups | 1,000/sec | One lookup per incoming charge |
| Idempotency key retention | 24 h minimum (30 days for Stripe API v2) | Stripe API v1 minimum; v2 retains for 30 days |
| Idempotency key count at 30 days | ~2.6B keys | 1,000/s × 86,400 × 30 |
| Idempotency key size | ~300 bytes/key | Key string + status + response body |
| Idempotency store size (30-day) | ~780 GB | 2.6B × 300 B (use 24 h retention to reduce; manageable either way) |
| Webhook deliveries | ~1,000/sec steady state | One webhook per charge event |
| Webhook HTTP calls (with retries) | ~1,500/sec | Retry fan-out: assume 1.5× average |

Takeaway: 315 TB of ledger storage over 10 years is real but manageable — partition Postgres by month, archive cold partitions to object storage, and keep only the rolling 12–24 months hot (~30–60 TB on a sharded cluster). The bottleneck is correctness, not throughput.
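
The arithmetic behind the table can be reproduced in a few lines. This is only a sketch of the estimates above; the constant names are illustrative, and it assumes the peak rate is sustained around the clock, so the results are upper bounds.

```python
# Back-of-envelope capacity numbers, using the inputs from the table above.
PEAK_CHARGES_PER_SEC = 1_000
ENTRIES_PER_CHARGE = 5      # ~5 ledger entries per charge, as assumed in the table
ENTRY_BYTES = 200
KEY_BYTES = 300
SECONDS_PER_DAY = 86_400

ledger_writes_per_sec = PEAK_CHARGES_PER_SEC * ENTRIES_PER_CHARGE
ledger_bytes_per_sec = ledger_writes_per_sec * ENTRY_BYTES
ledger_bytes_10y = ledger_bytes_per_sec * SECONDS_PER_DAY * 365 * 10
keys_30d = PEAK_CHARGES_PER_SEC * SECONDS_PER_DAY * 30
key_store_bytes_30d = keys_30d * KEY_BYTES

print(f"ledger writes/sec: {ledger_writes_per_sec:,}")
print(f"ledger bandwidth:  {ledger_bytes_per_sec / 1e6:.1f} MB/s")
print(f"ledger, 10 years:  {ledger_bytes_10y / 1e12:.0f} TB")
print(f"keys at 30 days:   {keys_30d / 1e9:.2f} B")
print(f"key store, 30 d:   {key_store_bytes_30d / 1e9:.0f} GB")
```

The key-store figure comes out slightly under the table's ~780 GB because the table rounds the key count to 2.6B before multiplying.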

The players and the money flow

Before designing anything, understand the ecosystem. A card payment involves six parties:

flowchart LR
    CH[Cardholder] -->|"presents card"| MER[Merchant]
    MER -->|"authorization request"| PSP[PSP / Gateway\nyou are building this]
    PSP -->|"routes via network"| CN[Card Network\nVisa / Mastercard]
    CN -->|"authorization request"| ISS[Issuing Bank\ncardholder's bank]
    ISS -->|"approved / declined"| CN
    CN -->|"authorization response"| PSP
    PSP -->|"response"| MER
    ISS -.->|"settlement T+1/T+2"| ACQ[Acquiring Bank\nmerchant's bank]
    ACQ -.->|"payout"| MER
    style PSP fill:#ff6b1a,color:#fff
    style CN fill:#0e7490,color:#fff
    style ISS fill:#15803d,color:#fff
    style ACQ fill:#ffaa00,color:#0a0a0f

The card network (Visa, Mastercard) is not a real-time money transfer system. Authorization — the hold on the cardholder's account — and settlement — the actual movement of funds between banks — are separate processes that run on separate timelines. Authorization happens in seconds; settlement happens in batches at T+1 or T+2 business days. This distinction drives much of the design below.

The payment lifecycle

stateDiagram-v2
    [*] --> Initiated: POST /charges
    Initiated --> Authorized: network approved
    Initiated --> Declined: network declined
    Initiated --> Failed: network timeout / error
    Authorized --> Captured: capture confirmed
    Authorized --> Voided: merchant voids hold
    Captured --> Clearing: submitted to network batch
    Clearing --> Settled: funds moved T+1/T+2
    Settled --> Refunded: refund requested
    Refunded --> [*]
    Settled --> Disputed: cardholder disputes
    Disputed --> ChargebackLost: issuer rules for cardholder
    Disputed --> ChargebackWon: issuer rules for merchant
    ChargebackLost --> [*]
    ChargebackWon --> [*]
    Failed --> [*]
    Declined --> [*]
    Voided --> [*]

Authorization reserves funds on the cardholder's account. No money moves. The issuing bank places a hold.

Capture tells the network "I'm taking those funds." In most card-present flows, authorize and capture happen together (auth+capture). In card-not-present flows (e-commerce), merchants often authorize at checkout and capture only when the item ships.

Clearing and settlement is batch processing. The acquirer submits a batch file to the card network at end of day. The network settles inter-bank (issuer pays acquirer). The acquirer credits the merchant account, minus interchange fees.

Refunds are separate transactions flowing the other direction. They are not cancellations of the original charge — they are new credits. This distinction matters for your ledger.
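
One way to enforce the lifecycle in code is a transition table that rejects any move the state diagram does not allow. The sketch below transcribes the diagram above; the lowercase state names, `InvalidTransition`, and `transition` are illustrative choices, not part of any real API.

```python
# Allowed transitions, transcribed from the state diagram above.
TRANSITIONS = {
    "initiated": {"authorized", "declined", "failed"},
    "authorized": {"captured", "voided"},
    "captured": {"clearing"},
    "clearing": {"settled"},
    "settled": {"refunded", "disputed"},
    "disputed": {"chargeback_lost", "chargeback_won"},
}

# States with no outgoing edges in the diagram.
TERMINAL = {
    "declined", "failed", "voided",
    "refunded", "chargeback_lost", "chargeback_won",
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the diagram."""


def transition(current: str, new: str) -> str:
    """Return the new state if the move is allowed, otherwise raise."""
    if new not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {new}")
    return new


state = "initiated"
for step in ("authorized", "captured", "clearing", "settled"):
    state = transition(state, step)
```

Keeping the table as data makes it easy to check in tests that every terminal state really has no outgoing edges.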

Building up to the design

V1: Synchronous charge, pray it works

def charge(card_number, amount):
    result = card_network.authorize(card_number, amount)
    db.insert("charges", {"amount": amount, "status": result.status})
    return result

This works on the happy path. The problem emerges the moment card_network.authorize() times out after two seconds. Did the network charge the card? You don't know. You have no record in the DB because the call never returned. If the user retries, do you charge again? That is the core failure mode of every naive payment implementation.

V2: Write intent before calling the network

def charge(card_number, amount):
    charge_id = db.insert("charges", {"amount": amount, "status": "pending"})
    result = card_network.authorize(card_number, amount, ref=charge_id)
    db.update("charges", charge_id, {"status": result.status})
    return result

Now you can reconcile — the charge row exists even if the network call crashes. But if the user retries POST /charges, you insert a second pending row and potentially send a second authorization. Double charge.

V3: Idempotency keys

The client supplies a unique key with every request:

POST /v1/charges
Idempotency-Key: merchant-order-42-attempt-1

On receipt:

  1. Look up the key in the idempotency store.
  2. If found and completed: return the stored response immediately.
  3. If found and in-flight: return 409 or wait.
  4. If not found: insert the key (with status=pending), proceed.

The idempotency key must be committed to durable storage before the network call. If you commit after, a crash between the network call and the commit means the charge went through but you have no record.

Clients can now retry indefinitely without fear of double charges. But your ledger is still just a charges table with an amount column. When you refund, you update the row. When you dispute, you update the row. When an audit asks "show me every money movement," you have partial history and mutable state. Accounting teams hate this.

V4: Double-entry ledger

Replace mutable state with an append-only ledger. This is the structural heart of a correct payment system — covered in depth in the next section.

V5: Saga for multi-step flows + reconciliation

Network calls fail. Settlement is async. Add a saga with compensation steps and a daily reconciliation job to catch anything that slipped through.

flowchart LR
    V1["V1: sync charge<br/>happy path only"] --> V2["V2: write intent first<br/>reconcilable"]
    V2 --> V3["V3: + idempotency keys<br/>safe retries"]
    V3 --> V4["V4: + double-entry ledger<br/>immutable audit trail"]
    V4 --> V5["V5: + saga + reconciliation<br/>production grade"]
    style V1 fill:#0e7490,color:#fff
    style V3 fill:#15803d,color:#fff
    style V4 fill:#ff6b1a,color:#fff
    style V5 fill:#a855f7,color:#fff

The double-entry ledger

Every accounting system since the 1400s uses double-entry bookkeeping. Every correct payment system does too, and for the same reason: it makes errors visible.

The rule is simple: every transaction creates at least two entries — a debit on one account and a credit on another — such that the sum of all entries for any transaction always equals zero. That zero-sum constraint means you can verify the ledger's integrity with a single query. A mismatch means a bug, not a race condition.

CREATE TABLE ledger_accounts (
    id          BIGSERIAL PRIMARY KEY,
    type        TEXT NOT NULL,   -- 'asset', 'liability', 'revenue', 'expense'
    name        TEXT NOT NULL,   -- e.g. 'merchant:acct_123', 'stripe_cash', 'interchange_payable'
    currency    CHAR(3) NOT NULL
);

CREATE TABLE ledger_entries (
    id              BIGSERIAL PRIMARY KEY,
    account_id      BIGINT NOT NULL REFERENCES ledger_accounts(id),
    amount          BIGINT NOT NULL,   -- in minor units (cents), signed: positive = debit, negative = credit
    currency        CHAR(3) NOT NULL,
    transaction_id  UUID NOT NULL,     -- groups the balanced pair(s)
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    description     TEXT,
    idempotency_key TEXT
);

-- Balance for any account is DERIVED, never stored:
-- SELECT SUM(amount) FROM ledger_entries WHERE account_id = ?;

A few properties are non-negotiable. Ledger entries are append-only: you never UPDATE or DELETE one, ever. Corrections are new entries that reverse the original. The balance is derived (SELECT SUM(amount) FROM ledger_entries WHERE account_id = ?), never stored as a column that can drift out of sync. Store amounts as integers in minor units ($10.50 = 1050), because IEEE 754 floating-point cannot represent 0.1 exactly and those errors accumulate. And for every transaction_id, SUM(amount) = 0. A background audit job can verify this invariant continuously; a violation means a code bug, and it will be immediately visible.

What a charge looks like in the ledger:

| transaction_id | account | amount (cents) | description |
| --- | --- | --- | --- |
| txn_abc | funds_in_transit | +1000 | Authorization hold |
| txn_abc | cardholder_receivable | -1000 | Authorization hold |
| txn_def | funds_in_transit | -1000 | Settlement debit |
| txn_def | merchant_payable | +971 | Net of interchange |
| txn_def | interchange_payable | +29 | Interchange fee |
The entries balance. The merchant's balance (merchant_payable) grows by 971 cents. The fee is tracked separately. Every number is traceable back to the originating event.
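
The same five entries can be posted and checked with a small in-memory sketch. It uses SQLite purely so the example is self-contained (the design above assumes Postgres), a simplified table keyed by account name instead of account_id, and hypothetical helper names (`post_transaction`, `balance`, `unbalanced_transactions`).

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ledger_entries (
           id             INTEGER PRIMARY KEY,
           account        TEXT NOT NULL,
           amount         INTEGER NOT NULL,   -- minor units, signed
           transaction_id TEXT NOT NULL,
           description    TEXT
       )"""
)


def post_transaction(conn, entries, description):
    """Append a set of (account, amount) entries atomically; reject unbalanced sets."""
    if sum(amount for _, amount in entries) != 0:
        raise ValueError("entries must sum to zero")
    txn_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO ledger_entries (account, amount, transaction_id, description)"
            " VALUES (?, ?, ?, ?)",
            [(account, amount, txn_id, description) for account, amount in entries],
        )
    return txn_id


def balance(conn, account):
    """Derive the balance from entries; there is no stored balance column."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM ledger_entries WHERE account = ?",
        (account,),
    ).fetchone()
    return row[0]


def unbalanced_transactions(conn):
    """Audit query: any row returned here indicates a bug."""
    return conn.execute(
        "SELECT transaction_id, SUM(amount) FROM ledger_entries"
        " GROUP BY transaction_id HAVING SUM(amount) != 0"
    ).fetchall()


post_transaction(
    conn,
    [("funds_in_transit", 1000), ("cardholder_receivable", -1000)],
    "Authorization hold",
)
post_transaction(
    conn,
    [("funds_in_transit", -1000), ("merchant_payable", 971), ("interchange_payable", 29)],
    "Settlement",
)
```

There is deliberately no update or delete helper: a correction would be another call to `post_transaction` with the signs reversed.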

Idempotency in depth

The idempotency key store is separate from the ledger — it is operational state, not accounting state:

CREATE TABLE idempotency_keys (
    key             TEXT PRIMARY KEY,
    request_hash    TEXT NOT NULL,       -- hash of (method, path, body) — detect conflicting requests
    status          TEXT NOT NULL,       -- 'in_flight', 'completed', 'failed'
    response_body   JSONB,               -- stored response to replay
    charge_id       UUID,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    locked_at       TIMESTAMPTZ
);

The algorithm runs inside a single serializable DB transaction:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

key_row = SELECT ... FROM idempotency_keys WHERE key = ? FOR UPDATE;

IF key_row IS NULL:
    INSERT INTO idempotency_keys (key, status, request_hash)
        VALUES (?, 'in_flight', hash(request));
    COMMIT;
    -- proceed with charge
ELIF key_row.request_hash != hash(request):
    -- checked before status, so a mismatched body is never answered with a replay
    COMMIT;
    RETURN 422 Unprocessable       -- same key, different body: error
ELIF key_row.status = 'completed':
    COMMIT;
    RETURN key_row.response_body   -- replayed, no charge
ELIF key_row.status = 'in_flight':
    COMMIT;
    RETURN 409 Conflict            -- another request is processing

After the charge completes (or definitively fails), update status to completed or failed and write response_body.
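
A runnable approximation of this flow is sketched below. SQLite has no SELECT ... FOR UPDATE, so the sketch swaps in a different technique: it relies on the primary-key constraint to decide which request claims the key. The function names (`request_hash`, `claim`, `complete`) and the returned labels are hypothetical, and rows in the failed state are treated like in-flight ones here because the policy for them is not specified above.

```python
import hashlib
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE idempotency_keys (
           key           TEXT PRIMARY KEY,
           request_hash  TEXT NOT NULL,
           status        TEXT NOT NULL,
           response_body TEXT
       )"""
)


def request_hash(method, path, body):
    """Stable hash of (method, path, body) to detect reuse of a key with a different request."""
    canonical = json.dumps([method, path, body], sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def claim(db, key, req_hash):
    """Return one of ("proceed" | "replay" | "conflict" | "mismatch", stored_response)."""
    try:
        with db:  # the INSERT is committed before any network call is made
            db.execute(
                "INSERT INTO idempotency_keys (key, request_hash, status)"
                " VALUES (?, ?, 'in_flight')",
                (key, req_hash),
            )
        return ("proceed", None)
    except sqlite3.IntegrityError:
        pass  # key already exists: fall through and inspect it

    stored_hash, status, body = db.execute(
        "SELECT request_hash, status, response_body FROM idempotency_keys WHERE key = ?",
        (key,),
    ).fetchone()
    if stored_hash != req_hash:
        return ("mismatch", None)          # maps to 422
    if status == "completed":
        return ("replay", json.loads(body))
    return ("conflict", None)              # maps to 409


def complete(db, key, response):
    """Record the final response so later retries can replay it."""
    with db:
        db.execute(
            "UPDATE idempotency_keys SET status = 'completed', response_body = ?"
            " WHERE key = ?",
            (json.dumps(response), key),
        )
```

A caller would invoke `claim` first, contact the card network only on "proceed", and then call `complete` with the response it is about to return.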

flowchart TD
    REQ["Incoming request\n+ Idempotency-Key"] --> LOCK["SELECT ... FOR UPDATE\non idempotency_keys"]
    LOCK --> EXISTS{Key exists?}
    EXISTS -->|"No"| INSERT["INSERT status=in_flight\nCOMMIT\nproceed with charge"]
    EXISTS -->|"completed"| REPLAY["Return stored response\nno charge"]
    EXISTS -->|"in_flight"| CONFLICT["Return 409 Conflict\nanother request in progress"]
    EXISTS -->|"wrong request_hash"| UNPROC["Return 422\nsame key different body"]
    INSERT --> NET["Call card network"]
    NET --> UPDATE["UPDATE status=completed\nwrite response_body"]
    style INSERT fill:#ff6b1a,color:#fff
    style REPLAY fill:#15803d,color:#fff
    style CONFLICT fill:#ffaa00,color:#0a0a0f
    style UNPROC fill:#ff2e88,color:#fff

There is one tricky window: between INSERT (status='in_flight') and UPDATE status='completed', a process crash leaves the key in-flight forever. A background sweeper finds keys older than N minutes still marked in_flight, marks them failed, and triggers reconciliation to determine the actual network outcome.
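
The sweeper's selection logic is small enough to sketch. The version below works on plain dictionaries rather than database rows, uses created_at as the age reference, and picks 15 minutes for N; all three are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone


def sweep_stale_keys(rows, now, max_age=timedelta(minutes=15)):
    """Mark in-flight keys older than max_age as failed; return their keys.

    The returned keys are the ones a reconciliation step would then need
    to check against the card network to learn the real outcome.
    """
    stale = []
    for row in rows:
        if row["status"] == "in_flight" and now - row["created_at"] > max_age:
            row["status"] = "failed"
            stale.append(row["key"])
    return stale


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"key": "k_old", "status": "in_flight", "created_at": now - timedelta(hours=1)},
    {"key": "k_new", "status": "in_flight", "created_at": now - timedelta(minutes=5)},
    {"key": "k_done", "status": "completed", "created_at": now - timedelta(hours=2)},
]
stale_keys = sweep_stale_keys(rows, now)
```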

The saga pattern for multi-step flows

A refund touches multiple systems: your ledger, the card network, and possibly a payout service. If step 2 fails after step 1 succeeds, you need to compensate.

sequenceDiagram
    participant API
    participant Ledger
    participant Network
    participant Webhook
    API->>Ledger: 1. Debit merchant_payable (reserve refund funds)
    Ledger-->>API: ok
    API->>Network: 2. Submit refund to card network
    Network-->>API: timeout
    API->>API: 3. Is this safe to retry?
    Note over API: Idempotency key on network call → yes
    API->>Network: 3b. Retry refund submission
    Network-->>API: ok (duplicate detected, same result)
    API->>Ledger: 4. Record refund settlement entry
    Ledger-->>API: ok
    API->>Webhook: 5. Notify merchant: refund.succeeded

If step 2 definitively fails — an explicit decline, not a timeout — the compensation is to reverse step 1 by crediting merchant_payable back. That compensation is a new ledger entry, not a delete.
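
The refund flow can be sketched as a small orchestration function. Everything named here is hypothetical: `refund_saga`, `RefundDeclined`, the `refunds_in_transit` and `refund_clearing` accounts, and the in-memory list standing in for the ledger. The network call is passed in as a function so that timeouts and declines can be simulated.

```python
class RefundDeclined(Exception):
    """Definitive failure: the network explicitly refused the refund."""


def post(ledger, txn_id, entries):
    """Append balanced (account, amount) entries to an in-memory ledger list."""
    if sum(amount for _, amount in entries) != 0:
        raise ValueError("entries must sum to zero")
    ledger.extend((txn_id, account, amount) for account, amount in entries)


def refund_saga(ledger, submit_refund, refund_id, amount, max_attempts=3):
    # Step 1: reserve the refund out of the merchant's balance.
    post(ledger, f"{refund_id}:reserve",
         [("merchant_payable", -amount), ("refunds_in_transit", amount)])

    # Step 2: submit to the network, retrying timeouts with the SAME key.
    for _ in range(max_attempts):
        try:
            submit_refund(idempotency_key=refund_id, amount=amount)
            break
        except TimeoutError:
            continue  # outcome unknown; the shared key makes the retry safe
        except RefundDeclined:
            # Compensation: a new reversing entry, never a delete.
            post(ledger, f"{refund_id}:compensate",
                 [("merchant_payable", amount), ("refunds_in_transit", -amount)])
            return "failed"
    else:
        # Still unknown after all attempts: leave the reserve in place.
        return "needs_reconciliation"

    # Step 4: record that the refund left for the network.
    post(ledger, f"{refund_id}:settle",
         [("refunds_in_transit", -amount), ("refund_clearing", amount)])
    return "succeeded"


def make_flaky_network(timeouts):
    """Test double: times out `timeouts` times, then accepts."""
    calls = []

    def submit_refund(idempotency_key, amount):
        calls.append(idempotency_key)
        if len(calls) <= timeouts:
            raise TimeoutError("no response")

    return submit_refund, calls
```

Note that a timeout never triggers compensation: only an explicit decline does, because after a timeout the refund may already have gone through.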

Why not two-phase commit? The card network has no XA protocol. You cannot include Visa in a distributed transaction. The only correct approach is idempotent operations + saga + reconciliation.

Reconciliation

Reconciliation is the safety net that catches everything idempotency misses.

The card network sends a settlement file at the end of each business day (sometimes multiple times per day). This file lists every transaction the network processed: authorizations, captures, refunds, chargebacks, interchange fees. Your job is to match that file against your ledger.

FOR each entry in settlement_file:
    find matching ledger entry by network_reference_id
    IF not found:
        MISSING: charge happened on network but not in our ledger
        Alert; human review; create compensating entry
    IF found but amounts differ:
        MISMATCH: investigate rounding, FX conversion, fee schedule
    IF found and amounts match:
        mark as reconciled

FOR each ledger entry in state='pending_settlement' older than 3 days:
    IF not in settlement file:
        STUCK: chase the acquirer; may need manual resolution

Missing-in-ledger is the most dangerous outcome: the customer was charged, your ledger has no record. That needs immediate human escalation, not an automated retry.
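
The matching loop translates almost directly into code. This sketch assumes both sides have already been parsed into dictionaries keyed by network_reference_id; the input shapes, the `reconcile` name, and the report keys are illustrative, and real settlement files need a parsing step that is not shown.

```python
def reconcile(settlement, ledger, stuck_after_days=3):
    """Compare one settlement file against pending ledger entries.

    settlement: {network_reference_id: amount_in_minor_units}
    ledger:     {network_reference_id: {"amount": int, "age_days": int}}
    """
    report = {"reconciled": [], "mismatch": [], "missing_in_ledger": [], "stuck": []}

    for ref, amount in settlement.items():
        entry = ledger.get(ref)
        if entry is None:
            report["missing_in_ledger"].append(ref)   # escalate to a human
        elif entry["amount"] != amount:
            report["mismatch"].append(ref)            # rounding, FX, or fees
        else:
            report["reconciled"].append(ref)

    for ref, entry in ledger.items():
        if ref not in settlement and entry["age_days"] > stuck_after_days:
            report["stuck"].append(ref)               # chase the acquirer

    return report


settlement = {"net_1": 1000, "net_2": 2500, "net_3": 400}
ledger = {
    "net_1": {"amount": 1000, "age_days": 1},
    "net_2": {"amount": 2499, "age_days": 1},
    "net_4": {"amount": 700, "age_days": 5},
    "net_5": {"amount": 300, "age_days": 1},
}
report = reconcile(settlement, ledger)
```

In this example net_5 appears in no bucket: it is absent from the file but still younger than the three-day threshold, so it simply waits for the next run.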

Full architecture

flowchart TD
    M[Merchant] -->|"POST /charges\nIdempotency-Key"| GW[API Gateway\nAuth + TLS]
    GW --> API[Payment API Service]

    API --> IK[(Idempotency Key Store\nPostgres / Redis)]
    API --> VAULT[Card Vault\nTokenize PAN]
    API --> SM[Payment State Machine]

    SM -->|"auth request\n+ PSP reference ID"| NET[Card Network\nVisa/MC]
    NET -->|"auth response"| SM

    SM -->|"ACID transaction"| LEDGER[(Double-Entry Ledger\nPostgres — partitioned)]
    SM --> SAGA[Saga Orchestrator]
    SAGA -->|"capture / refund / void"| NET
    SAGA --> LEDGER

    LEDGER --> RECON[Reconciliation Service\nT+1 batch]
    NET -.->|"settlement file"| RECON
    RECON -.->|"alerts on mismatch"| OPS[On-call]

    SM --> WHDISPATCH[Webhook Dispatcher]
    WHDISPATCH -->|"signed POST\nexponential backoff"| M

    LEDGER --> RPT[Reporting / Balance API]
    RPT --> M

    style API fill:#ff6b1a,color:#fff
    style LEDGER fill:#15803d,color:#fff
    style IK fill:#0e7490,color:#fff
    style VAULT fill:#a855f7,color:#fff
    style SM fill:#ffaa00,color:#0a0a0f
    style RECON fill:#ff2e88,color:#fff

Card vault and PCI-DSS

Raw PANs (Primary Account Numbers) and CVV codes must never touch your application servers if you want to minimize PCI-DSS scope. The standard pattern:

  1. The browser or mobile SDK calls the vault service directly (or uses a hosted fields iframe).
  2. The vault stores the PAN encrypted under a key in a hardware security module (HSM).
  3. The vault returns a payment method token (e.g., pm_xxx).
  4. All further API calls use the token. Your application servers never see the PAN.
  5. When sending to the card network, the vault retrieves the PAN (in a tightly audited, isolated environment) and passes it through a separate network path.

Tokenization also enables network tokenization: card-network services such as Visa Token Service (VTS) replace the PAN with a network token that is specific to a merchant. These tokens survive card replacements (when a physical card is reissued, the token stays valid), which reduces involuntary churn from expired cards.

Currency handling

| Rule | Detail |
| --- | --- |
| Store in minor units | $10.50 → 1050 (integer cents). ¥1000 → 1000 (yen has no minor unit). Use ISO 4217 to look up exponent. |
| Never use float | IEEE 754 cannot represent 0.1 exactly. 0.1 + 0.2 = 0.30000000000000004 in most languages. |
| Rounding | Apply rounding only at presentation layer. Intermediate calculations stay in integer minor units. |
| FX conversion | Apply the exchange rate at the moment of authorization, record both the original currency amount and the settlement currency amount in the ledger entry. Store the FX rate used. |
| Interchange fees | Card networks apply interchange rates that vary by card type, merchant category, and geography. Store these as separate ledger entries, not as a deduction from the main amount. |
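
The minor-unit rule can be illustrated with a short conversion sketch. The `EXPONENT` table holds only a handful of currencies for the example, and `to_minor_units` and `format_minor_units` are hypothetical helpers; a real system would load the full ISO 4217 list.

```python
from decimal import Decimal

# Minor-unit exponents for a few currencies, as listed in ISO 4217.
EXPONENT = {"USD": 2, "EUR": 2, "JPY": 0, "KWD": 3}


def to_minor_units(amount: str, currency: str) -> int:
    """Parse a decimal string into integer minor units; reject excess precision."""
    scaled = Decimal(amount).scaleb(EXPONENT[currency])
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has too many decimal places for {currency}")
    return int(scaled)


def format_minor_units(minor: int, currency: str) -> str:
    """Presentation-layer formatting; storage stays in integers."""
    exp = EXPONENT[currency]
    return f"{Decimal(minor).scaleb(-exp):.{exp}f} {currency}"


# Floats drift; integers in minor units do not.
float_total = 0.1 + 0.2          # not exactly 0.3
integer_total = 10 + 20          # exactly 30 cents
```

Parsing from a string rather than a float matters: converting a float first would carry its representation error into the ledger.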

Webhooks

Merchants need asynchronous notification when a charge succeeds, fails, or is refunded. Synchronous responses only tell them about the immediate API call; settlement and disputes happen minutes to days later.

Webhook delivery contract:

POST https://merchant.example.com/hooks/stripe
Content-Type: application/json
Stripe-Signature: t=1717000000,v1=abc123...   ← HMAC-SHA256 of timestamp + body

{
  "id": "evt_abc",
  "type": "charge.succeeded",
  "data": { "charge_id": "ch_xyz", "amount": 1000, "currency": "usd" }
}

The dispatcher retries with exponential backoff (1s, 2s, 4s, ... up to 24 hours) until it receives a 2xx or the event expires. The merchant verifies the HMAC signature to reject forged webhooks. Events are delivered at-least-once, not exactly-once, so merchants deduplicate on the event id (evt_abc in the example above) and must handle out-of-order delivery. The dispatcher is a separate service that reads from a durable queue (backed by Postgres or Kafka) and tracks delivery state per event per endpoint.
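
Both halves of the contract, signature checking on the merchant side and the retry schedule on the dispatcher side, fit in a short sketch. The details are assumptions made for illustration: the signed payload is taken to be the timestamp, a "." separator, and the raw body; the 300-second replay tolerance is arbitrary; and "up to 24 hours" is read as a cap on total elapsed retry time.

```python
import hashlib
import hmac


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """HMAC-SHA256 over "<timestamp>.<body>", hex-encoded."""
    payload = str(timestamp).encode() + b"." + body
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, header: str, body: bytes, now: int, tolerance: int = 300) -> bool:
    """Check a header of the form "t=<unix seconds>,v1=<hex digest>"."""
    try:
        parts = dict(item.split("=", 1) for item in header.split(","))
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False
    if abs(now - timestamp) > tolerance:
        return False  # too old or too far in the future: possible replay
    expected = sign(secret, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())


def retry_delays(base: int = 1, max_total: int = 24 * 3600):
    """Yield 1s, 2s, 4s, ... while the cumulative wait stays within max_total."""
    delay, elapsed = base, 0
    while elapsed + delay <= max_total:
        yield delay
        elapsed += delay
        delay *= 2


secret = b"whsec_example"                      # hypothetical shared secret
body = b'{"id": "evt_abc", "type": "charge.succeeded"}'
header = f"t=1717000000,v1={sign(secret, 1717000000, body)}"
```

The merchant must verify against the raw request bytes; re-serializing the parsed JSON can change whitespace or key order and break the signature.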

Failure modes

| Failure | Symptom | Resolution |
| --- | --- | --- |
| Timeout between PSP and card network | Unknown if charge was applied | Idempotency key + reconciliation determines outcome; do NOT retry without idempotency |
| PSP crashes after network auth, before writing ledger | Ledger missing entry; customer charged | Reconciliation detects missing entry; compensating ledger entry + alert |
| Double-submit from merchant | Two identical API calls | Idempotency key deduplicates; second call returns stored first response |
| Webhook delivery fails | Merchant misses event | Retry queue; exponential backoff; event log API for merchant to poll |
| Settlement file missing from network | Reconciliation cannot close day | Alert; retry file fetch; manual chase with acquirer |
| Partial saga failure (refund step 2 fails) | Merchant payable debited, refund not submitted | Saga compensation reverses ledger entry; retry or escalate |
| Ledger invariant violated | SUM(amount) != 0 for a transaction_id | Background audit job catches; alerts on-call; indicates a bug, not a race |
| FX rate stale on multi-currency charge | Wrong conversion applied | Rate locked at authorization time; stored in ledger entry; immutable |

Storage choices

| Data | Store | Justification |
| --- | --- | --- |
| Ledger entries | Postgres (partitioned by month) | ACID, serializable isolation; balance queries need strong consistency |
| Payment state | Postgres (with optimistic locking) | State machine transitions must be atomic and consistent |
| Idempotency keys | Postgres (or Redis with AOF persistence) | Durable; sub-ms lookup; short-lived data (24h minimum; 30-day retention for longer replay windows) |
| Card vault (encrypted PAN) | Dedicated vault service + HSM | PCI-DSS isolation; hardware-protected keys |
| Webhook delivery state | Postgres-backed queue | Durable; needs retry tracking and backoff |
| Settlement files | Object storage (S3/GCS) | Large binary files; immutable; long retention |
| Reconciliation results | Postgres | Queried by ops for mismatch investigation |
| Reporting / analytics | Read replica or OLAP (ClickHouse) | Offload aggregate queries from transactional DB |

API design

POST /v1/charges HTTP/1.1
Idempotency-Key: order-42-attempt-1
Authorization: Bearer sk_live_...
Content-Type: application/json

{
  "amount": 2000,
  "currency": "usd",
  "payment_method": "pm_card_visa",
  "capture_method": "automatic",
  "description": "Order #42"
}

→ 200 OK
{
  "id": "ch_abc123",
  "status": "succeeded",
  "amount": 2000,
  "currency": "usd",
  "captured": true,
  "created": 1717000000
}

POST /v1/charges/ch_abc123/refund HTTP/1.1
Idempotency-Key: refund-order-42-v1

{
  "amount": 500    // partial refund
}

GET /v1/balance HTTP/1.1
→ { "available": [{"amount": 9710, "currency": "usd"}],
    "pending":   [{"amount": 2000, "currency": "usd"}] }

available = funds cleared and ready for payout. pending = authorized/captured but not yet settled. Both are derived from the ledger. Never stored as a column.

Things to discuss in an interview

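  • Idempotency key storage: why Postgres and not just Redis? With AOF and the default appendfsync everysec policy, Redis can lose roughly the last second of writes on a crash; appendfsync always closes that window at a throughput cost. For money, durable idempotency is non-negotiable.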
  • Why not 2PC with the card network? External networks do not support distributed transactions. Sagas with compensation are the only viable pattern.
  • The reconciliation loop: interviewers love this — it reveals you understand that even with perfect idempotency, external systems can diverge. Reconciliation is the ultimate safety net.
  • Chargeback flow: a cardholder disputes a charge with their bank. The issuer reverses the funds provisionally, sends a dispute notice to the acquirer, who forwards it to you. The merchant response window is 30 days per phase for Visa and up to 45 days for Mastercard (US) — though your acquirer will impose shorter internal deadlines, often 5–10 days in practice. Cardholders can initiate a chargeback up to 120 days after the transaction date (certain reason codes allow longer). If you lose, the chargeback stands and you owe the funds plus a fee.
  • Why minor units? Demonstrate that you know floats are wrong for money. This is a filter question.
  • PCI scope reduction: hosted fields / vault tokenization means your servers never see the raw PAN. Dramatically reduces what must be PCI-audited.
  • T+1 vs real-time settlement: the Visa/Mastercard rails are batch-oriented by design. Instant payout products (Stripe Instant Payouts) work by the PSP advancing funds from its own treasury against the merchant's pending (not-yet-settled) card balance — the PSP takes on the settlement risk, and later deducts the advance as the underlying funds settle normally through the card network.

Things you should now be able to answer

  • Why is a double-entry ledger better than an accounts table with a balance column?
  • What happens to a payment if your server crashes between the card network returning "approved" and you writing to the database?
  • Why can't you use eventual consistency for account balances?
  • What is the difference between authorization and capture? Between capture and settlement?
  • How do you ensure a refund API is safe to retry?
  • What is PCI-DSS tokenization and why does it matter for system design?
  • When does reconciliation catch things that idempotency misses?

Further reading

  • Martin Fowler — "Accounting Patterns" (martinfowler.com) — the canonical explanation of double-entry in software
  • Stripe Engineering Blog — "Idempotency" and "Designing robust and predictable APIs with idempotency"
  • "Ledger: Stripe's foundation for financial accuracy" — Stripe Sessions talk
  • ISO 4217 — currency codes and minor unit exponents
  • PCI Security Standards Council — PCI-DSS v4.0 summary requirements
  • "Sagas" — Hector Garcia-Molina & Kenneth Salem (1987) — the original saga paper

Frequently asked questions

What is an idempotency key and why does a payment system require it?

An idempotency key is a client-supplied unique string sent with every charge request. The server stores the key and its outcome before returning a response, so that any retry — caused by a network timeout or client crash — looks up the stored result and returns it immediately without re-executing the charge. Without it, a timeout between the PSP and card network leaves the caller unable to know whether the charge landed, and retrying naively double-charges the customer.

What is a double-entry ledger and why is it preferred over storing a balance column?

A double-entry ledger is an append-only table of signed entries where every transaction produces at least two rows that sum to zero, making the account balance a derived value computed as SELECT SUM(amount). A stored balance column can drift out of sync through bugs or partial failures; the ledger cannot, because a background job can verify the zero-sum invariant for every transaction_id at any time and a violation is immediately visible as a code bug.

What is the difference between authorization, capture, and settlement in the card payment lifecycle?

Authorization reserves funds on the cardholder's account at the issuing bank — no money moves. Capture tells the network the merchant intends to collect those reserved funds. Settlement is the actual inter-bank money transfer, which happens in overnight batch files at T+1 or T+2 business days, not in real time. In e-commerce flows, merchants often authorize at checkout and capture only when the item ships.

Why can't a payment system use two-phase commit (2PC) with the card network, and what pattern replaces it?

Card networks like Visa and Mastercard have no XA protocol and cannot participate in a distributed transaction, so 2PC is not feasible. The correct pattern is the saga: each step in a multi-system flow uses an idempotency key for safe retries, and if a step definitively fails, an explicit compensation step reverses the prior ledger entry as a new append — not a delete. Reconciliation against the daily settlement file is the final safety net.

How much ledger storage does a payment system at 1,000 charges per second require over 10 years, and how is that managed?

At 1,000 charges per second, each charge producing roughly 5 ledger entries at 200 bytes each, 10 years of retention accumulates approximately 315 TB. The standard approach is monthly table partitioning in Postgres with automated archival of cold partitions to object storage such as S3 or GCS, keeping only the rolling 12 to 24 months hot — reducing the live Postgres footprint to roughly 30 to 60 TB on a sharded cluster.
