Change Data Capture (CDC) & the Outbox Pattern
Turn your database write log into a reliable event stream. Log-based CDC, the dual-write problem, and the transactional outbox.
The dual-write problem is a bug that doesn't surface in unit tests. It surfaces in production at 3am. Understanding CDC is a prerequisite for building any system where your database and your event stream need to stay consistent.
The dual-write problem
Picture an order service that, on every new order, must do two things:
INSERT INTO orders (id, user_id, amount, status) VALUES (...)— durable, in Postgres.kafka.produce("order.created", { id, user_id, amount })— notify downstream services.
These are two independent I/O calls. The application performs them sequentially. What can go wrong?
process crash here
↓
DB write succeeds → [process dies] → Kafka publish never happens
→ downstream never notified
→ cache is stale
→ search index is stale
→ audit log is missing the event
Or the reverse:
DB write fails → [code still reaches kafka.produce] → Kafka publish succeeds
→ ghost event for a non-existent order
This is not a hypothetical. The window is small — microseconds between the two calls — but at high throughput and with restarts, deploys, and OOM kills, it will happen. Two-phase commit (2PC) across a relational database and a Kafka broker is theoretically possible but brings significant complexity and performance cost; no mainstream service actually does it this way.
The real solutions eliminate the dual-write entirely.
Solution 1: Log-based CDC
Every production relational database maintains a replication log as part of its durability and replication machinery. Postgres calls it the WAL (write-ahead log); MySQL and MariaDB call it the binary log (binlog). The critical property: a row change appears in the replication log if and only if the transaction committed. Rolled-back transactions leave no log entry.
Log-based CDC works by connecting a connector process to this log — the same way a standby replica does — and translating raw log entries into structured change events.
sequenceDiagram
participant App as Application
participant PG as Postgres primary
participant WAL as WAL replication slot
participant DEB as Debezium connector
participant K as Kafka
App->>PG: BEGIN, INSERT INTO orders, COMMIT
PG->>WAL: write WAL record at LSN 00042A
DEB->>WAL: read new records via streaming replication
WAL-->>DEB: WAL record at LSN 00042A
DEB->>DEB: decode row image — old + new values
DEB->>K: produce event — op=c, key=order_id, after=new row
DEB->>WAL: advance confirmed_flush_lsn to 00042A
Debezium is the dominant open-source connector framework for this. It supports Postgres (logical replication), MySQL (binlog), MongoDB (oplog), SQL Server (CDC tables), and others. Each table maps to a Kafka topic; each committed row change produces one event with op (c=create, u=update, d=delete), before and after row images, and the log sequence number (LSN or binlog position).
How Postgres logical replication works
Postgres has two replication modes. Physical replication copies raw WAL bytes — fast, but the standby must be the same Postgres version and you can't filter by table. Logical replication decodes the WAL using a logical decoding plugin (e.g. pgoutput, the built-in plugin shipped with Postgres 10+, or wal2json, a popular third-party plugin that predates pgoutput). The decoded output is a stream of row-level changes in a structured format. Debezium uses a replication slot — a named cursor in the WAL that Postgres holds open so the slot consumer can read at its own pace without the WAL being garbage-collected underneath it.
-- Debezium creates this on your behalf
SELECT pg_create_logical_replication_slot('debezium', 'pgoutput');
-- The slot keeps WAL around until Debezium confirms it has consumed up to that LSN
-- Monitor lag with:
SELECT slot_name,
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots;
Critical operational concern: replication slot lag. If the Debezium connector falls behind or goes offline, the replication slot prevents WAL from being recycled. A stuck slot can cause the Postgres primary's disk to fill with WAL segments — a hard outage. Monitor lag_bytes on every slot; alert aggressively; set max_slot_wal_keep_size (Postgres 13+) as a safety valve.
MySQL binlog
MySQL's binlog operates similarly. Set binlog_format = ROW (the default on MySQL 8.0) so the binlog contains full row images rather than just the SQL statements. Debezium connects as a replica using the MySQL replication protocol and reads binlog events.
Ordering guarantees
CDC events for a single primary key arrive in commit order within a single Kafka partition. Debezium's default: partition by the row's primary key. This means all changes to orders.id = 42 land on the same Kafka partition, in the order they were committed. Changes to different primary keys may land on different partitions and have no ordering guarantee relative to each other. For a consumer that needs to join changes across two tables (e.g. orders and order_items), partition by a shared foreign key (e.g. order_id) to ensure co-located ordering.
Solution 2: The transactional outbox
Log-based CDC requires WAL/binlog access — a database-level privilege, and not always available. Some managed database services do not expose a standard MySQL binlog or Postgres logical replication stream; some require non-default configuration to enable it. The transactional outbox pattern achieves the same atomicity guarantee at the application level without needing replication-level access.
The insight is simple: if you can't reach into the log from outside, use the database's own transaction as your atomicity primitive.
-- Application code, in a single DB transaction:
BEGIN;
INSERT INTO orders (id, user_id, amount, status)
VALUES ($1, $2, $3, 'PENDING');
INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload, created_at)
VALUES (gen_random_uuid(), 'order', $1, 'order.created',
'{"user_id": ..., "amount": ...}', now());
COMMIT;
Either both rows land, or neither does — the database guarantees it. The outbox table is just another table in the same database.
A separate relay process then reads the outbox and publishes to the broker:
sequenceDiagram
participant App as Application
participant DB as "DB — orders + outbox tables"
participant Relay as Outbox Relay
participant K as Kafka
App->>DB: BEGIN, INSERT orders, INSERT outbox, COMMIT
Relay->>DB: SELECT * FROM outbox WHERE published_at IS NULL ORDER BY created_at LIMIT 100
DB-->>Relay: row1, row2, ...
Relay->>K: produce topic, key, payload per row
K-->>Relay: ack
Relay->>DB: UPDATE outbox SET published_at = now() WHERE id IN (...)
Polling vs. CDC on the outbox table
The relay has two ways to detect new outbox rows.
Polling queries WHERE published_at IS NULL every N milliseconds. It's simple to implement and easy to reason about. The poll interval is the knob that controls latency: at 100ms you add at most 100ms to every event's end-to-end journey. The trade-off is a steady drip of SELECT traffic on the primary, which is tolerable at moderate throughput (thousands of events/sec) but starts to add up at higher volumes.
CDC on the outbox table runs Debezium on the outbox table itself, not on the business tables. The relay becomes a Debezium connector streaming inserts into that table, which gives you sub-second latency and zero polling overhead. This is sometimes called "CDC-as-relay." The catch: you still need WAL access, which is the thing you were trying to avoid by choosing the outbox. It does make sense when WAL access exists but you want business-level events — not raw row diffs — on Kafka, with atomicity guaranteed by the DB.
Marking as sent and idempotency
The relay uses at-least-once semantics: publish to Kafka, wait for an ack, then mark the outbox row as sent. If the relay crashes after the Kafka ack but before UPDATE outbox SET published_at, the same event gets published again on restart. Consumers must be idempotent — deduplicate on the outbox.id (which should be the Kafka message key), or design handler logic to be naturally idempotent (e.g. UPSERT instead of INSERT).
Side-by-side comparison
The two approaches solve the same atomicity problem from different angles. Here is what that looks like in motion:
flowchart TD
subgraph "Log-based CDC"
A1[App writes to DB] --> B1[WAL / binlog]
B1 --> C1[Connector decodes log]
C1 --> D1[Kafka]
end
subgraph "Transactional Outbox"
A2[App writes to DB] --> B2[Outbox row in same txn]
B2 --> C2[Relay process polls / tails]
C2 --> D2[Kafka]
end
style C1 fill:#ff6b1a,color:#0a0a0f
style B2 fill:#15803d,color:#fff
style C2 fill:#ff6b1a,color:#0a0a0f
| Dimension | Log-based CDC | Transactional Outbox |
|---|---|---|
| Atomicity source | DB transaction log | DB transaction (same txn as business write) |
| WAL / replication access required | Yes | No (unless relay uses CDC on outbox) |
| Event granularity | Raw row diffs (before + after images) | Application-defined event payload |
| Latency (typical) | Sub-second | Polling interval (100ms–1s) or sub-second with CDC relay |
| Schema awareness | Connector must handle schema changes | Application fully controls the payload |
| Ops burden | Replication slot monitoring, connector infra | Outbox table, relay process, index on published_at |
| Best for | High-throughput pipelines, data lake sync, multi-table fan-out | Greenfield microservices, restricted managed DBs, domain-event semantics |
Where CDC is used
Cache invalidation
The painful cache invalidation approach is to have application code call cache.delete(key) after the DB write. That is another dual-write — and it's the one that makes hotfixes messy, because the cache invalidation call is easy to forget in a code path that's rarely tested. With CDC, Debezium emits a row update event and a cache-invalidation consumer deletes or updates the Redis key. The cache consumer is decoupled from the application entirely; the application doesn't even know it exists.
Search index synchronization
Elasticsearch and OpenSearch indexes are derivatives of primary data. A common mistake is a synchronous es.index(doc) call in the write path — it adds latency, and if Elasticsearch is slow or unavailable, it takes down your write path with it. With CDC, a search-indexer consumer reads the change stream and updates the index asynchronously. The index is eventually consistent with the DB; for search UIs, a few seconds of lag is usually acceptable. See also design-distributed-search.
Materialized views and read models (CQRS)
In a CQRS architecture, the write model stores normalized data; read models are pre-joined, pre-aggregated denormalizations. CDC feeds the read-model build process: every committed write to the source tables triggers an incremental update to the read store (Redis, DynamoDB, a reporting database). This removes the O(N) join from the read path entirely.
Microservice event propagation
When the Order Service must notify the Inventory Service and the Notification Service of a new order, the outbox pattern or CDC is the recommended approach in event-driven microservice architectures. The saga pattern for distributed transactions depends on reliable event emission — CDC or the outbox is the typical implementation for the "emit" step. Without it, the saga's first step is broken.
Data lake ingestion
Streaming database change events into object storage (S3, GCS) via Kafka Connect gives you a full, ordered audit trail of every row change — essentially a database transaction log in cloud storage. You can replay it, apply schema evolution on read, and reconstruct the state of any table at any point in time. Airbnb's open-source SpinalTap system (using MySQL binlog to Kafka) is a documented example of CDC feeding real-time data pipelines at scale; many companies use a similar pattern to feed data warehouses.
Bootstrap: snapshot + tail
A CDC connector can't start streaming from "the beginning of time" — the WAL only retains a bounded window (typically hours to days). When you connect a new CDC pipeline to an existing table with millions of rows, you need an initial snapshot.
Debezium's snapshot strategy (configurable, simplified):
stateDiagram-v2
[*] --> AcquireLock: connector first starts
AcquireLock --> ReadCurrentLSN: note the consistent snapshot LSN
ReadCurrentLSN --> ScanTable: release lock; SELECT * FROM table
ScanTable --> EmitEvents: emit one 'read' event per row
EmitEvents --> TailWAL: stream changes from snapshot LSN onward
TailWAL --> [*]: steady-state operation
The snapshot and WAL tail together produce a complete, consistent view: every row that existed at snapshot time, plus every change that committed after. There is no gap. This is the same principle behind how a Postgres standby comes up from a base backup and then replays WAL.
For very large tables (hundreds of millions of rows), the initial snapshot can take hours. Debezium supports "incremental snapshots" — chunking the table scan so it doesn't hold a lock for the full duration, interleaving the scan with live WAL events using a watermarking method (writing low/high watermark signals to the source database's transaction log to demarcate each chunk's window, based on the DBLog paper from Netflix).
Failure modes
Relay lag
The connector or relay falls behind. On Postgres, the replication slot holds WAL in place as the connector lags, so WAL segments accumulate on disk. For Kafka-backed CDC, acks=all and producer retry settings slow things further during broker degradation.
Monitor lag in both records and seconds. Alert at 10 seconds behind; page at 60 seconds. Keep replication slots at most 2–3 per database. If a slot stays stuck beyond a defined safety window, drop it and replay from a fresh snapshot — painful once, but better than a full disk outage.
Duplicate publishes
A network partition between the relay and the broker is the common source: the relay sent the message, the broker acked, then the broker went away before the relay received that ack. On reconnect, the relay resends. Kafka idempotent producers (enabled via enable.idempotence=true) deduplicate within a producer session using sequence numbers. Across crashes, use the outbox.id as the Kafka message key and add consumer-side deduplication. See idempotency and exactly-once.
Schema evolution
The DB schema changes — add column, rename column, drop column. If the connector emits raw row diffs, consumers break when field names disappear or types change.
Use a Schema Registry (e.g. Confluent Schema Registry) with Avro or Protobuf. Enforce backward-compatible changes: only add optional fields; never rename or remove without a deprecation cycle. The connector registers the schema on first connect; each change event is tagged with a schema ID. Consumers fetch the schema from the registry on first encounter of a new ID, so they can evolve independently of the connector.
Log retention too short
The WAL slot anchors the WAL. But the Kafka topic retaining CDC events also has a retention policy. If a consumer falls far enough behind that events have been deleted from the topic, it must re-bootstrap from a snapshot.
Use compacted Kafka topics for CDC streams (keyed by row primary key). Compaction retains the latest value per key indefinitely; old intermediate updates may be dropped. This guarantees consumers can always reconstruct the current state even if they miss intermediate events, but they cannot replay history past the most recent value per key.
A closer look at the failure window
It helps to see exactly where the crack is in a naive dual-write, and where each solution plugs it. The diagram below shows all three approaches — naive dual-write, log-based CDC, and the outbox — side by side so the failure window is visible in each case.
flowchart TD
subgraph "Naive dual-write"
NW1[App writes to DB] -->|success| NW2[App publishes to Kafka]
NW1 -->|crash here| NW3["Event lost forever"]
style NW3 fill:#ff2e88,color:#fff
end
subgraph "Log-based CDC"
LW1[App writes to DB] --> LW2[WAL records committed change]
LW2 --> LW3[Connector reads WAL]
LW3 --> LW4[Kafka]
style LW2 fill:#15803d,color:#fff
style LW3 fill:#ff6b1a,color:#0a0a0f
end
subgraph "Transactional outbox"
OW1["App writes to DB + outbox (one txn)"] --> OW2["Relay reads outbox"]
OW2 --> OW3[Kafka]
OW2 -->|"crash before ack"| OW4["Relay retries — duplicate, not lost"]
style OW1 fill:#15803d,color:#fff
style OW4 fill:#ffaa00,color:#0a0a0f
end
Notice the difference in failure character. In the naive dual-write, a crash produces a silent data loss — the event is gone, no retry is possible. In log-based CDC and the outbox, the failure mode is a duplicate — the event may be published more than once, but it is never lost. Idempotent consumers handle duplicates cleanly; silent loss is unrecoverable.
Things to discuss in an interview
- Why log-based CDC is better than polling the source tables directly. Polling adds read load on the primary and has a higher minimum latency floor. More importantly, polling can miss intermediate states: if a row is updated twice between polls, you only see the final value — but the intermediate state may carry semantic meaning (e.g. a status transition from
PENDING→PROCESSING→FAILEDthat collapses to justFAILED). - The replication slot risk on Postgres. An unconsumed slot will eventually fill the disk. This is a common gotcha in production Postgres CDC deployments.
- Outbox vs. CDC when WAL isn't available. Some managed database services do not expose a standard logical replication stream, or require non-default configuration to enable it (Google Cloud SQL, some Aurora configurations). Amazon RDS for PostgreSQL does support logical replication via the
rds.logical_replicationparameter group flag. The outbox is the fallback when WAL/binlog access is unavailable, restricted by policy, or operationally impractical. - At-least-once delivery and idempotency. Neither CDC nor the outbox gives you exactly-once end-to-end. Exactly-once requires transactional consumers (Kafka transactions), idempotent producers, and consumer-side deduplication — all three together. In practice, at-least-once + idempotent consumers is the standard operating model. See idempotency and exactly-once.
- Relation to sagas. A saga is a sequence of local transactions with compensating transactions on failure. Each local transaction must reliably emit its event to trigger the next step. CDC or the outbox is what makes that emission reliable.
Things you should now be able to answer
- Why is dual-write fundamentally unsafe, and which specific failure window does it expose?
- What is a Postgres replication slot, and why does a stale one cause disk issues?
- How does log-based CDC guarantee that events match committed DB state?
- What is the transactional outbox, and why does it work even without WAL access?
- Why must CDC consumers be idempotent, and what does idempotent mean in practice?
- How do you bootstrap a CDC pipeline against a table that already has 100M rows?
- What is schema evolution in a CDC pipeline, and how does a Schema Registry help?
Further reading
- Debezium documentation — debezium.io (architecture, connector configs, snapshot strategies)
- "Bottled Water: Real-time integration of PostgreSQL and Kafka" — Martin Kleppmann, 2015 (the canonical explanation of Postgres logical decoding for CDC)
- Designing Data-Intensive Applications, Chapter 11 (Stream Processing) — Kleppmann, 2017
- Confluent CDC documentation — docs.confluent.io/kafka-connectors/debezium-*
- "Reliable microservices data exchange with the Outbox Pattern" — Gunnar Morling (Red Hat / Debezium maintainer) blog post
- Design a Distributed Message Queue — for Kafka internals that underpin CDC delivery guarantees
Frequently asked questions
▸What is the dual-write problem in distributed systems?
The dual-write problem arises when application code must write to a database and publish to a message broker like Kafka as two separate I/O calls with no shared transaction boundary. If the process crashes between the two operations, the database may have the data while the broker never received the event, or the reverse: a ghost event is published for a write that actually failed. The fundamental issue is that no atomicity guarantee spans these two systems.
▸How does log-based CDC guarantee that emitted events match committed database state?
A row change appears in the replication log (Postgres WAL or MySQL binlog) if and only if the transaction committed — rolled-back transactions leave no log entry. A connector like Debezium reads this log the same way a standby replica does and translates each entry into a structured change event. Because the log is the source of truth, the connector can only emit what the database already durably committed.
▸What is the transactional outbox pattern and when should I use it instead of log-based CDC?
The transactional outbox writes an outbox row inside the same database transaction as the business write, so both land atomically or neither does. A separate relay process then polls the outbox and publishes to the broker asynchronously. Use it when WAL or binlog access is unavailable or restricted, such as on some managed database services, or when you want application-defined event payloads rather than raw row diffs from the replication log.
▸Why must CDC and outbox consumers be idempotent, and can CDC deliver exactly-once?
Both patterns guarantee at-least-once delivery, not exactly-once. If the relay or connector crashes after publishing to Kafka but before recording the confirmed offset, the same event is republished on restart. Exactly-once end-to-end requires Kafka transactions, idempotent producers, and consumer-side deduplication all together, which is expensive to coordinate. The standard operating model is at-least-once delivery with idempotent consumers that deduplicate on the outbox ID or row primary key.
▸What is the replication slot lag risk on Postgres CDC deployments?
A Debezium replication slot prevents Postgres from recycling WAL segments until the connector confirms it has consumed up to a given LSN. If the connector falls behind or goes offline, WAL accumulates on the primary's disk and can cause a hard outage when disk fills. The article recommends alerting at 10 seconds of lag, paging at 60 seconds, keeping at most 2 to 3 slots per database, and setting max_slot_wal_keep_size (available in Postgres 13+) as a safety valve.
You may also like
Design a Shopping Cart & Checkout System
Keep a cart consistent across devices, then check out without overselling or double-charging. The available-cart vs consistent-checkout split, inventory holds, and the order saga.
Design a Social Graph Service (Facebook's TAO)
Serve billions of "who follows whom" reads over a graph of trillions of edges. The objects-and-associations model, a cache in front of sharded SQL, and the hot-vertex problem.
Design an Authorization System (Google Zanzibar / RBAC / ReBAC)
Answer "can user U do action A on resource R?" globally, in milliseconds, consistently. RBAC vs ABAC vs ReBAC, Zanzibar relation tuples, and the new-enemy problem.