
How CDC Capture Works at Scale

The Capture Challenge: Your database commits 50,000 transactions per second, and each transaction writes multiple rows. You need to capture every change in commit order, convert it to a structured event, and publish it downstream while adding no more than a few milliseconds of overhead to the database. How?

Log-Based Capture Mechanism: Modern CDC reads directly from the database's write-ahead log (WAL) or binlog. When a transaction commits, the database has already written all of its changes to this log for durability. The CDC capture service tails the log in streaming mode, reading new entries as they appear. For a single shard or database instance, one CDC capture process runs continuously: it reads log records sequentially, groups them by transaction boundaries, and transforms them into change events. Each event includes the operation type (INSERT, UPDATE, DELETE), all column values, a timestamp, and a log sequence number.
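To make the event structure concrete, here is a minimal Python sketch of a change event and of grouping log records by transaction boundary. The names (LogRecord, ChangeEvent, Operation) and the COMMIT marker record are illustrative assumptions, not tied to any particular CDC tool or log format.

```python
# Sketch: turning raw log records into change events grouped by transaction.
# LogRecord / ChangeEvent / Operation are illustrative, not a real CDC API.
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator

class Operation(Enum):
    INSERT = "INSERT"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    COMMIT = "COMMIT"   # commit marker record in the log

@dataclass
class LogRecord:
    txn_id: int              # transaction this record belongs to
    lsn: int                 # log sequence number (monotonic)
    op: Operation
    table: str = ""
    columns: dict = field(default_factory=dict)
    timestamp_ms: int = 0

@dataclass
class ChangeEvent:
    op: str
    table: str
    columns: dict
    timestamp_ms: int
    lsn: int

def group_by_transaction(records: Iterator[LogRecord]) -> Iterator[list[ChangeEvent]]:
    """Buffer records per transaction; emit the group only when its COMMIT arrives."""
    open_txns: dict[int, list[ChangeEvent]] = {}
    for rec in records:
        if rec.op is Operation.COMMIT:
            # Emit the complete transaction in commit order, then forget it.
            yield open_txns.pop(rec.txn_id, [])
        else:
            open_txns.setdefault(rec.txn_id, []).append(
                ChangeEvent(rec.op.value, rec.table, rec.columns,
                            rec.timestamp_ms, rec.lsn)
            )
```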
Typical CDC pipeline latency: log read 2-10 ms; publish 50-100 ms; end-to-end P50 300-500 ms.
Preserving Transaction Atomicity: The capture service buffers all records within a transaction boundary and emits the grouped set of change events only when it reads the commit marker. This preserves atomicity semantics for downstream consumers that care about seeing complete transactions.

Checkpointing for Recovery: Every few seconds, the capture service records its current log sequence number in a durable offset store. If the capture process crashes and restarts, it resumes from the last checkpoint. This may replay some recent events (downstream consumers must handle duplicates), but it never skips data.

Handling Backpressure: What if downstream systems slow down? The capture service monitors its publish lag to the distributed log. If lag exceeds a threshold, it throttles how fast it reads from the database log, preventing memory exhaustion. The database retains logs for 48 hours or more, leaving ample time to catch up.

At LinkedIn and Uber, CDC capture runs as a multi-tenant service. Each database shard has one active capture instance for ordering guarantees, with a standby ready for failover. The capture service is carefully optimized: reading from the log is sequential disk I/O (very fast), and transformation is lightweight CPU work. The incremental cost to the OLTP database is minimal because the capture service reads a log that already exists for replication and recovery.
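The checkpointing and backpressure behavior can be sketched as a small control loop. In this Python sketch, log_reader, publisher, and the in-memory OffsetStore are placeholders for the real database log reader, distributed log producer, and durable offset storage; the thresholds are illustrative values, not recommendations.

```python
# Sketch: checkpoint-and-throttle loop around a CDC capture process.
# log_reader.read_from(), publisher.publish(), publisher.current_lag(),
# and OffsetStore are placeholders for deployment-specific components.
import time

CHECKPOINT_INTERVAL_S = 5        # how often the current LSN is persisted
MAX_PUBLISH_LAG = 100_000        # events not yet acknowledged downstream
THROTTLE_SLEEP_S = 0.05          # pause before reading more of the DB log

class OffsetStore:
    """Durable store for the last committed log position (e.g. a small table)."""
    def __init__(self) -> None:
        self._lsn = 0
    def save(self, lsn: int) -> None:
        self._lsn = lsn          # a real implementation fsyncs / replicates this
    def load(self) -> int:
        return self._lsn

def run_capture(log_reader, publisher, offsets: OffsetStore) -> None:
    # Resume from the last checkpoint; anything published after that checkpoint
    # is replayed, so downstream consumers must tolerate duplicates.
    lsn = offsets.load()
    last_checkpoint = time.monotonic()
    for txn_events in log_reader.read_from(lsn):
        # Backpressure: if downstream lags too far, slow down log reads
        # instead of buffering an unbounded number of events in memory.
        while publisher.current_lag() > MAX_PUBLISH_LAG:
            time.sleep(THROTTLE_SLEEP_S)
        publisher.publish(txn_events)
        if txn_events:
            lsn = max(e.lsn for e in txn_events)
        if time.monotonic() - last_checkpoint >= CHECKPOINT_INTERVAL_S:
            offsets.save(lsn)
            last_checkpoint = time.monotonic()
```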
💡 Key Takeaways
CDC capture reads the database commit log sequentially, which is fast sequential disk I/O and adds minimal overhead to the OLTP database
Transaction boundaries are preserved by buffering records until a commit marker, then emitting grouped change events for atomicity
Checkpointing every few seconds to a durable offset store ensures no data loss on restart, though duplicates may occur
Backpressure mechanisms throttle log reading when downstream systems slow down, preventing memory exhaustion while relying on 48+ hour log retention
With 50,000 writes per second, log-based capture typically adds only 2-10 ms of read latency and publishes to the event stream within 50-100 ms
📌 Examples
1. At LinkedIn, each database shard has one active CDC capture instance reading the binlog, with standby instances ready for sub-second failover
2. A CDC capture service processing 50,000 writes per second uses sequential log reads (under 10 ms) and lightweight transformation, consuming fewer than 2 CPU cores