Definition
Change Data Capture (CDC) is a pattern that detects and captures every insert, update, and delete operation in a database, then streams these changes to other systems in real time with ordering guarantees.
The Core Problem: Modern applications face a fundamental challenge: operational databases hold the source of truth, but many other systems need that same data. Your ecommerce platform stores orders in PostgreSQL, but you also need those orders in your data warehouse for analytics, in Elasticsearch for search, in Redis for caching, and in your fraud detection system.
The naive approach is batch Extract, Transform, Load (ETL) that re-reads entire tables every few hours. This creates heavy load on your primary database and introduces high latency: your warehouse might be 6 hours behind reality. Another tempting solution is dual writes, where your application code writes to the database AND immediately writes to each downstream system. But this creates race conditions. What happens when the database write succeeds but the cache write fails? Your systems diverge.
How CDC Solves This: CDC sits between your database and downstream consumers. It watches the database transaction log, the internal record of every committed change, and converts these changes into structured events. When your application inserts a new order, CDC captures that insert within milliseconds, packages it as an event with the operation type, the full row data, and transaction metadata, then publishes it to a message bus like Kafka.
Downstream systems subscribe to these CDC events and process them independently. Your warehouse loads them every 2 minutes. Your cache applies them instantly. Your fraud system analyzes patterns in real time. All from one reliable stream, with no dual write complexity in your application code.
✓CDC captures inserts, updates, and deletes from the database transaction log with minimal overhead
✓Events are delivered to downstream consumers via a message bus, decoupling producers from consumers
✓Typical end to end latency is subsecond to a few seconds, compared to hours with batch ETL
✓CDC eliminates dual write complexity and race conditions by centralizing change capture at the database layer