CDC Capture Techniques: Log-, Trigger-, and Query-Based
Three Ways to Capture Changes: CDC systems must detect every database mutation without disrupting production workloads. There are three main techniques, each with distinct trade-offs in performance, portability, and operational complexity.
Log-Based CDC (Production Standard): This approach reads the database transaction log or Write-Ahead Log (WAL), the internal structure that databases use for crash recovery and replication. Every committed transaction is already recorded there with precise ordering. A CDC connector tails this log, parses the binary or logical records, and emits structured change events.
The performance advantage is significant. Reading the log is a passive operation with near-zero impact on the write path. At 20,000 writes per second, log-based CDC adds negligible overhead, typically under 1% CPU on the database host. Latency from commit to CDC event is usually 100 to 500 milliseconds.
The catch is database specificity. PostgreSQL has logical decoding, MySQL has the binlog, MongoDB has the oplog. Each requires custom parsing logic. Tools like Debezium provide connectors for major databases, but proprietary or legacy systems may not expose usable logs.
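To make the mechanism concrete, here is a minimal sketch of tailing PostgreSQL logical decoding with psycopg2. The DSN, the cdc_slot replication slot, and the choice of output plugin are assumptions for illustration; connectors like Debezium build on this same interface, adding batching, schema handling, and failure recovery on top.

```python
# Minimal sketch: tailing PostgreSQL logical decoding with psycopg2.
# Assumes a logical replication slot named "cdc_slot" already exists
# (e.g. created with the wal2json output plugin) and a hypothetical DSN.
import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    "dbname=appdb user=cdc_reader",
    connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
cur = conn.cursor()
cur.start_replication(slot_name="cdc_slot", decode=True)

def emit(msg):
    # msg.payload holds the decoded change (JSON when using wal2json)
    print(msg.payload)
    # Acknowledge the LSN so the server can recycle old WAL segments
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

cur.consume_stream(emit)  # blocks, calling emit() for every change event
```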
Trigger-Based CDC (Portable but Costly): Database triggers fire on every insert, update, or delete, writing change records into a separate CDC table. A polling process reads this table and publishes events.
This approach is portable across any SQL database that supports triggers, so you can retrofit CDC onto systems without log access. However, triggers execute synchronously on the write path, and at high throughput this adds measurable latency. In benchmarks, trigger-based CDC can increase p99 write latency from 5ms to 12ms and reduce peak throughput by 20 to 30% due to the additional write amplification.
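A rough sketch of the trigger approach on PostgreSQL is shown below: an AFTER trigger copies every row change on a hypothetical orders table into a cdc_events table that a separate poller would read and publish. The table, function, and trigger names are illustrative, and the syntax assumes PostgreSQL 11 or later.

```python
# Sketch of trigger-based capture on PostgreSQL: an AFTER trigger copies every
# row change on a hypothetical "orders" table into a "cdc_events" table.
# All object names and the DSN are illustrative.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS cdc_events (
    id         BIGSERIAL PRIMARY KEY,
    table_name TEXT        NOT NULL,
    op         TEXT        NOT NULL,   -- 'INSERT' | 'UPDATE' | 'DELETE'
    row_data   JSONB       NOT NULL,
    changed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION capture_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO cdc_events (table_name, op, row_data)
    VALUES (TG_TABLE_NAME, TG_OP,
            to_jsonb(CASE WHEN TG_OP = 'DELETE' THEN OLD ELSE NEW END));
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_cdc
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION capture_change();
"""

with psycopg2.connect("dbname=appdb user=admin") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)  # from here on, every write also inserts a CDC row
```

Because the trigger body runs inside the same transaction as the original write, the extra insert is exactly what produces the latency and write amplification described above.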
Write Path Overhead: < 1% CPU for log-based capture • +7ms p99 hit for trigger-based capture
Query-Based CDC (Simplest, Least Capable): This approach polls tables using a timestamp column like updated_at or an incrementing version column. Every few seconds, a SELECT query fetches rows modified since the last poll.
This requires no special database configuration and works with any system. However, it misses deletes unless you use soft deletes, and clock skew can cause missed updates if timestamps are not monotonic. Polling intervals typically range from 10 to 60 seconds, so latency is much higher than with log-based approaches. Query-based CDC is best suited to low-volume tables with change rates under 100 rows per second, where multi-second lag is acceptable.
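The sketch below shows what such a polling loop might look like against a hypothetical users table with an updated_at column. The connection details and watermark handling are assumptions for illustration, and the code makes the limitations visible: deleted rows simply never appear, and the watermark only advances correctly when timestamps are monotonic.

```python
# Sketch of query-based CDC: poll a hypothetical "users" table on its
# updated_at column every 30 seconds. Connection details and column names
# are assumptions; note that hard deletes never show up in the result set.
import datetime as dt
import time

import psycopg2

POLL_INTERVAL_S = 30

def poll_changes(conn, watermark):
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, email, updated_at FROM users "
            "WHERE updated_at > %s ORDER BY updated_at",
            (watermark,),
        )
        rows = cur.fetchall()
    # Advance the watermark to the newest timestamp seen; real systems often
    # re-read a small overlap window to tolerate clock skew.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark

conn = psycopg2.connect("dbname=appdb user=cdc_reader")
conn.autocommit = True  # read-only polling, no transaction bookkeeping needed
watermark = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)
while True:
    changes, watermark = poll_changes(conn, watermark)
    for row in changes:
        print("changed:", row)  # a real pipeline would publish these downstream
    time.sleep(POLL_INTERVAL_S)
```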
⚠️ Common Pitfall: Query-based CDC cannot reliably capture deletes or distinguish a newly inserted row from an updated row with the same timestamp. Log- and trigger-based approaches provide explicit operation types.
💡 Key Takeaways
✓ Log-based CDC tails the transaction log with under 1% overhead and sub-second latency, but requires database-specific connectors
✓ Trigger-based CDC is portable and works on any SQL database, but adds 20 to 30% write-path overhead at high throughput
✓ Query-based CDC is simplest to implement but has higher latency (10 to 60 seconds), misses deletes, and is prone to clock-skew issues
✓ Production systems handling over 10,000 writes per second almost always use log-based CDC for its minimal performance impact
📌 Examples
1. PostgreSQL logical replication slot consumed by a Debezium connector, emitting 50,000 change events per second with p99 latency under 300ms
2. MySQL trigger writing to a cdc_events table increases write latency from 4ms to 11ms on a table with 15,000 inserts per second
3. Query-based CDC polling a users table every 30 seconds using WHERE updated_at > ? misses hard deletes entirely