
Timestamp CDC vs Log-based CDC

Understanding when to use timestamp-based CDC versus log-based CDC is critical for system design interviews. The choice fundamentally depends on your correctness requirements, latency tolerance, operational complexity budget, and database capabilities.

Timestamp-Based CDC: You add a last_updated column to your tables. A polling service queries WHERE last_updated > watermark every 30 to 300 seconds. This works with essentially any database: PostgreSQL, MySQL, Oracle, even non-relational stores like DynamoDB if they support timestamp fields.

Advantages:
- Simple to implement, requiring only application-level schema changes.
- No database administrator (DBA) permissions needed for replication slots or log access.
- Portable across heterogeneous data stores.
- Low initial cost.

Disadvantages:
- Latency is bounded by the polling frequency, typically 30 seconds at minimum.
- Correctness depends on application discipline: every write must update the timestamp.
- Deletes require soft-delete patterns or separate audit tables.
- Heavy scans on high-churn tables create database load.

Log-Based CDC: You configure a tool like Debezium or AWS Database Migration Service (DMS) to read the database's Write-Ahead Log (WAL) or binlog. Every committed transaction is captured as an event, typically with sub-second latency.

Advantages:
- Near real-time replication with p99 latency under 1 second.
- Captures all changes, including deletes, without schema modifications.
- No application code changes needed.
- Events are totally ordered per partition.

Disadvantages:
- Requires DBA-level database configuration.
- Database-specific (PostgreSQL logical replication, MySQL binlog, Oracle GoldenGate).
- More complex operational model.
- Log retention policies limit how far back you can replay.
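The timestamp-based polling loop described above can be sketched in a few lines. This is a minimal illustration using SQLite; the orders table, column names, and ISO-8601 text timestamps are all hypothetical, and a production version would persist the watermark and handle clock skew.

```python
import sqlite3

# Illustrative source table; assumes every write maintains last_updated.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        status TEXT,
        last_updated TEXT NOT NULL  -- ISO-8601 text sorts chronologically
    )
""")
conn.execute(
    "INSERT INTO orders VALUES "
    "(1, 'shipped', '2024-01-01T10:00:00'), "
    "(2, 'pending', '2024-01-01T10:02:00')"
)

def poll_changes(conn, watermark):
    """Return rows modified since the watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, status, last_updated FROM orders "
        "WHERE last_updated > ? ORDER BY last_updated",
        (watermark,),
    ).fetchall()
    # Advance the watermark to the latest timestamp seen so far.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark

# One polling cycle: only order 2 changed after the watermark.
changed, wm = poll_changes(conn, "2024-01-01T10:01:00")
```

In a real service this function would run on a timer (the 30 to 300 second interval from the text), and note that hard deletes never appear in the result set, which is exactly the soft-delete limitation described above.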
Timestamp CDC: 90s p50 latency, simple setup, any database
vs
Log-Based CDC: 500ms p99 latency, complex setup, database-specific
Decision Framework:

Choose timestamp-based CDC when:
- Your freshness requirements are in minutes, not seconds. Analytics dashboards, daily reporting, and batch Machine Learning (ML) feature generation can tolerate 2 to 5 minutes of staleness.
- You control the application schema and can enforce timestamp discipline: you own the codebase and can mandate updated_at columns with automatic updates.
- You need to replicate from heterogeneous sources, for example a stack that includes PostgreSQL, MongoDB, and a proprietary system with no log access but timestamp support.
- You want operational simplicity and your team lacks deep database expertise; configuring replication slots and managing log retention is beyond your operational comfort zone.

Choose log-based CDC when:
- You need sub-second latency: real-time fraud detection, live inventory updates, or operational dashboards that require data freshness under 5 seconds.
- You must capture all changes, including hard deletes, because compliance or data governance rules require perfect replication of all database state transitions.
- You operate at significant scale and want to minimize source database load; reading logs imposes minimal overhead compared to repeated range scans on high-churn tables.
- You need strong ordering guarantees. Log-based CDC preserves transaction order per partition, which is critical for event sourcing or state machine replication.
⚠️ Common Pitfall: Teams often start with timestamp-based CDC for ease, then face a painful migration to log-based CDC when latency requirements tighten. If you know you will need sub-second freshness within 12 months, weigh the migration cost upfront.
Hybrid Approaches: Some teams run both. Timestamp-based CDC feeds analytics systems with minute-level freshness and low operational burden, while log-based CDC feeds real-time operational systems like fraud detection or inventory reservation. Each pipeline serves different Service Level Agreements (SLAs) with appropriate trade-offs.

Real Numbers: A typical timestamp-based CDC system might achieve p50 latency of 60 to 90 seconds and p99 of 3 to 5 minutes, with database overhead around 10 percent of CPU during normal operation. A log-based CDC system typically achieves p50 latency of 200 to 500 milliseconds and p99 under 2 seconds, with database overhead under 5 percent, because reading logs is cheaper than repeated queries.
💡 Key Takeaways
Timestamp-based CDC achieves p50 latency of 60 to 90 seconds with a simple setup across any database, while log-based CDC achieves p50 of 200 to 500 milliseconds but requires database-specific configuration
Choose timestamp-based CDC for analytics and reporting with 2 to 5 minute staleness tolerance, heterogeneous data sources, and teams without deep database operations expertise
Choose log-based CDC for real-time systems needing sub-second latency, capture of hard deletes, minimal source database overhead (under 5 percent), and strong ordering guarantees
Timestamp-based CDC requires application discipline to maintain <code>updated_at</code> columns and cannot natively capture deletes, while log-based CDC captures all changes automatically from transaction logs
Migrating from timestamp-based to log-based CDC is costly: if you anticipate sub-second freshness requirements within 12 months, consider starting with log-based CDC or plan migration timing carefully
📌 Examples
1. A media streaming service uses timestamp-based CDC for analytics dashboards (5-minute freshness) and log-based CDC for real-time content recommendation (500-millisecond freshness), each optimized for its SLA
2. An inventory system initially used timestamp-based CDC with 2-minute polling, but during flash sales stock levels appeared available after actual sellout; migrating to log-based CDC reduced reservation conflicts from 5 percent to 0.1 percent
3. A financial platform runs Debezium on PostgreSQL, achieving 300-millisecond p99 replication lag to fraud detection systems, compared to a 90-second lag with previous timestamp-based polling