
CDC vs Alternatives: When to Use Change Data Capture

The Decision Matrix: CDC is not always the right choice. Understanding when to use CDC versus batch ETL, dual writes, or event sourcing requires analyzing your requirements around latency, consistency, operational complexity, and system ownership.
Batch ETL: hours of latency, simple operations, full table scans.
CDC Streaming: subsecond latency, complex operations, incremental changes.
CDC vs Batch ETL: Traditional batch ETL pulls entire tables on a schedule, often nightly or hourly. This is drastically simpler to implement: a cron job running SELECT * queries and bulk-loading into a warehouse. Operational complexity is minimal. However, batch ETL introduces unavoidable latency. Even hourly batches mean your downstream systems are 30 to 60 minutes behind on average. At 10 million rows, a full table scan takes 10 to 30 seconds of database load per batch. Multiply this by dozens of tables and you create periodic load spikes that impact production queries.

Choose batch ETL when freshness requirements are measured in hours, change volume is low (under 1,000 rows per minute), and you want minimal operational overhead. Choose CDC when you need subsecond-to-minute freshness, have high write throughput (over 10,000 writes per second), or when pulling full tables creates unacceptable load on your primary database.

CDC vs Dual Writes: Dual writes from application code to both the database and downstream systems (cache, search index) seem appealing: no lag, full control, simple to understand. The problem is consistency. If your application writes to PostgreSQL successfully but the write to Elasticsearch fails due to a network partition or service outage, your systems diverge. Handling retries correctly is surprisingly difficult: if you retry the failed write but the database transaction rolled back, you insert phantom data; if you don't retry, you have permanent inconsistency.

CDC eliminates this entire class of bugs by centralizing responsibility at the database layer. The database transaction log is the single source of truth. CDC adds 100 milliseconds to a few seconds of lag compared to synchronous dual writes, but guarantees that downstream systems eventually see every committed change. Choose dual writes only for ephemeral data where inconsistency is acceptable (analytics events, logs).
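The dual-write failure mode is easy to reproduce. The sketch below uses in-memory dicts standing in for PostgreSQL and a search index, and all names (`dual_write`, `simulate_outage`, etc.) are hypothetical; it shows how a successful database commit followed by a failed index write leaves the two systems divergent:

```python
class DownstreamUnavailable(Exception):
    """Raised when a downstream system (e.g. a search index) is unreachable."""

db_rows = {}       # stand-in for the primary database
search_docs = {}   # stand-in for the search index (e.g. Elasticsearch)

def write_search_index(order):
    # Hypothetical downstream write that can fail independently of the DB.
    if order.get("simulate_outage"):
        raise DownstreamUnavailable("search index unreachable")
    search_docs[order["id"]] = order

def dual_write(order):
    # Step 1: commit to the primary database -- this succeeds.
    db_rows[order["id"]] = order
    # Step 2: write to the search index. If this raises AFTER the DB
    # commit, the two systems silently diverge.
    write_search_index(order)

try:
    dual_write({"id": 1, "total": 120, "simulate_outage": True})
except DownstreamUnavailable:
    pass  # the application moves on; nothing repairs the drift

# The database has the order but the search index does not.
assert 1 in db_rows and 1 not in search_docs
```

A log-based CDC pipeline sidesteps this because the index write is derived from the committed transaction log entry, so a downstream outage only delays delivery instead of losing it.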
Choose CDC for critical data requiring correctness guarantees (orders, payments, user accounts).

CDC vs Event Sourcing: Event sourcing makes the event log itself the primary source of truth; application state is derived by replaying events. This is fundamentally different from CDC, where the database remains primary and CDC is a projection. Event sourcing provides richer domain semantics: instead of seeing "order total changed from 100 to 120", you see explicit events like OrderItemAdded and DiscountApplied, which makes auditing and debugging more expressive. However, event sourcing requires designing your entire application around events from day one. Queries become complex: to answer "show me all active users", you must replay potentially millions of events to build a materialized view. For existing systems with established OLTP databases, retrofitting event sourcing is impractical.
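The contrast between the two models can be made concrete. In the hypothetical sketch below (event names and fields are illustrative, not a real framework's schema), event sourcing derives the order total by replaying OrderItemAdded events, while CDC would surface only the resulting before/after row images:

```python
# Event sourcing: the event log is primary; state is a fold over events.
events = [
    {"type": "OrderCreated",   "order_id": "o1"},
    {"type": "OrderItemAdded", "order_id": "o1", "price": 100},
    {"type": "OrderItemAdded", "order_id": "o1", "price": 20},
]

def replay(events):
    """Rebuild order totals by replaying every event from the beginning."""
    orders = {}
    for e in events:
        order = orders.setdefault(e["order_id"], {"total": 0})
        if e["type"] == "OrderItemAdded":
            order["total"] += e["price"]
    return orders

state = replay(events)
assert state["o1"]["total"] == 120

# CDC, by contrast, emits only the resulting row change -- the "total
# changed from 100 to 120" view, with the domain intent erased:
cdc_event = {"op": "update", "table": "orders",
             "before": {"id": "o1", "total": 100},
             "after":  {"id": "o1", "total": 120}}
```

Note how answering any query against the event-sourced form requires a replay like this one, which is exactly the cost the paragraph above describes.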
"For greenfield systems where events are central to your domain model, event sourcing can be more expressive. For existing systems and third party databases, CDC is the pragmatic choice."
Choose event sourcing for greenfield systems where domain events are first-class citizens and you're willing to accept the operational complexity of replaying event streams. Choose CDC when working with existing databases, third-party systems, or when you want transactional writes without fundamentally changing your application architecture.

Decision Criteria Summary: Use CDC when you have multiple downstream consumers (3 or more), require freshness under 5 minutes, have high write throughput (over 5,000 writes per second), or need strong consistency without dual-write complexity. Accept the operational overhead of running connectors, message buses, and schema registries. Avoid CDC for simple, low-volume use cases where batch ETL would suffice, or for greenfield systems where event sourcing aligns better with your domain model.
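These criteria can be condensed into a toy decision function. This is a sketch only: the thresholds are the ones stated in this section, and the function name and signature are invented for illustration:

```python
def recommend_pipeline(consumers: int,
                       freshness_sla_seconds: float,
                       writes_per_second: float,
                       greenfield_event_domain: bool = False) -> str:
    """Encode this section's rules of thumb: 3+ consumers, sub-5-minute
    freshness, or >5,000 writes/s favor CDC; greenfield, event-centric
    domains favor event sourcing; otherwise batch ETL suffices."""
    if greenfield_event_domain:
        return "event sourcing"
    if (consumers >= 3
            or freshness_sla_seconds < 300
            or writes_per_second > 5_000):
        return "cdc"
    return "batch etl"

# Nightly reporting warehouse with low change volume -> batch ETL.
assert recommend_pipeline(1, 24 * 3600, 10) == "batch etl"
# Four consumers needing minute-level freshness at high volume -> CDC.
assert recommend_pipeline(4, 60, 8_000) == "cdc"
```

Real decisions weigh team experience and operational budget too; treat the function as a mnemonic for the thresholds, not a substitute for judgment.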
💡 Key Takeaways
Batch ETL is simpler but introduces hours of latency and creates periodic load spikes from full table scans, acceptable when freshness SLAs are measured in hours
Dual writes from application code create race conditions and consistency bugs; CDC eliminates this by using the database transaction log as the single source of truth
Event sourcing makes events primary and provides richer domain semantics, but requires greenfield design; CDC is pragmatic for existing databases
Choose CDC when you have 3 or more downstream consumers, need under 5 minute freshness, or handle over 5,000 writes per second with strong consistency requirements
📌 Examples
1. A financial ledger system choosing event sourcing for explicit AuditTrail events and regulatory compliance, accepting the complexity of event replay for queries
2. An e-commerce platform choosing CDC over dual writes after encountering race conditions where database commits succeeded but cache writes failed during network partitions, causing 2% of orders to have stale inventory data
3. An analytics warehouse choosing nightly batch ETL over CDC because reporting SLAs allow 24-hour lag and change volume is only 500 rows per minute