
Production Reality: Scale and Latency

In production systems, timestamp-based CDC reveals its characteristics most clearly at the intersection of scale, freshness requirements, and database load constraints. Understanding these dynamics is essential for making informed architectural decisions.

Typical Production Setup: Consider an e-commerce platform handling 20,000 transactions per second at peak. The primary database writes to tables like orders, payments, inventory, and user activity. A data warehouse powers business intelligence dashboards that marketing and operations teams check throughout the day, with acceptable staleness of 2 to 5 minutes. The CDC pipeline runs every 90 seconds, and key tables like orders might see 50,000 to 200,000 updates per 90-second window during peak hours. The CDC extractor works in three steps:

1. It queries for changes using an index on updated_at. With proper indexing, retrieving 100,000 rows from a 500-million-row table takes 2 to 4 seconds of database time.
2. It publishes these changes to Kafka topics, where multiple consumers (data warehouse loader, search indexer, cache invalidator) process them independently.
3. Each downstream system applies changes idempotently, using primary keys for upserts.

The data warehouse sees p50 freshness of roughly 90 seconds (half the polling interval in poll delay, plus extract and load time) and p99 freshness of 3 to 4 minutes.
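To make those three steps concrete, here is a minimal sketch of one poll cycle, assuming a hypothetical orders table with an indexed updated_at column (epoch seconds) and a stubbed-out publish step standing in for a Kafka producer:

```python
import sqlite3

# Hypothetical schema: orders(id INTEGER PRIMARY KEY, status TEXT,
# updated_at REAL), with an index on updated_at so the range scan is cheap.
BATCH_LIMIT = 100_000

def publish(change: dict) -> None:
    """Stand-in for a Kafka producer send; downstream consumers
    (warehouse loader, search indexer, cache invalidator) each read
    the topic independently and upsert by primary key."""
    print(change)

def poll_once(conn: sqlite3.Connection, watermark: float) -> float:
    """One CDC run: extract every row updated after the last watermark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at LIMIT ?",
        (watermark, BATCH_LIMIT),
    ).fetchall()
    for order_id, status, updated_at in rows:
        # Redelivery is safe because downstream applies idempotent upserts.
        publish({"id": order_id, "status": status, "updated_at": updated_at})
        watermark = max(watermark, updated_at)
    return watermark  # persist this so the next run resumes where this one ended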
End-to-End Latency Budget: 45s poll delay + 15s extract + 30s load ≈ 90s
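A quick sanity check on that budget, using the figures above (with uniformly distributed change times, the median poll delay is half the interval):

```python
poll_interval_s = 90          # CDC pipeline runs every 90 seconds
extract_s, load_s = 15, 30    # processing figures from the budget above

# Median poll delay: a change lands, on average, halfway into the interval.
p50_freshness = poll_interval_s / 2 + extract_s + load_s
print(p50_freshness)  # 90.0 -> matches the ~90s p50 freshness quoted above
```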
The Database Load Trade-off: More frequent polling means fresher data but higher database load. Imagine you want to reduce latency from 90 seconds to 30 seconds. Tripling the polling frequency triples the number of range scans per hour on your timestamp index. For a table with 1 billion rows and an index on updated_at, each scan reads a substantial number of index pages. With 90-second polling (40 runs per hour), if each scan reads 500 MB of index pages from cache, that is 20 GB per hour of index scan overhead. At 30-second polling (120 runs per hour), that jumps to 60 GB per hour. At some point, this competes with your primary workload for CPU and memory.
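The same arithmetic in a few lines, with the 500 MB per-scan figure taken as the illustrative number from above:

```python
SCAN_BYTES = 500e6  # index pages read per CDC scan (illustrative figure)

def hourly_scan_gb(poll_interval_s: float) -> float:
    """GB of index pages scanned per hour at a given polling interval."""
    runs_per_hour = 3600 / poll_interval_s
    return runs_per_hour * SCAN_BYTES / 1e9

for interval in (90, 30):
    print(f"{interval}s polling: {3600 / interval:.0f} runs/h, "
          f"{hourly_scan_gb(interval):.0f} GB/h of index scans")
# 90s polling: 40 runs/h, 20 GB/h of index scans
# 30s polling: 120 runs/h, 60 GB/h of index scans
```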
"The constraint is rarely the CDC application itself. It's how much read load your primary database can tolerate while maintaining p99 write latency Service Level Objectives (SLOs) for user facing traffic."
When Volume Explodes: If a marketing campaign causes a traffic spike and your update rate jumps from 20,000 to 100,000 per second, each CDC run suddenly retrieves 9 million rows instead of 1.8 million. Processing time balloons: the job that normally completes in 15 seconds now takes 90 seconds, and you start missing polling intervals. The CDC system falls behind. Lag increases from 2 minutes to 10 minutes, then 30 minutes, and downstream dashboards show stale data. This is the moment where timestamp-based CDC shows its limits: it cannot elastically scale with sudden traffic bursts without careful capacity planning.

Real-World Adoption: Many mid-scale companies use timestamp-based CDC for analytics pipelines where minute-level freshness is acceptable and traffic is predictable. When requirements tighten to sub-second latency, or when data volume approaches billions of daily changes, teams migrate to log-based CDC (reading database write-ahead logs or transaction logs), which provides lower latency and stronger consistency guarantees but requires deeper database integration.
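A toy model shows how this failure mode compounds. The drain rate below is an assumption, standing in for extraction throughput degraded by contention on the primary; once arrivals outpace it, the backlog, and therefore lag, grows every interval:

```python
POLL_S = 90  # polling interval in seconds

def simulate(update_rate: float, drain_rate: float, intervals: int) -> None:
    """Track the extraction backlog when arrivals outpace the CDC job."""
    backlog = 0.0  # rows written to the source but not yet extracted
    for i in range(1, intervals + 1):
        backlog += update_rate * POLL_S               # new changes arrive
        backlog -= min(backlog, drain_rate * POLL_S)  # one run's capacity
        if i % 10 == 0:
            lag_min = backlog / update_rate / 60  # how far behind, in source time
            print(f"after {i * POLL_S / 60:4.1f} min: "
                  f"backlog {backlog / 1e6:5.1f}M rows, lag ~{lag_min:4.1f} min")

# Spike: 100k updates/s arriving, but only ~60k rows/s extracted (assumed
# degraded throughput under load). Lag grows without bound until capacity
# is added or traffic subsides.
simulate(update_rate=100_000, drain_rate=60_000, intervals=40)
```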
💡 Key Takeaways
Production systems typically see a median poll delay of half the polling interval: 90-second polls yield 45 seconds of median lag, plus 15 to 30 seconds of processing time
Database load scales linearly with polling frequency: tripling from 90-second to 30-second intervals triples index scan overhead, potentially impacting primary workload performance
For a table with 100,000 updates per 90 seconds, each CDC run extracts roughly that volume; traffic spikes can cause retrieval volume to explode, increasing processing time and creating lag accumulation
Typical configurations target 10 to 15 percent of database CPU for CDC operations, balancing freshness against production workload protection
Teams often start with timestamp-based CDC for analytics use cases with 2 to 5 minute staleness tolerance, then migrate to log-based CDC when requirements tighten to sub-second freshness or scale reaches billions of daily events
📌 Examples
1. An e-commerce platform runs CDC every 90 seconds on a 500-million-row orders table, extracting 100,000 changes per run with 4-second query time and achieving p95 data warehouse freshness of 3 minutes
2. During a Black Friday traffic spike, update volume jumps from 50,000 to 400,000 per polling window; CDC job duration increases from 15 seconds to 90 seconds, causing lag to accumulate to 20 minutes before operators increase polling parallelism
3. A financial services company partitions CDC by customer ID ranges into 10 independent jobs, each with its own watermark (sketched below), distributing load and achieving 10x throughput compared to single-threaded polling
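A minimal sketch of the pattern in example 3, assuming hypothetical table and column names; each partition keeps its own watermark, so partitions poll and recover independently:

```python
import sqlite3

N_PARTITIONS = 10
KEY_SPACE = 2**31  # assumed upper bound on customer IDs

def partition_bounds(n: int = N_PARTITIONS) -> list[tuple[int, int]]:
    """Split [0, KEY_SPACE) into n contiguous customer-ID ranges."""
    step = KEY_SPACE // n
    return [(i * step, KEY_SPACE if i == n - 1 else (i + 1) * step)
            for i in range(n)]

# One watermark per partition; persisting them separately means a slow or
# failed partition never stalls progress on the other nine.
watermarks = {i: 0.0 for i in range(N_PARTITIONS)}

def poll_partition(conn: sqlite3.Connection, part: int) -> list:
    lo, hi = partition_bounds()[part]
    # A composite index on (customer_id, updated_at) keeps this scan cheap.
    rows = conn.execute(
        "SELECT id, customer_id, updated_at FROM orders "
        "WHERE customer_id >= ? AND customer_id < ? AND updated_at > ? "
        "ORDER BY updated_at",
        (lo, hi, watermarks[part]),
    ).fetchall()
    if rows:
        watermarks[part] = max(r[2] for r in rows)  # advance this partition only
    return rows
```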