
CDC vs Query Based Replication: When to Choose Each

The Core Trade Off: Log based CDC and query based replication both sync downstream systems, but they make fundamentally different trade offs between simplicity and scale.
Query Based (Poll Tables): Simple to implement, collapses above 5k writes/sec
vs
Log Based CDC: Efficient at 50k+ writes/sec, complex setup
Query Based Replication: The simple approach is polling tables for changes. Every minute, run SELECT * FROM orders WHERE updated_at > last_poll_time. This works well at small scale. If you have 100 writes per second and can tolerate 1 minute of lag, query based replication is easier to understand, debug, and deploy: it needs no special database permissions and no log parsing.

But it breaks down as writes grow. At 5,000 writes per second, you generate 300,000 changed rows per minute (5,000 × 60). Scanning that many rows every minute adds 5 to 15 seconds of query time, overloading the OLTP database. You also risk missing deletes (a deleted row no longer exists, so the updated_at filter has nothing to match) and intermediate updates (if a row changes twice in one polling window, you only see its final state).

When Query Based Works: Systems with under 1,000 writes per second, where downstream lag of 1 to 5 minutes is acceptable, and where you do not need strict ordering or delete tracking. Think small SaaS products, internal admin dashboards, or batch analytics jobs.

Log Based CDC Advantages: At 50,000 writes per second, log based CDC reads sequentially from the commit log without adding load to the query serving path. It captures every change in exact commit order, including deletes. Latency drops from minutes to under 1 second. The operational cost is that you need database level access to read the logs and must handle schema evolution carefully.
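The polling approach and its two blind spots described above can be seen in a small sketch. This is an illustration only, assuming an in-memory SQLite table named orders with hypothetical columns id, status, and updated_at (stored as an integer timestamp for simplicity); the helper poll_changes is not part of any library.

```python
import sqlite3


def poll_changes(conn, last_poll_time):
    """Return rows changed since the previous poll (hypothetical helper)."""
    cur = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_poll_time,),
    )
    return cur.fetchall()


# Assumed schema for illustration; real tables will differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)

# Two orders are created at t=10, then the first poll runs.
conn.execute("INSERT INTO orders VALUES (1, 'new', 10)")
conn.execute("INSERT INTO orders VALUES (2, 'new', 10)")
first_poll = poll_changes(conn, last_poll_time=0)

# Inside the next polling window: order 1 changes twice, order 2 is deleted.
conn.execute("UPDATE orders SET status = 'paid', updated_at = 20 WHERE id = 1")
conn.execute("UPDATE orders SET status = 'shipped', updated_at = 30 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")
second_poll = poll_changes(conn, last_poll_time=10)

# The poller sees only the final state of order 1 and nothing about order 2.
print(second_poll)
```

The second poll returns a single row for order 1 in its final state. The intermediate 'paid' status and the deletion of order 2 never reach the downstream system, which is the gap that reading the commit log closes.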
"The decision is not about which is better overall. It is: what is your write rate, and what latency can you tolerate?"
Decision Framework: Choose query based if you have fewer than 1,000 writes per second and can accept 1 to 5 minute lag. Choose log based CDC when you exceed 5,000 writes per second, need sub second freshness, or require strict ordering and delete tracking. The crossover point where CDC operational complexity becomes worthwhile is around 2,000 to 5,000 writes per second.

Hybrid Approaches: Some teams use query based replication for low volume tables (configuration, user profiles) and log based CDC for high volume tables (events, orders). This balances simplicity where it works with efficiency where it is needed.
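The thresholds above can be written down as a small rule of thumb and applied per table, which is one way to arrive at a hybrid setup. This is a sketch only: the function name choose_replication, its parameters, and the "judgment call" result for cases the text leaves open are assumptions made for illustration, and the per-table figures are hypothetical.

```python
def choose_replication(writes_per_sec, max_lag_seconds,
                       needs_ordering=False, needs_deletes=False):
    """Map the article's thresholds onto a recommendation (illustrative only)."""
    # Sub second freshness, strict ordering, or delete tracking point to CDC.
    if max_lag_seconds < 1 or needs_ordering or needs_deletes:
        return "log based CDC"
    # Above 5,000 writes per second, polling overloads the OLTP database.
    if writes_per_sec > 5_000:
        return "log based CDC"
    # Under 1,000 writes per second with at least a minute of tolerated lag.
    if writes_per_sec < 1_000 and max_lag_seconds >= 60:
        return "query based"
    # Everything else, including the 2,000 to 5,000 crossover zone,
    # is left to the team (assumption).
    return "judgment call"


# Hypothetical per-table profile: (writes per second, tolerated lag in seconds).
tables = {
    "configuration": (5, 300),
    "user_profiles": (50, 120),
    "events": (20_000, 60),
    "orders": (3_000, 60),
}

plan = {name: choose_replication(wps, lag) for name, (wps, lag) in tables.items()}
for name, choice in plan.items():
    print(f"{name}: {choice}")
```

With these made-up numbers, the low volume tables land on query based replication, events lands on log based CDC, and orders falls into the crossover zone where operational capacity decides.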
💡 Key Takeaways
Query based replication is simpler but starts overloading OLTP databases above 5,000 writes per second with multi second scan times
Log based CDC handles 50,000+ writes per second efficiently with sub second latency but requires database level log access and careful schema management
Query based can miss deletes and intermediate updates between polling windows, while CDC captures every change in exact order
The crossover point where CDC operational complexity becomes worthwhile is approximately 2,000 to 5,000 writes per second
Decision criteria: under 1k writes per second with 1 to 5 minute lag tolerance favors query based; above 5k writes per second, or a sub second freshness requirement, favors log based CDC
📌 Examples
1. A small SaaS product with 200 writes per second uses query based replication every 2 minutes to sync to analytics, avoiding CDC complexity
2. An ecommerce platform at 50k writes per second uses log based CDC to update search indexes within 1 second, while query based would cause 15+ second table scans