
What is Change Data Capture (CDC)?

Definition
Change Data Capture (CDC) is a pattern where you stream all inserts, updates, and deletes from a primary database into other downstream systems, capturing what changed rather than periodically querying for current state.
The Core Problem: Your ecommerce site has an orders database that serves customer queries. But you also need that same data in your search index, analytics warehouse, and recommendation cache. How do you keep them all synchronized without overloading the primary database? The naive approach is periodic full table scans: every hour, query the entire orders table and update downstream systems. This breaks catastrophically at scale. A table with 100 million orders takes 30 minutes to scan, overloading the Online Transaction Processing (OLTP) database during peak hours. You also miss near real time requirements and cannot easily tell what changed between scans.

How CDC Solves This: Instead of querying tables, CDC reads the database's internal commit log or write ahead log (WAL). Every database transaction appends records to this log before the transaction is marked as committed. CDC tails this log continuously, converts log entries into structured change events, and publishes them to a durable event stream. Think of it like following a ledger: the database writes "order 12345 created", then "order 12345 status updated to shipped", and CDC captures each change as it happens. Downstream consumers subscribe to this stream and update their systems independently.

Why This Matters: CDC decouples your source of truth from derived systems. The primary database only worries about serving user traffic with low latency. Everything else consumes changes asynchronously. At companies like Netflix and Uber, CDC pipelines process hundreds of thousands of changes per second, keeping search indexes fresh within seconds and data warehouses updated within minutes, all without impacting customer facing query performance.
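The tail-the-log flow described above can be sketched in a few lines. This is a minimal simulation, not a real connector: `WalEntry`, `to_change_event`, and the in-memory list standing in for the event stream are all illustrative names. A production pipeline would read, say, PostgreSQL's WAL via logical replication and publish to a system like Kafka.

```python
import json
import time
from dataclasses import dataclass

@dataclass
class WalEntry:
    """Hypothetical raw entry from the database's write ahead log."""
    lsn: int    # log sequence number: position in the commit log
    op: str     # "insert" | "update" | "delete"
    table: str
    row: dict

def to_change_event(entry: WalEntry) -> dict:
    """Convert a raw log entry into a structured change event."""
    return {
        "source": entry.table,
        "op": entry.op,
        "lsn": entry.lsn,
        "data": entry.row,
        "captured_at_ms": int(time.time() * 1000),
    }

def tail_wal(entries, publish):
    """Convert and publish entries in commit order, preserving ordering."""
    last_lsn = 0
    for entry in entries:
        # The log gives us changes in the exact order they committed.
        assert entry.lsn > last_lsn, "WAL entries must arrive in order"
        publish(to_change_event(entry))
        last_lsn = entry.lsn

# The ledger example from the text: order 12345 created, then shipped.
wal = [
    WalEntry(1, "insert", "orders", {"id": 12345, "status": "created"}),
    WalEntry(2, "update", "orders", {"id": 12345, "status": "shipped"}),
]
stream = []  # stand-in for a durable event stream
tail_wal(wal, stream.append)
print(json.dumps(stream[1]["data"]))  # {"id": 12345, "status": "shipped"}
```

Note that the primary database is never queried here: the pipeline only reads the log, which is why capture cost stays flat no matter how large the orders table grows.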
💡 Key Takeaways
CDC captures database changes by reading the commit log rather than querying tables, avoiding overload on the primary database
Log based CDC preserves the exact order and timing of changes, something periodic full table scans cannot provide
Changes flow through a durable event stream where multiple independent consumers can subscribe without affecting each other
At scale, CDC enables near real time propagation with p50 latency typically under 500ms from commit to downstream visibility
📌 Examples
1. An ecommerce platform with 10 million daily users uses CDC to keep search indexes synchronized within 1 to 5 seconds while the primary database handles 50k writes per second
2. Netflix uses log based CDC to propagate user activity changes to personalization models with sub minute latency without impacting OLTP query performance