
Transactional Consistency and Ordering in CDC

The Core Challenge: A production database handling 50,000 writes per second commits hundreds of transactions per second, each potentially touching multiple rows across multiple tables. Your CDC pipeline must preserve the semantic meaning of these transactions as they flow to downstream systems.
Consider an e-commerce transaction that atomically updates four tables: it creates an order record, decrements inventory by 3 units, records a payment authorization, and updates the customer's lifetime value. In the database, these four changes either all commit or all roll back. In a distributed stream, however, they become four separate events that can reach different consumers at different times.
How Log Based CDC Preserves Transactions: Modern CDC systems read the database's Write Ahead Log (WAL) or transaction log, which records every change along with a Log Sequence Number (LSN) that increases monotonically. The connector emits only changes that belong to committed transactions, and each emitted change carries its transaction identifier and commit timestamp. The CDC connector groups all changes with the same transaction ID together before publishing them to the stream. The four changes from our e-commerce transaction are therefore packaged with shared metadata: transaction ID txn_4829183, commit LSN 9284719, commit time 2024-01-15T10:30:42.193Z.
Consistency Impact: 4 writes are published as separate events, tied to 1 transaction by a shared ID.
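The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular connector is implemented: the `ChangeEvent` fields, the `group_by_transaction` helper, and the second transaction `txn_4829184` are assumptions made up for the example, and the individual change LSNs are invented so that they precede the commit LSN from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int         # log position of this individual change (illustrative)
    txn_id: str      # transaction that produced the change
    commit_lsn: int  # log position of that transaction's commit
    table: str
    op: str          # "insert", "update", or "delete"


def group_by_transaction(events):
    """Return (txn_id, [changes]) pairs ordered by commit LSN.

    Changes inside each transaction are ordered by their own LSN.
    """
    by_txn = {}
    for ev in events:
        by_txn.setdefault(ev.txn_id, []).append(ev)
    batches = [
        (txn_id, sorted(changes, key=lambda e: e.lsn))
        for txn_id, changes in by_txn.items()
    ]
    # Publish transactions in the order they committed.
    batches.sort(key=lambda batch: batch[1][0].commit_lsn)
    return batches


# Two transactions whose changes interleave in the log.
events = [
    ChangeEvent(9284711, "txn_4829183", 9284719, "orders", "insert"),
    ChangeEvent(9284712, "txn_4829184", 9284716, "carts", "delete"),
    ChangeEvent(9284713, "txn_4829183", 9284719, "inventory", "update"),
    ChangeEvent(9284714, "txn_4829183", 9284719, "payments", "insert"),
    ChangeEvent(9284715, "txn_4829183", 9284719, "customers", "update"),
]

batches = group_by_transaction(events)
for txn_id, changes in batches:
    print(txn_id, [c.table for c in changes])
```

The hypothetical `txn_4829184` starts later but commits first, so it is published first. The four changes of `txn_4829183` stay together and keep their log order.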
Ordering Guarantees Through Partitioning: To scale beyond a few thousand events per second, CDC systems partition the stream by entity key. All changes to order_id 78234 go to the same partition, preserving their commit order. This creates a trade off: you get per key ordering (sufficient for most use cases) but not global ordering across unrelated entities. Two orders created simultaneously might be seen in a different relative sequence by different consumers, but all updates to a single order always appear in commit sequence.
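As a rough sketch of key based routing, the snippet below hashes an entity key to pick a partition. The partition count, the `"order:78234"` key format, and the helper names are assumptions for illustration; real streaming platforms ship their own partitioners. `zlib.crc32` is used only because it gives the same result in every process, unlike Python's built in `hash()` for strings.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # illustrative; real streams often use far more


def partition_for(entity_key, num_partitions=NUM_PARTITIONS):
    """Map an entity key to a partition with a process-independent hash."""
    return zlib.crc32(entity_key.encode("utf-8")) % num_partitions


def route(events, num_partitions=NUM_PARTITIONS):
    """Append each (key, payload) event to its partition, in arrival order."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


# Events in commit order; keys are hypothetical "<entity>:<id>" strings.
stream = [
    ("order:78234", "created"),
    ("order:78235", "created"),
    ("order:78234", "paid"),
    ("user:5512", "ltv_updated"),
    ("order:78234", "shipped"),
]

routed = route(stream)
home = partition_for("order:78234")
order_events = [p for key, p in routed[home] if key == "order:78234"]
print(order_events)
```

Because every event for `order:78234` hashes to the same partition and each partition is an append only list, that order's events keep their commit sequence no matter where the other keys land.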
⚠️ Common Pitfall: Downstream systems that need to JOIN data across entities can see temporarily inconsistent views. If transaction txn_4829183 updates both order 78234 and user 5512, these updates land in different partitions. A consumer reading both might see the updated order before the updated user for a short window, typically milliseconds but longer if one partition is lagging. Design consumers to tolerate these windows of inconsistency.
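One way a consumer can tolerate this window is to hold back a transaction's events until all of them have arrived. The sketch below assumes each event carries a `txn_event_count` field giving the total number of events in its transaction; that field name, the `TransactionAssembler` class, and the dict based event shape are hypothetical, so check what transaction metadata your connector actually emits.

```python
from collections import defaultdict


class TransactionAssembler:
    """Buffer events per transaction and release them only when complete."""

    def __init__(self):
        self._pending = defaultdict(list)

    def add(self, event):
        """Return all events of a transaction once the last one arrives.

        Returns None while the transaction is still incomplete.
        """
        txn_id = event["txn_id"]
        buffered = self._pending[txn_id]
        buffered.append(event)
        if len(buffered) == event["txn_event_count"]:
            return self._pending.pop(txn_id)
        return None


assembler = TransactionAssembler()

# The order update arrives first, from one partition...
first = assembler.add(
    {"txn_id": "txn_4829183", "txn_event_count": 2, "entity": "order:78234"}
)
# ...and the user update arrives slightly later, from another partition.
second = assembler.add(
    {"txn_id": "txn_4829183", "txn_event_count": 2, "entity": "user:5512"}
)

print(first)                          # nothing released yet
print([e["entity"] for e in second])  # both entities released together
```

This trades a little latency for a consistent cross entity view. A production version would also need a timeout for transactions that never complete and deduplication if the stream can redeliver events.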
Real World Scale Example: At companies like Uber processing millions of trip updates per day, CDC systems partition trip events by trip_id. This ensures all state transitions for a trip (requested, assigned, started, completed) arrive in order. The stream might have 1,000 partitions, each handling 10,000 to 50,000 events per second, which maintains per trip ordering while giving an aggregate throughput of 10 to 50 million events per second.
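The sizing arithmetic behind these figures is simple multiplication, shown here as a small sketch. The function names are made up for the example, and the model assumes load spreads evenly across partitions, which skewed keys can break in practice.

```python
def aggregate_eps(partitions, per_partition_eps):
    """Total events per second if every partition runs at the given rate."""
    return partitions * per_partition_eps


def partitions_needed(target_eps, per_partition_eps):
    """Smallest partition count that reaches the target (ceiling division)."""
    return -(-target_eps // per_partition_eps)


low = aggregate_eps(1_000, 10_000)
high = aggregate_eps(1_000, 50_000)
print(low, high)

# Working backwards from a throughput target to a partition count.
print(partitions_needed(10_000_000, 10_000))
print(partitions_needed(10_000_000, 50_000))
```

With 1,000 partitions, the low end of the per partition range gives 10 million events per second and the high end gives 50 million. Reaching 10 million at the high per partition rate would take only 200 partitions.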
💡 Key Takeaways
Log based CDC reads transaction logs with monotonically increasing LSNs, preserving commit order without querying production tables
Changes within a single transaction share a transaction ID, allowing consumers to reconstruct transactional boundaries
Partitioning by primary key enables horizontal scaling to millions of events per second while maintaining per entity ordering
Global ordering across unrelated entities is sacrificed for throughput, creating small windows where cross entity joins see temporary inconsistency
Production systems like Uber use 1,000 plus partitions to achieve 10 million events per second throughput with per key ordering
📌 Examples
1. A banking transaction transfers $500 from account A to account B. The CDC stream publishes two events with shared transaction ID txn_993847: debit from A with balance update, credit to B with balance update. Consumers processing both see the atomic nature of the transfer.
2. During a flash sale, 10,000 orders are created per second. Orders with IDs ending in 0 through 9 are distributed across 10 partitions. All updates to order 78230 (created, items added, payment processed, shipped) land in partition 0 in commit sequence, even as other orders process in parallel.