Trigger-Based CDC Patterns
How Trigger-Based CDC Works at Scale
The Architecture:
Trigger-based CDC has three main components working together: first, row-level triggers attached to each source table; second, carefully designed change tables that accumulate change records; and third, a CDC reader service that bridges the database to downstream streaming systems.
Each component needs specific optimization for production workloads. Consider a realistic example: an e-commerce platform handling 2000 to 5000 write operations per second on an orders database. The system needs to feed a data warehouse, update a search index, and invalidate caches, all within a few seconds.
1. Trigger Design: Define AFTER INSERT, UPDATE, and DELETE triggers at row level. For updates, compare old and new values so that only changed columns are recorded. A typical trigger adds 1 to 3 milliseconds of write latency at p50 and 5 to 10 milliseconds at p99, depending on complexity and indexing. (A SQL sketch of a trigger and its change table follows this list.)
2. Change Table Structure: Each change row contains a synthetic sequence ID for ordering, the source row's primary key, the operation type (INSERT/UPDATE/DELETE), the commit timestamp, the transaction ID, and the changed values. Indexes are optimized for append-heavy writes and range scans on the sequence ID.
3. CDC Reader Service: Polls change tables every 200 to 500 milliseconds, reading batches of 1000 to 10000 rows with sequence_id greater than the last processed offset, converts the rows to normalized events, and publishes them to Kafka or a similar message bus. (One polling cycle is sketched in SQL after the examples at the end of this section.)
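The first two components can be sketched directly in SQL. The block below is a minimal sketch assuming PostgreSQL; the orders_changes table, the order_id primary-key column, and the JSONB before/after columns are illustrative choices that mirror the examples at the end of this section, not a prescribed schema.

```sql
-- Minimal change table for the orders source table (names are illustrative).
CREATE TABLE orders_changes (
    sequence_id    bigserial   PRIMARY KEY,   -- synthetic ordering key; the PK index serves range scans
    source_pk      text        NOT NULL,      -- primary key of the changed orders row
    operation      text        NOT NULL CHECK (operation IN ('INSERT', 'UPDATE', 'DELETE')),
    transaction_id bigint      NOT NULL DEFAULT txid_current(),
    commit_ts      timestamptz NOT NULL DEFAULT transaction_timestamp(),  -- transaction time, a common stand-in for the commit timestamp
    before_values  jsonb,                     -- old values of changed columns only
    after_values   jsonb                      -- new values of changed columns only
);

-- Row-level trigger function: records the operation and, for updates, only the changed columns.
CREATE OR REPLACE FUNCTION capture_orders_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO orders_changes (source_pk, operation, after_values)
        VALUES (NEW.order_id, 'INSERT', to_jsonb(NEW));
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO orders_changes (source_pk, operation, before_values, after_values)
        SELECT NEW.order_id, 'UPDATE',
               jsonb_object_agg(o.key, o.value),   -- before: old values of changed columns
               jsonb_object_agg(o.key, n.value)    -- after: new values of changed columns
        FROM jsonb_each(to_jsonb(OLD)) AS o
        JOIN jsonb_each(to_jsonb(NEW)) AS n ON n.key = o.key
        WHERE o.value IS DISTINCT FROM n.value
        HAVING count(*) > 0;                       -- skip updates that changed nothing
        RETURN NEW;
    ELSE  -- DELETE
        INSERT INTO orders_changes (source_pk, operation, before_values)
        VALUES (OLD.order_id, 'DELETE', to_jsonb(OLD));
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_cdc
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION capture_orders_change();
```

The added write latency comes mainly from the extra insert into the change table and the old/new comparison; many deployments also share one generic trigger function across tables by switching on TG_TABLE_NAME instead of writing one function per source table.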
Typical Performance Profile: 10k to 50k change events per second, with 0.5 to 3 seconds end-to-end lag at p95.
Multi-Table Transactions:
When a single business operation touches multiple tables, their triggers write separate rows to their respective change tables. To preserve transactional boundaries downstream, tag all change rows with the same transaction_id and commit_timestamp. The CDC reader can then emit them as grouped batches, allowing downstream systems to process each transaction atomically.
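For instance, if a checkout transaction updates both orders and order_items, the reader can collect all of that transaction's change rows with a query like the sketch below. It assumes a second change table, order_items_changes, shaped like the orders_changes sketch above, and uses 44871 as a stand-in for the tx_44871 transaction from the examples.

```sql
-- Gather every change row written by one source transaction, across change
-- tables, so it can be emitted downstream as a single grouped batch.
SELECT 'orders' AS source_table, sequence_id, source_pk, operation,
       commit_ts, before_values, after_values
FROM   orders_changes
WHERE  transaction_id = 44871
UNION ALL
SELECT 'order_items', sequence_id, source_pk, operation,
       commit_ts, before_values, after_values
FROM   order_items_changes
WHERE  transaction_id = 44871
-- Sequence IDs are only ordered within a single change table unless the change
-- tables share one sequence, so order per table here.
ORDER BY source_table, sequence_id;
```

In practice the reader usually just groups the rows of each polled batch by transaction_id and publishes one grouped message per transaction.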
Consumer Pattern:
Downstream, separate consumers subscribe to the change streams: a search indexing service updates Elasticsearch, a data pipeline writes to object storage and then BigQuery with less than one minute of end-to-end delay at p95, and a cache service invalidates Redis entries. From a design perspective, trigger-based CDC isolates the operational database from downstream load: only the CDC reader service queries the database directly.

💡 Key Takeaways
✓ The CDC reader service polls change tables in batches of 1000 to 10000 rows every 200 to 500 milliseconds, tracking the last processed offset in a metadata store
✓ Change tables use synthetic sequence IDs and indexes optimized for append-heavy writes and range scans, crucial for handling 2000 to 5000 writes per second
✓ Transaction IDs and commit timestamps preserve multi-table transaction boundaries, allowing downstream systems to process related changes atomically
✓ At moderate scale, systems process 10000 to 50000 change events per second with 0.5 to 3 seconds end-to-end lag at p95
✓ Downstream consumers read from the message bus rather than the database directly, isolating the operational database from analytical or indexing load
📌 Examples
1. An orders table with an AFTER UPDATE trigger that captures status changes. When the status changes from pending to confirmed, the trigger writes: {sequence_id: 187432, operation: UPDATE, pk: ord_9283, tx_id: tx_44871, before: {status: pending}, after: {status: confirmed}}
2. The CDC reader service queries: SELECT * FROM orders_changes WHERE sequence_id > 187000 ORDER BY sequence_id LIMIT 5000. It processes the batch, converts the rows to OrderStatusChanged events, publishes them to the Kafka topic orders.changes, then updates last_processed_offset to 192000 in the metadata store
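Example 2 describes one polling cycle of the reader; the sketch below expresses that cycle in SQL under the same assumptions as the earlier snippets (PostgreSQL, the orders_changes table, and a hypothetical cdc_offsets metadata table). In a real reader, the batch is converted and published to Kafka between steps 1 and 3, and whether the offset is committed before or after publishing determines whether delivery is at-most-once or at-least-once.

```sql
-- Hypothetical metadata table tracking the reader's progress per change table.
CREATE TABLE IF NOT EXISTS cdc_offsets (
    change_table          text   PRIMARY KEY,
    last_processed_offset bigint NOT NULL DEFAULT 0
);

-- Step 1: read the next batch past the stored offset, in sequence order.
SELECT c.*
FROM   orders_changes c
JOIN   cdc_offsets o ON o.change_table = 'orders_changes'
WHERE  c.sequence_id > o.last_processed_offset
ORDER  BY c.sequence_id
LIMIT  5000;

-- Step 2 (outside the database): convert the rows to OrderStatusChanged events
-- and publish them to the orders.changes topic.

-- Step 3: advance the offset to the highest sequence_id that was published.
UPDATE cdc_offsets
SET    last_processed_offset = 192000   -- max sequence_id seen in the batch
WHERE  change_table = 'orders_changes';
```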