
How Trigger-Based CDC Works at Scale

The Architecture: Trigger-based CDC has three main components working together: row-level triggers attached to each source table, carefully designed change tables that accumulate change records, and a CDC reader service that bridges the database to streaming systems. Each component needs specific optimization for production workloads. Consider a realistic example: an e-commerce platform handling 2,000 to 5,000 write operations per second on an orders database, which must feed a data warehouse, update a search index, and invalidate caches, all within a few seconds.
1. Trigger Design: Define AFTER INSERT, UPDATE, and DELETE triggers at the row level. For updates, compare old and new values to record only changed columns. A typical trigger adds 1 to 3 milliseconds of write latency at p50 and 5 to 10 milliseconds at p99, depending on complexity and indexing.
2. Change Table Structure: Each row carries a synthetic sequence ID for ordering, the source primary key, the operation type (INSERT/UPDATE/DELETE), the commit timestamp, the transaction ID, and the changed values. Indexes are optimized for append-heavy writes and range scans on the sequence ID.
3. CDC Reader Service: Polls the change tables every 200 to 500 milliseconds, reading batches of 1,000 to 10,000 rows with sequence_id greater than the last processed offset, converts the rows to normalized events, and publishes them to Kafka or a similar message bus. (Sketches of the change table, trigger, and polling queries follow this list.)
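A minimal sketch of items 1 and 2. The article does not name a specific database, so the PL/pgSQL syntax and every table and column name below are illustrative assumptions, not a prescribed schema:

```sql
-- Hypothetical change table for an orders source table (all names are assumptions).
CREATE TABLE orders_changes (
    sequence_id   BIGSERIAL PRIMARY KEY,     -- synthetic ordering key; also serves range scans
    pk            TEXT        NOT NULL,      -- source primary key
    operation     TEXT        NOT NULL,      -- INSERT / UPDATE / DELETE
    tx_id         BIGINT      NOT NULL,      -- groups rows written by one transaction
    commit_ts     TIMESTAMPTZ NOT NULL DEFAULT now(),  -- transaction start time, a proxy for commit time
    before_values JSONB,                     -- old row image (UPDATE / DELETE)
    after_values  JSONB                      -- new row image (INSERT / UPDATE)
);
CREATE INDEX orders_changes_tx_idx ON orders_changes (tx_id);

-- Row-level trigger function. For brevity this records full before/after images;
-- a production version would diff OLD and NEW to store only the changed columns.
CREATE OR REPLACE FUNCTION capture_orders_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO orders_changes (pk, operation, tx_id, before_values, after_values)
    VALUES (
        CASE WHEN TG_OP = 'DELETE' THEN OLD.id::text ELSE NEW.id::text END,
        TG_OP,
        txid_current(),
        CASE WHEN TG_OP IN ('UPDATE', 'DELETE') THEN to_jsonb(OLD) END,
        CASE WHEN TG_OP IN ('INSERT', 'UPDATE') THEN to_jsonb(NEW) END
    );
    RETURN NULL;  -- return value is ignored for AFTER row-level triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_cdc
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION capture_orders_change();
```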
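And item 3, seen from the database side: each polling cycle is a range scan past the stored offset, followed by an offset update once the batch has been published. The cdc_reader_offsets metadata table and the :parameter placeholders are assumptions; the event conversion and Kafka publish happen inside the reader service itself.

```sql
-- One polling cycle (every 200 to 500 ms): fetch the next batch past the high-water mark.
SELECT sequence_id, pk, operation, tx_id, commit_ts, before_values, after_values
FROM   orders_changes
WHERE  sequence_id > :last_processed_offset
ORDER  BY sequence_id
LIMIT  5000;

-- After the batch has been converted to events and acknowledged by the message bus,
-- persist the new offset. A crash before this update re-delivers the batch on restart,
-- so downstream delivery is at-least-once.
UPDATE cdc_reader_offsets
SET    last_processed_offset = :max_sequence_id_in_batch
WHERE  source_table = 'orders_changes';
```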
Typical Performance Profile: 10,000 to 50,000 change events per second, with 0.5 to 3 seconds end-to-end lag at p95.
Multi-Table Transactions: When a single business operation touches multiple tables, their triggers write separate rows to their respective change tables. To preserve transactional boundaries downstream, tag all change rows with the same transaction_id and commit_timestamp; the CDC reader can then emit them as grouped batches that downstream systems process atomically (a grouping query is sketched below).
Consumer Pattern: Downstream, separate consumers subscribe to the change streams: a search indexing service updates Elasticsearch, a data pipeline writes to object storage and then BigQuery with less than one minute of end-to-end delay at p95, and a cache service invalidates Redis entries. From a design perspective, trigger-based CDC isolates the operational database from downstream load; only the CDC reader service queries the database directly.
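A sketch of that grouped read, assuming an order_items_changes table shaped like orders_changes and per-table offsets (all names and parameters are illustrative):

```sql
-- Read one batch across two change tables, ordered so that all rows written by the same
-- transaction come out together; a downstream consumer can then apply each tx_id group
-- atomically. A production reader would extend the batch to the end of the last
-- transaction rather than cutting a group at the LIMIT boundary.
SELECT tx_id, commit_ts, source_table, sequence_id, operation, pk, before_values, after_values
FROM (
    SELECT 'orders'      AS source_table, * FROM orders_changes
    WHERE  sequence_id > :orders_offset
    UNION ALL
    SELECT 'order_items' AS source_table, * FROM order_items_changes
    WHERE  sequence_id > :order_items_offset
) AS c
ORDER BY commit_ts, tx_id, sequence_id
LIMIT 5000;
```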
💡 Key Takeaways
The CDC reader service polls change tables in batches of 1,000 to 10,000 rows every 200 to 500 milliseconds, tracking the last processed offset in a metadata store
Change tables use synthetic sequence IDs and indexes optimized for append-heavy writes and range scans, which is crucial for handling 2,000 to 5,000 writes per second
Transaction IDs and commit timestamps preserve multi-table transaction boundaries, allowing downstream systems to process related changes atomically
At moderate scale, systems process 10,000 to 50,000 change events per second with 0.5 to 3 seconds of end-to-end lag at p95
Downstream consumers read from the message bus rather than the database directly, isolating the operational database from analytical and indexing load
📌 Examples
1. An orders table with an AFTER UPDATE trigger that captures status changes. When status changes from pending to confirmed, the trigger writes: {sequence_id: 187432, operation: UPDATE, pk: ord_9283, tx_id: tx_44871, before: {status: pending}, after: {status: confirmed}}
2. The CDC reader service queries: SELECT * FROM orders_changes WHERE sequence_id > 187000 ORDER BY sequence_id LIMIT 5000. It processes the batch, converts the rows to OrderStatusChanged events, publishes them to the Kafka topic orders.changes, then updates last_processed_offset to 192000 in the metadata store