
Trigger-Based vs Log-Based CDC: Decision Framework

The Core Trade-Off: Choosing between trigger-based and log-based CDC is fundamentally about trading database overhead and operational complexity against infrastructure requirements and portability.
Trigger-Based CDC: 20-40 ms latency, 2-3k writes/sec limit, no log access needed
vs.
Log-Based CDC: 5-10 ms overhead, 50k+ writes/sec, requires log access
When Triggers Win: First, when you lack transaction log access. Many managed database services, such as Amazon Relational Database Service (RDS) for certain engines, Azure SQL Database tiers, or Google Cloud SQL configurations, restrict log reading, and enterprise environments with strict database administrator (DBA) controls often deny log access for security reasons. Triggers work with standard database permissions. Second, when you need per-row business logic at capture time. Triggers let you mask personally identifiable information (PII), filter out test data, or enrich events with reference lookups, all within the transaction; a trigger can check user.is_test_account and skip writing to the change table (see the sketch below). Log-based CDC captures everything blindly, requiring downstream filtering. Third, for moderate-throughput workloads under 2,000-3,000 writes per second, where the 10-30 percent overhead and 20-40 millisecond latency penalty are acceptable. Many internal tools, content management systems, and customer relationship management (CRM) systems fit this profile.

When Logs Win: High-throughput systems processing over 5,000 writes per second per node need log-based CDC. The overhead difference is stark: log reading adds under 5 percent CPU load versus 10-30 percent for triggers, so at 50,000 writes per second trigger overhead would overwhelm the primary database. Log-based CDC captures everything, including schema changes, truncates, and bulk operations, with perfect fidelity, while triggers can miss edge cases or require custom handling for each operation type. Tools like Debezium provide battle-tested implementations that handle schema evolution, exactly-once semantics, and operational failover. For write-heavy analytical workloads the math is clear: log-based CDC at 50,000 events per second with 5 percent overhead versus trigger-based at 2,000 events per second with 30 percent overhead. If you can get log access, logs scale better.
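To make capture-time logic concrete, here is a minimal sketch of the trigger described above, assuming a hypothetical PostgreSQL users table with is_test_account and email columns (the table, column, and function names are illustrative, not from any specific system):

```sql
-- Change table the trigger writes into. Stores a masked email hash
-- rather than raw PII, because masking happens at capture time.
CREATE TABLE IF NOT EXISTS users_changes (
    change_id  BIGSERIAL PRIMARY KEY,
    op         CHAR(1)     NOT NULL,              -- 'I' = insert, 'U' = update
    user_id    BIGINT      NOT NULL,
    email_hash TEXT,                              -- masked, never the raw value
    changed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION capture_user_change() RETURNS trigger AS $$
BEGIN
    -- Capture-time filtering: skip test accounts entirely, so they
    -- never reach the change table or anything downstream.
    IF COALESCE(NEW.is_test_account, false) THEN
        RETURN NEW;
    END IF;
    -- Capture-time masking: store a hash instead of the raw email.
    INSERT INTO users_changes (op, user_id, email_hash)
    VALUES (LEFT(TG_OP, 1), NEW.id, md5(NEW.email));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- The trigger runs inside the same transaction as the original write,
-- which is where the 10-30 percent overhead comes from. (PostgreSQL 11+.)
CREATE TRIGGER users_cdc
AFTER INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION capture_user_change();
```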
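The "do you have log access" question can also be tested directly. On PostgreSQL, log-based CDC requires wal_level = logical and permission to create a replication slot, both of which managed tiers often withhold. A quick probe (the slot name is arbitrary):

```sql
-- Must return 'logical' for logical decoding; managed services often
-- lock this parameter down or gate it behind an instance setting.
SHOW wal_level;

-- If this succeeds, log-based tools such as Debezium can stream changes.
-- It fails without the REPLICATION privilege or with wal_level too low.
SELECT * FROM pg_create_logical_replication_slot('cdc_probe', 'pgoutput');

-- Drop the probe slot immediately; an unused slot retains WAL forever.
SELECT pg_drop_replication_slot('cdc_probe');
```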
"The decision is not which technology is better. It is whether you can access logs, what your write throughput is, and whether you need capture time logic."
Hybrid Approach: Some teams start with triggers on legacy or restricted systems, then migrate to log-based CDC when they gain log access or upgrade infrastructure. During migration, both can run in parallel for validation: the trigger-based system serves as a reference to verify that log-based capture is complete (a validation sketch follows below).

Interview Angle: When asked about CDC in system design interviews, frame your answer around these constraints. State your assumed write throughput (for example, 3,000 versus 30,000 writes per second), whether you have log access, and whether you need filtering at the source. This shows you understand real-world trade-offs rather than just naming technologies.
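As a sketch of that parallel-run validation, assuming the trigger writes to a users_changes table and the log-based pipeline lands events in a hypothetical log_cdc_sink table, you can compare per-minute counts over a recent window and surface any minutes that disagree:

```sql
-- Per-minute row counts from each capture path over the last hour;
-- only mismatched minutes are returned. Rows the trigger deliberately
-- filtered (e.g. test accounts) will appear as differences and must
-- be excluded from the comparison or accounted for.
WITH trig AS (
    SELECT date_trunc('minute', changed_at) AS minute, count(*) AS trigger_rows
    FROM users_changes
    WHERE changed_at >= now() - interval '1 hour'
    GROUP BY 1
),
logs AS (
    SELECT date_trunc('minute', event_ts) AS minute, count(*) AS log_rows
    FROM log_cdc_sink
    WHERE event_ts >= now() - interval '1 hour'
    GROUP BY 1
)
SELECT minute,
       COALESCE(t.trigger_rows, 0) AS trigger_rows,
       COALESCE(l.log_rows, 0)     AS log_rows
FROM trig t
FULL OUTER JOIN logs l USING (minute)
WHERE COALESCE(t.trigger_rows, 0) <> COALESCE(l.log_rows, 0)
ORDER BY minute;
```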
💡 Key Takeaways
Choose trigger-based CDC when write throughput is under 2,000-3,000 operations per second and you lack transaction log access or need per-row business logic
Log-based CDC scales to 50,000+ events per second with under 5 percent CPU overhead versus trigger overhead of 10-30 percent, making it necessary for high-throughput systems
Triggers enable capture-time filtering and enrichment (masking PII, filtering test accounts) that log-based CDC cannot perform without downstream processing
Managed database services often restrict log access, making triggers the only viable option despite higher overhead and latency penalties
Decision framework: state your write-throughput assumption, log-access availability, and need for source-side logic when discussing CDC trade-offs in interviews
📌 Examples
1. A SaaS CRM at 1,800 writes per second uses trigger-based CDC because its managed PostgreSQL RDS instance does not expose logical replication slots. The 25 percent CPU overhead and 35 ms p99 latency are acceptable for its workload.
2. A social media analytics platform processing 45,000 events per second uses Debezium with Kafka Connect to read MySQL binary logs. Trigger-based CDC would add 30 percent overhead (crushing the primary) versus log reading at 4 percent.