
Transactional Outbox Pattern for Reliable Publishing

The dual write problem occurs when you need to atomically update a database and publish a message. If you write to the database and then publish, a crash after the commit but before the publish loses the message. If you publish and then write, a crash leaves a message in the queue describing a database update that never happened. The transactional outbox pattern solves this by writing both the domain state change and a "to be published" event record in a single database transaction, then using a separate relay process to read from the outbox table and publish to the queue.

The implementation has three components. First, your application transaction writes to both your domain tables (orders, accounts) and an outbox table with columns like id, aggregate_id, event_type, payload, created_at, published_at. Second, a relay process (a separate service or background worker) polls the outbox table for unpublished rows (WHERE published_at IS NULL ORDER BY created_at), publishes each to the message queue, then sets published_at. Third, duplicates must be handled downstream: if the relay crashes after publishing but before setting published_at, it will publish the same row again on restart, so the queue's deduplication or the consumer's idempotency must absorb the repeat.

This pattern provides at-least-once delivery: as long as the database transaction commits, the event will eventually be published, even if the relay crashes repeatedly. The trade-off is added latency (events are published milliseconds to seconds after commit, depending on the relay polling interval) and operational complexity (the relay is a new component to monitor and scale). Companies like Uber and DoorDash use variants of this pattern extensively for event-driven architectures where consistency between local state and published events is critical.

An alternative is change data capture (CDC), where a tool like Debezium tails the database transaction log and publishes changes as events. This eliminates the outbox table and application code changes but couples your event stream to database internals, requires running and operating the CDC pipeline, and makes schema evolution harder since events are shaped by table structure rather than domain logic.
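The transactional write and the polling relay described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: it assumes SQLite via Python's standard sqlite3 module, the function names place_order and relay_once are hypothetical, the publish argument stands in for whatever queue client you use, and the seq column is an added assumption that gives rows with equal timestamps a stable order.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Assumed schema: one domain table plus the outbox columns named in the text.
SCHEMA = """
CREATE TABLE orders (
    id          TEXT PRIMARY KEY,
    customer    TEXT NOT NULL,
    total_cents INTEGER NOT NULL
);
CREATE TABLE outbox (
    seq          INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumption: tie-breaker for ordering
    id           TEXT NOT NULL UNIQUE,               -- event ID, usable for consumer dedup
    aggregate_id TEXT NOT NULL,
    event_type   TEXT NOT NULL,
    payload      TEXT NOT NULL,
    created_at   TEXT NOT NULL,
    published_at TEXT                                -- NULL until the relay has published
);
"""


def now_iso():
    return datetime.now(timezone.utc).isoformat()


def place_order(conn, customer, total_cents):
    """Write the domain row and its outbox event in a single transaction."""
    order_id = str(uuid.uuid4())
    payload = json.dumps({"order_id": order_id, "total_cents": total_cents})
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute(
            "INSERT INTO orders (id, customer, total_cents) VALUES (?, ?, ?)",
            (order_id, customer, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox (id, aggregate_id, event_type, payload, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, "OrderPlaced", payload, now_iso()),
        )
    return order_id


def relay_once(conn, publish, batch_size=100):
    """One polling pass: publish each unpublished row, then mark it published."""
    rows = conn.execute(
        "SELECT seq, id, event_type, payload FROM outbox "
        "WHERE published_at IS NULL ORDER BY created_at, seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    for seq, event_id, event_type, payload in rows:
        # If publish raises, or the process dies before the UPDATE below,
        # the row stays unpublished and is sent again on the next pass.
        publish(event_id, event_type, payload)
        with conn:
            conn.execute(
                "UPDATE outbox SET published_at = ? WHERE seq = ?",
                (now_iso(), seq),
            )
    return len(rows)
```

A relay worker would call relay_once in a loop, sleeping for the polling interval between passes. Marking a row only after publish returns is what makes delivery at-least-once rather than at-most-once: a crash between the two steps produces a duplicate, never a lost event.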
💡 Key Takeaways
Solves dual write atomicity: the database transaction commits both domain state and outbox record together; even if the relay crashes, the event will eventually be published since it's durably stored in the outbox table
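Relay polling interval determines publishing latency: an event waits on average half the interval for the next poll, so polling every 100 milliseconds adds ~50 ms average delay (up to ~100 ms worst case) and polling every 5 seconds adds ~2.5 seconds on average; tune based on latency requirements versus polling load on the database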
Outbox table adds write amplification: every business transaction writes an extra row; at 10,000 transactions per second you're writing 10,000 outbox rows per second; plan for outbox table growth and an archival strategy
Idempotency still required downstream: if the relay publishes but crashes before marking published_at, it will republish on next poll; consumers must deduplicate using event ID or business key
Operational complexity trade-off: adds a relay component to deploy, scale, and monitor; relay failures block event publishing and create a backlog in the outbox table; consider change data capture tools, which replace the polling relay with a log-tailing pipeline at the cost of coupling to database internals
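The downstream idempotency requirement can be met by recording each event ID in the same transaction as the side effect it triggers. The sketch below is one possible approach under stated assumptions: it uses SQLite through Python's sqlite3 module, and the table names processed_events and notifications and the function handle_event are hypothetical.

```python
import sqlite3


def make_consumer_db():
    """Create an in-memory consumer database (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE notifications (order_id TEXT NOT NULL)")
    return conn


def handle_event(conn, event_id, order_id):
    """Apply the event's side effect once; return False for a duplicate delivery."""
    try:
        with conn:  # the dedup record and the side effect commit together
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            conn.execute(
                "INSERT INTO notifications (order_id) VALUES (?)", (order_id,)
            )
    except sqlite3.IntegrityError:
        # Primary key conflict: this event ID was already handled, so skip it.
        return False
    return True
```

Because the event ID insert and the side effect share one transaction, a redelivered event hits the primary key constraint and is skipped without repeating the side effect. Deduplicating on a business key instead of the event ID follows the same shape with a different unique column.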
📌 Examples
Uber's event-driven architecture: services write domain changes and events to local Postgres outbox tables; dedicated relay workers poll the outboxes every 500 ms and publish to Kafka, so state changes are eventually published even when Kafka is temporarily unreachable and riders and drivers converge on consistent state
DoorDash order service: when an order is placed, the service writes to the orders table and an outbox table in one transaction; a separate publisher service queries the outbox, sends events to SQS, and marks them published; this guarantees downstream notification and fulfillment services never miss orders