What Are Dead Letter Queues and Why Do They Matter?
Definition
A Dead Letter Queue (DLQ) is a separate queue that holds messages that have repeatedly failed processing, much like a hospital triage area that isolates problem cases so they do not block the main flow.
💡 Key Takeaways
✓ DLQs isolate poison messages to protect throughput and latency for healthy traffic, preventing a single bad message from blocking thousands of good ones
✓ Production systems typically keep DLQ rates below roughly 0.1 to 1 percent of total message volume and fire alerts when the rate rises above the chosen threshold
✓ Each DLQ message carries forensic metadata, including attempt count, error classification, timestamps, consumer version, and correlation identifiers, for root cause analysis
✓ Amazon services commonly retry 3 to 10 times with exponential backoff (starting at 100 ms and capped at 30 or 60 seconds, with jitter) before dead lettering
✓ At-least-once delivery semantics mean redrives produce duplicates, so consumers must be idempotent, with deduplication keys defined at the business operation level
✓ Microsoft Azure customers monitor the age of the oldest DLQ message, paging when it exceeds five times the normal end-to-end SLO
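The retry and dead-lettering behavior described in the takeaways can be sketched as follows. This is a minimal illustration, not any vendor's implementation: the `DeadLetter` envelope, `backoff_delay`, and `process_with_dlq` are hypothetical names, the DLQ is a plain Python list, and the defaults (5 attempts, 100 ms base, 30 second cap, full jitter) are assumptions chosen to fall inside the ranges quoted above.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class DeadLetter:
    """Hypothetical forensic envelope stored with a dead-lettered payload."""
    payload: dict
    attempts: int
    error_type: str
    error_message: str
    first_failed_at: float
    dead_lettered_at: float
    consumer_version: str
    correlation_id: str


def backoff_delay(attempt, base=0.1, cap=30.0):
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def process_with_dlq(message, handler, dlq, max_attempts=5,
                     consumer_version="v1", sleep=time.sleep):
    """Try the handler up to max_attempts times, then dead-letter the message."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    first_failed_at = None
    last_error = None
    for attempt in range(max_attempts):
        try:
            return handler(message)
        except Exception as exc:  # a real consumer would classify errors here
            last_error = exc
            if first_failed_at is None:
                first_failed_at = time.time()
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))  # no sleep after the final attempt
    # Retries exhausted: move the message aside with metadata for later analysis.
    dlq.append(DeadLetter(
        payload=message,
        attempts=max_attempts,
        error_type=type(last_error).__name__,
        error_message=str(last_error),
        first_failed_at=first_failed_at,
        dead_lettered_at=time.time(),
        consumer_version=consumer_version,
        correlation_id=message.get("correlation_id", "unknown"),
    ))
    return None
```

The `sleep` parameter is injectable so the retry loop can be exercised without real delays. A production consumer would usually separate transient errors (worth retrying) from permanent ones (dead-letter immediately) instead of retrying every exception.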
📌 Interview Tips
1. Amazon Prime Day traffic bursts increase message volume to 10 to 50 times baseline; teams widen backoff caps from 30 seconds to between 2 and 5 minutes to avoid DLQ floods caused by downstream saturation
2. Google Pub/Sub's globally distributed topics handle millions of messages per minute with per-subscription DLQs, redriving at rate limits of 100 to 1,000 messages per second to avoid overwhelming dependencies
3. A payment processing system uses order identifiers as idempotency keys; when messages are redriven from the DLQ after a schema validation fix, duplicates are detected and skipped to prevent double charges
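The payment and redrive tips can be combined into one small sketch: a rate-limited redrive loop feeding an idempotent consumer keyed on the order identifier. The names `IdempotentChargeConsumer` and `redrive` are hypothetical, the processed-key store is an in-memory set (a real system would need a durable store with an atomic check-and-record), and the message shape with `order_id` and `amount` fields is an assumption for illustration.

```python
import time
from collections import deque


class IdempotentChargeConsumer:
    """Charges each order at most once, using order_id as the idempotency key."""

    def __init__(self):
        self.processed = set()  # assumption: stands in for a durable key store
        self.charges = []       # side effects actually performed

    def handle(self, message):
        key = message["order_id"]
        if key in self.processed:
            return False  # duplicate from a redrive or redelivery: skip the charge
        self.charges.append((key, message["amount"]))
        self.processed.add(key)
        return True


def redrive(dlq, handler, max_per_second=100, sleep=time.sleep):
    """Replay DLQ messages through the handler at a bounded rate."""
    interval = 1.0 / max_per_second
    redriven = 0
    while dlq:
        message = dlq.popleft()
        try:
            handler(message)
        except Exception:
            dlq.appendleft(message)  # keep the message in the DLQ if it fails again
            raise
        redriven += 1
        if dlq:
            sleep(interval)  # simple pacing to protect downstream dependencies
    return redriven
```

For example, if order `A` was already charged on the main path and the DLQ holds a copy of `A` plus a new order `B`, calling `redrive(dlq, consumer.handle)` replays both messages but only `B` produces a new charge. Fixed-interval pacing is the simplest option; a token bucket would allow short bursts while keeping the same average rate.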