Visibility Timeouts and Message Acknowledgments
When a consumer receives a message from a queue, the message isn't immediately deleted; instead, it becomes invisible to other consumers for a visibility timeout period, giving the consumer time to process the message and explicitly acknowledge it. If the consumer crashes or fails to acknowledge before the timeout expires, the message automatically becomes visible again and another consumer can pick it up. This mechanism enables fault tolerance without manual intervention but introduces the risk of duplicate processing, so consumers should be written to tolerate seeing the same message more than once.
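The receive, hide, and redeliver cycle can be illustrated with a toy in-memory queue. This is a minimal sketch, not any broker's real API: `ToyQueue`, `Message`, and the explicit `now` argument (used in place of a real clock so the timeline is deterministic) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    body: str
    invisible_until: float = 0.0  # no consumer may receive it before this time
    receive_count: int = 0        # how many times it has been handed out


@dataclass
class ToyQueue:
    """Hypothetical in-memory queue; real brokers differ in many details."""
    visibility_timeout: float
    messages: dict = field(default_factory=dict)  # message id -> Message
    next_id: int = 0

    def send(self, body: str) -> int:
        msg_id = self.next_id
        self.next_id += 1
        self.messages[msg_id] = Message(body)
        return msg_id

    def receive(self, now: float) -> Optional[int]:
        """Hand out the first visible message and hide it; None if nothing is visible."""
        for msg_id, msg in self.messages.items():
            if msg.invisible_until <= now:
                msg.invisible_until = now + self.visibility_timeout
                msg.receive_count += 1
                return msg_id
        return None

    def ack(self, msg_id: int) -> None:
        """Acknowledge: only now is the message actually deleted."""
        self.messages.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30.0)
job = queue.send("resize image 42")

first = queue.receive(now=0.0)    # consumer A takes the message
hidden = queue.receive(now=10.0)  # consumer B finds nothing: it is invisible
# Consumer A crashes without acking, so at t=30 the message is visible again.
retry = queue.receive(now=30.0)
deliveries = queue.messages[job].receive_count
queue.ack(retry)                  # the second consumer finishes and acknowledges
empty = queue.receive(now=100.0)  # nothing left to redeliver
```

The message is delivered twice here even though it was sent only once, which is exactly the duplicate-processing risk described above.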
The visibility timeout must be calibrated to your actual processing time. Amazon SQS defaults to 30 seconds, but if your average processing takes 45 seconds, messages will reappear prematurely and be processed twice. Google Cloud Pub/Sub uses an acknowledgment deadline (default 10 seconds) with automatic extension: the client library sends lease renewals as long as processing continues. Setting the timeout too high is also problematic: if a consumer crashes while processing a message with a 10 minute timeout, that message sits locked and unprocessed for 10 minutes before retry.
In production, you should set visibility timeout to approximately 2x your P99 processing latency to account for variance. If your P99 is 5 seconds, use a 10 second timeout. For long running tasks, implement lease renewal: Azure Service Bus allows consumers to renew locks programmatically, and Pub/Sub client libraries do this automatically. Monitor your redelivery rate: if more than 5 to 10% of messages are being redelivered, your timeout is likely too short or your consumers are silently failing.
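As a rough sketch of the sizing rule and the redelivery check, the helpers below derive a timeout from observed processing latencies and flag a suspicious redelivery rate. The function names, the nearest-rank percentile method, and the 5% default threshold are assumptions chosen for illustration; a real deployment would read these numbers from its metrics system.

```python
def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def recommended_timeout(samples, multiplier=2.0):
    """Visibility timeout as a multiple of P99 processing latency."""
    return multiplier * p99(samples)


def redelivery_rate(delivered, redelivered):
    """Fraction of deliveries that were redeliveries."""
    return redelivered / delivered if delivered else 0.0


def timeout_looks_too_short(delivered, redelivered, threshold=0.05):
    """True when the redelivery rate exceeds the (assumed) alert threshold."""
    return redelivery_rate(delivered, redelivered) > threshold


# 100 made-up latency samples in seconds: mostly fast, with a slow tail.
latencies = [1.0] * 98 + [5.0, 9.0]
timeout = recommended_timeout(latencies)            # P99 is 5.0, so 10.0
alert = timeout_looks_too_short(delivered=1000, redelivered=80)
```

With these sample values the P99 is 5 seconds and the suggested timeout is 10 seconds, matching the example in the text; an 8% redelivery rate trips the check.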
Prefetch and batching interact with visibility timeouts. If you prefetch 100 messages with a 30 second timeout and process them sequentially at 1 per second, only the first 30 messages finish in time; the timeouts on the remaining 70 expire while they are still waiting in your local buffer, causing redelivery. Instead, prefetch a number you can process well within the timeout, or process messages concurrently with bounded parallelism.
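The prefetch arithmetic can be checked with a small calculation. This sketch assumes every prefetched message's timeout starts at fetch time and that processing is strictly sequential; the function names and the 50% safety margin are hypothetical choices, not a broker setting.

```python
def expired_before_processing(prefetch, seconds_per_message, visibility_timeout):
    """Count prefetched messages whose timeout lapses before they are finished."""
    # Message k (1-indexed) finishes at k * seconds_per_message.
    finished_in_time = int(visibility_timeout // seconds_per_message)
    return max(0, prefetch - finished_in_time)


def safe_prefetch(seconds_per_message, visibility_timeout, safety_factor=0.5):
    """Largest batch that finishes within a fraction of the timeout (at least 1)."""
    budget = visibility_timeout * safety_factor
    return max(1, int(budget // seconds_per_message))


at_risk_fast = expired_before_processing(100, 1, 30)  # 30 finish, 70 at risk
at_risk_slow = expired_before_processing(100, 2, 30)  # 15 finish, 85 at risk
batch_size = safe_prefetch(1, 30)                     # 15 with the 50% margin
```

Under these assumptions, a prefetch of 100 at 1 second per message leaves 70 messages exposed to redelivery, while a batch of 15 finishes in half the timeout.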
💡 Key Takeaways
•Set visibility timeout to 2x P99 processing latency: if P99 is 8 seconds, use 16 seconds; too short causes premature redelivery and duplicate processing, too long delays retry after consumer crashes
•Redelivery rate is a key health metric: if more than 5 to 10% of messages are redelivered, investigate whether timeouts are too short, consumers are crashing silently, or processing is slower than expected
•Prefetch must account for timeout: fetching 100 messages with 30 second timeout and processing sequentially at 2 seconds each means only the first 15 messages finish in time and the remaining 85 time out before processing; reduce prefetch or process concurrently
•Lease renewal for long tasks: Azure Service Bus lock duration can be renewed programmatically; Google Pub/Sub client libraries automatically extend ack deadlines up to a configurable maximum as long as processing continues
•Poison message detection: track per message receive count; if a message is delivered more than 3 to 5 times, route to a dead letter queue for manual inspection rather than continuing to retry indefinitely
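The poison message takeaway can be sketched as a receive-count check. Many brokers track the receive count for you and can move messages to a dead letter queue automatically; the version below is a hand-rolled illustration in which `route`, `MAX_RECEIVES`, and the in-memory `Counter` are all assumed names rather than any broker's API.

```python
from collections import Counter

MAX_RECEIVES = 5  # assumed threshold, within the 3 to 5 range suggested above


def route(message_id, receive_counts, max_receives=MAX_RECEIVES):
    """Record one more delivery and decide where the message should go."""
    receive_counts[message_id] += 1
    if receive_counts[message_id] > max_receives:
        return "dead_letter"  # stop retrying; park it for manual inspection
    return "process"


counts = Counter()
# The same message keeps failing and is redelivered six times.
outcomes = [route("m1", counts) for _ in range(6)]
```

The first five deliveries are processed normally and the sixth is diverted, so a message that can never succeed stops consuming worker time.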
📌 Examples
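Amazon SQS with Lambda: the default visibility timeout is 30 seconds; if the function is allowed to run for up to 5 minutes, a message that is still being processed reappears after 30 seconds and triggers a duplicate invocation unless you raise the queue's visibility timeout to at least the function timeout (AWS recommends at least six times the function timeout to leave room for retries)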
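Google Pub/Sub streaming pull: the client library starts with an initial ack deadline (for example, 60 seconds); while processing continues, it automatically sends ModifyAckDeadline requests at intervals shorter than the deadline (for example, every 30 seconds) to extend it, preventing timeout for long running operations. The exact deadline and renewal interval depend on the client library and its configuration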
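The renewal pattern in the Pub/Sub example can be reasoned about with a simple timeline model. This is a sketch under assumed simplifications: each renewal resets the lease to a full `deadline` from the moment it is sent, renewals happen at a fixed interval, and network delay is ignored. The function names are hypothetical and do not correspond to any client library call.

```python
def renewal_times(processing_seconds, renew_every):
    """Seconds after receipt at which the consumer would extend its lease."""
    times = []
    t = renew_every
    while t < processing_seconds:
        times.append(t)
        t += renew_every
    return times


def lease_survives(processing_seconds, deadline, renew_every):
    """True if the lease never lapses before processing completes."""
    expiry = deadline
    for t in renewal_times(processing_seconds, renew_every):
        if t >= expiry:
            return False       # lease already lapsed: message was redelivered
        expiry = t + deadline  # renewal grants a fresh deadline from now
    return processing_seconds <= expiry


# A 150 second task, 60 second deadline, renewed every 30 seconds.
schedule = renewal_times(150, 30)
ok = lease_survives(150, deadline=60, renew_every=30)
too_slow = lease_survives(150, deadline=60, renew_every=90)
```

Renewing every 30 seconds keeps a 60 second lease alive for the whole task, whereas renewing only every 90 seconds lets the lease lapse before the first renewal arrives.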