Failure Modes and Edge Cases in Production
How Leaky Buckets Fail
The algorithm is simple, but production deployments reveal subtle failure modes, and understanding them prevents outages. Most failures stem from edge cases the basic algorithm does not address: head-of-line blocking, queue memory exhaustion, and coordination races in distributed deployments.
Head of Line Blocking
A slow request at the front of the queue delays everything behind it. If one request takes 5 seconds to process, 100 requests behind it wait 5+ seconds before even starting. Unlike a token bucket, which admits or rejects each request immediately, a leaky bucket queue creates coupling between unrelated requests. Mitigation: set a per-request processing timeout independent of queue wait; if processing exceeds 2 seconds, fail fast and move on. Alternatively, use multiple parallel queues so a slow request only blocks its own lane.
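The parallel-lane mitigation can be sketched as follows. This is a minimal illustration, not a production design: the lane count, the bound of 100, and the use of a SHA-256 hash of a request key are all assumptions chosen for the example.

```python
import hashlib
import queue

NUM_LANES = 4  # assumed lane count; tune to expected concurrency

# Independent bounded queues: a slow request blocks only its own lane.
lanes = [queue.Queue(maxsize=100) for _ in range(NUM_LANES)]

def lane_for(key: str) -> int:
    # Hash the request key so requests spread evenly across lanes,
    # and the same key always maps to the same lane.
    digest = hashlib.sha256(key.encode()).digest()
    return digest[0] % NUM_LANES

def enqueue(key: str, request) -> bool:
    try:
        lanes[lane_for(key)].put_nowait(request)
        return True
    except queue.Full:
        return False  # shed load rather than grow unbounded
```

A worker per lane then drains its own queue; one stalled worker leaves the other lanes unaffected.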
Queue Memory Exhaustion
If requests queue faster than they drain and capacity B is high, memory fills with waiting requests. A 10 KB request body times 100,000 queued requests equals 1 GB of memory. If the downstream service goes down entirely, the queue fills in seconds. Mitigation: bound the queue by both request count and total bytes, and drop requests when either limit is hit. Monitor queue depth as a health metric; sustained growth indicates a capacity mismatch.
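A dual-bound queue might look like the sketch below. The class name and limits are illustrative assumptions; the point is that an enqueue is rejected when either the count bound or the byte bound would be exceeded.

```python
import collections

class BoundedQueue:
    """Queue bounded by both item count and total payload bytes."""

    def __init__(self, max_items: int, max_bytes: int):
        self.max_items = max_items
        self.max_bytes = max_bytes
        self.items = collections.deque()
        self.bytes_used = 0

    def offer(self, body: bytes) -> bool:
        # Reject if either bound would be exceeded -- caller sheds load.
        if len(self.items) >= self.max_items:
            return False
        if self.bytes_used + len(body) > self.max_bytes:
            return False
        self.items.append(body)
        self.bytes_used += len(body)
        return True

    def poll(self) -> bytes:
        body = self.items.popleft()
        self.bytes_used -= len(body)
        return body

    def depth(self) -> int:
        # Export this as the health metric the text recommends.
        return len(self.items)
```

The byte bound is what prevents the 1 GB scenario above: large bodies hit the memory limit long before the count limit.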
Stale Queue Requests
A request queued for 30 seconds may no longer be relevant: the user gave up, the client timed out, or the context expired. Processing it wastes resources and may cause errors. Mitigation: attach a deadline to each request when it is enqueued, and check the deadline before processing. If it has passed, discard the request without processing it. Track the expired-request rate as an indicator of capacity problems.
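The deadline check can be sketched like this. The 30-second TTL and function names are assumptions for illustration; `time.monotonic()` is used so wall-clock adjustments cannot expire or revive requests.

```python
import collections
import time

def enqueue_with_deadline(q: collections.deque, request, ttl_seconds: float = 30.0):
    # Stamp the deadline at enqueue time, not at processing time.
    q.append((time.monotonic() + ttl_seconds, request))

def next_fresh(q: collections.deque):
    """Pop requests until a non-expired one is found.

    Returns (request, expired_count); request is None if the queue
    drained with nothing fresh. expired_count feeds the capacity metric.
    """
    expired = 0
    while q:
        deadline, request = q.popleft()
        if time.monotonic() < deadline:
            return request, expired
        expired += 1  # discarded without processing
    return None, expired
```

A rising `expired` count means requests are sitting in the queue longer than their useful lifetime, which is the capacity signal the text describes.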
Clock Skew in Distributed Leaking
If servers leak based on their local clocks and those clocks disagree, servers drain at different rates: a server whose clock runs fast processes proportionally more requests, throwing off the global rate. Mitigation: use a central time source (for example, Redis server time) for leak calculations, or compute elapsed time from a monotonic clock rather than wall-clock time. NTP usually keeps clocks within 1 to 10 ms, but edge cases such as leap seconds or network partitions can cause larger skew.
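The monotonic-clock mitigation can be sketched as a single-server leaky bucket that computes its leak from `time.monotonic()` deltas, so NTP steps to the wall clock never change the drain rate. The class shape and capacity semantics are assumptions for the example.

```python
import time

class LeakyBucket:
    """Leaky bucket whose leak is driven by monotonic elapsed time."""

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.level = 0.0
        # Monotonic time is immune to wall-clock steps (NTP, leap seconds).
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain at the configured rate for the elapsed interval.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0
            return True
        return False
```

For the distributed case, the same arithmetic runs against a shared clock such as Redis server time instead of each node's local clock, so all nodes see one drain rate.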