
Failure Modes and Edge Cases in Production

How Leaky Buckets Fail

The algorithm is simple but production deployments reveal subtle failure modes. Understanding these prevents outages. Most failures stem from edge cases the basic algorithm does not address: head of line blocking, queue memory exhaustion, and coordination races in distributed deployments.

Head of Line Blocking

A slow request at the front of the queue delays everything behind it: if one request takes 5 seconds to process, 100 requests behind it wait 5+ seconds before even starting. Unlike a token bucket, which admits or rejects each request immediately, a leaky bucket's queue couples unrelated requests together. Mitigation: set per request processing timeouts independent of queue wait. If processing takes more than 2 seconds, fail fast and move on. Or use multiple parallel queues so a slow request only blocks its own lane.

Queue Memory Exhaustion

If requests queue faster than they drain and capacity B is high, memory fills with waiting requests: a 10 KB request body times 100,000 queued requests is 1 GB of memory. If the downstream service goes down entirely, the queue fills in seconds. Mitigation: bound the queue by both count and memory, and drop requests when either limit is hit. Monitor queue depth as a health metric; sustained growth indicates a capacity mismatch.
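A minimal sketch of the dual-bound mitigation (the class name and limits are illustrative):

```python
import collections

class BoundedQueue:
    """Leaky-bucket admission queue bounded by both request count
    and total payload bytes; rejects when either limit would be hit."""

    def __init__(self, max_items: int, max_bytes: int) -> None:
        self.max_items = max_items
        self.max_bytes = max_bytes
        self.bytes_used = 0          # also a useful health metric
        self.items = collections.deque()

    def offer(self, body: bytes) -> bool:
        """Enqueue, or shed load if count or memory would be exceeded."""
        if (len(self.items) >= self.max_items
                or self.bytes_used + len(body) > self.max_bytes):
            return False  # caller should reject (e.g. 429) rather than queue
        self.items.append(body)
        self.bytes_used += len(body)
        return True

    def poll(self) -> bytes:
        body = self.items.popleft()
        self.bytes_used -= len(body)
        return body
```

Exposing `len(self.items)` and `bytes_used` to monitoring covers the queue depth health check the text recommends.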

Stale Queue Requests

A request queued for 30 seconds might no longer be relevant: the user gave up, the client timed out, or the context expired. Processing it wastes resources and might cause errors. Mitigation: attach a deadline to each request when it is enqueued. Before processing, check whether the deadline has passed; if it has, discard the request without processing it. Track the expired request rate as an indicator of capacity problems.
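The deadline check might look like this sketch (the 30 second TTL and the injectable clock are illustrative; the clock parameter exists for testability):

```python
import time

REQUEST_TTL = 30.0  # illustrative staleness budget, in seconds

def enqueue(queue: list, payload, now=time.monotonic) -> None:
    """Attach an expiry deadline to each request at enqueue time."""
    queue.append((payload, now() + REQUEST_TTL))

def next_fresh(queue: list, now=time.monotonic):
    """Pop the next request, discarding any whose deadline has passed.

    Returns (payload, expired_count); expired_count feeds the
    expired-request-rate metric the text recommends tracking."""
    expired = 0
    while queue:
        payload, deadline = queue.pop(0)
        if now() < deadline:
            return payload, expired
        expired += 1  # stale: discard without processing
    return None, expired
```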

Clock Skew in Distributed Leaking

If servers leak based on their local clocks and those clocks differ by 100 ms, the servers drain at different rates: one server processes 10% more requests, throwing off the global rate. Mitigation: use a central time source (such as Redis server time) for leak calculations, or use monotonic elapsed time rather than wall clock time. NTP usually keeps clocks within 1 to 10 ms, but edge cases during leap seconds or network partitions can cause larger skew.
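The monotonic-time option can be sketched as a counter-style leaky bucket that drains by elapsed monotonic time, so NTP steps or wall clock skew never change the rate (names are illustrative; the clock parameter is injected for testability):

```python
import time

class LeakyBucketCounter:
    """Leaky bucket that leaks by monotonic elapsed time since the
    last update, not by wall-clock readings."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock=time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.level = 0.0
        self.clock = clock
        self.last = clock()

    def _leak(self) -> None:
        now = self.clock()
        # Drain proportionally to elapsed time; immune to clock steps.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now

    def try_add(self, amount: float = 1.0) -> bool:
        self._leak()
        if self.level + amount > self.capacity:
            return False  # bucket full: reject
        self.level += amount
        return True
```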

💡 Key Takeaways
- Head of line blocking when B/r exceeds timeouts wastes capacity: r=200 rps, B=2,000, timeout=3 seconds gives a 10 second max delay where most requests fail while queued. Fix: B ≤ 600
- Garbage Collection (GC) pauses and timer jitter accumulate backlog that releases as micro bursts; drain at the target rate over accumulated time rather than catching up instantly, to maintain smoothness
- Thundering herd from synchronized 429 retries amplifies load; include retry after headers in rejections, use exponential backoff with jitter, and client side leaky buckets to pace retry attempts
- Priority inversion in shared buckets lets low priority batch jobs starve urgent requests; use weighted per class buckets with strict priority scheduling to protect critical paths
- Multi stage composition in production: a token bucket in front allows benign micro bursts (batch flushes), while a leaky bucket behind enforces smooth downstream protection with bounded delay
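The B ≤ 600 sizing in the first takeaway follows from the worst-case queue delay B/r, which should not exceed the request timeout (the function name here is illustrative):

```python
def max_capacity(rate_rps: float, timeout_s: float) -> int:
    """Worst-case queue wait is B / r, so bound B by B <= r * timeout."""
    return int(rate_rps * timeout_s)

# The takeaway's numbers: B=2,000 at r=200 rps waits up to 2000/200 = 10 s,
# far beyond a 3 s timeout; resizing gives B <= 200 * 3 = 600.
```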
📌 Interview Tips
1. Unbounded memory growth: shapers with large B across many keys exhaust memory; bound the active bucket set with Least Recently Used (LRU) eviction and TTLs, choosing B based on strict worst case latency budgets, not theoretical maximums
2. Client side leaky bucket pacing retries and log uploads at 100 rps: space sends by 10 milliseconds to avoid synchronized bursts when many clients wake simultaneously after incident resolution
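Tip 2 can be sketched as a client side pacer that spaces sends at 1/rate intervals (the class name is illustrative, and the clock/sleep parameters are injected only for testability):

```python
import time

class ClientPacer:
    """Client-side leaky bucket: space outgoing sends evenly rather than
    bursting, so clients waking together do not synchronize their traffic."""

    def __init__(self, rate_per_sec: float,
                 clock=time.monotonic, sleep=time.sleep) -> None:
        self.interval = 1.0 / rate_per_sec   # 100 rps -> 10 ms spacing
        self.clock = clock
        self.sleep = sleep
        self.next_send = clock()

    def pace(self) -> None:
        """Block until the next 1/rate slot opens, then claim it."""
        now = self.clock()
        if now < self.next_send:
            self.sleep(self.next_send - now)
        self.next_send = max(now, self.next_send) + self.interval
```

Calling `pace()` before each retry or log upload yields the 10 millisecond spacing at 100 rps described in the tip.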