Rate Limiting • Leaky Bucket Algorithm
Failure Modes and Edge Cases in Production
Leaky Bucket implementations fail in subtle ways under production load. Head-of-line blocking occurs when the worst-case queueing delay B/r exceeds request or downstream timeouts. With r = 200 rps and B = 2,000, the maximum added delay is 2,000 / 200 = 10 seconds; if upstream timeouts are 3 seconds, most queued requests time out while waiting, wasting capacity and triggering synchronized retries that amplify load. The fix is strict: set B ≤ r × (minimum timeout across the path). Measure that minimum timeout end to end, including network transit, proxy hops, and backend processing, to get this right.
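A one-liner makes the sizing rule concrete (a minimal Python sketch; the function name is illustrative):

```python
def max_queue_capacity(rate_rps: float, min_timeout_s: float) -> int:
    """Largest backlog B whose worst-case drain time B/r stays within
    the tightest timeout measured across the request path."""
    return int(rate_rps * min_timeout_s)

# r = 200 rps with a 3 s upstream timeout caps B at 600 requests,
# not the 2,000 that would impose a 10 s worst-case wait.
assert max_queue_capacity(200, 3.0) == 600
```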
Timer jitter and garbage-collection (GC) pauses in shapers accumulate backlog while the process is stalled, then release micro-bursts on resume that violate the smoothness guarantees downstream systems depend on. Use high-resolution monotonic clocks and correction logic that drains at exactly the target rate rather than catching up too fast after a delay. For example, if a 100 millisecond GC pause delays processing of 50 requests at r = 500 rps, drain them at one request per 2 milliseconds over the next 100 milliseconds (the accumulated time) instead of releasing all 50 instantly.
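A sketch of that correction logic, assuming a poll-based event loop; the class and method names are illustrative, not from any particular library:

```python
import time
from collections import deque

class PacedDrainer:
    """Pause-tolerant draining: emit queued items at exactly the target
    rate on a monotonic clock, never fast-forwarding through time lost
    to GC pauses or timer jitter."""

    def __init__(self, rate_rps: float):
        self.interval = 1.0 / rate_rps      # seconds between releases
        self.next_slot = time.monotonic()   # monotonic: immune to clock steps
        self.queue: deque = deque()

    def poll(self) -> list:
        """Call from the event loop; returns items whose slot has arrived."""
        now = time.monotonic()
        # Clamp instead of replaying missed slots: after a 100 ms stall at
        # r = 500 rps, the ~50 backlogged items drain one per 2 ms (~100 ms
        # total) rather than escaping as a single micro-burst.
        if now - self.next_slot > self.interval:
            self.next_slot = now
        released = []
        while self.queue and self.next_slot <= now:
            released.append(self.queue.popleft())
            self.next_slot += self.interval
        return released
```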
Thundering herds across instances happen when many clients receive 429 rejections simultaneously and retry with little or no jitter, creating synchronized load spikes. Policers that drop without guidance are especially vulnerable. Mitigate by including Retry-After headers in rejection responses (tell clients when to retry), adding per-client exponential backoff with jitter, and using client-side leaky buckets to pace retry attempts.

Priority inversion occurs when a single shared bucket mixes critical and background traffic: low-priority batch jobs can starve urgent requests. Use weighted or per-class buckets with strict priority scheduling.

Production systems at scale use multi-stage limiters: a token bucket in front allows small benign bursts for performance (like batch flushes), while a leaky bucket behind enforces smooth downstream protection.
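A hedged sketch of the herd mitigations (Retry-After is the standard HTTP header; the helper names and backoff constants are assumptions): the server estimates when its backlog will have drained, and clients back off exponentially with full jitter so they desynchronize:

```python
import random
import time

def rejection_headers(queue_depth: int, drain_rate_rps: float) -> dict:
    # Tell the rejected client when the current backlog should have
    # drained, so retries spread out instead of arriving in lockstep.
    return {"Retry-After": str(max(1, round(queue_depth / drain_rate_rps)))}

def retry_delay_s(attempt: int, base_s: float = 0.5, cap_s: float = 30.0) -> float:
    # Exponential backoff with full jitter: pick uniformly in [0, backoff]
    # so clients rejected at the same instant retry at different times.
    backoff = min(cap_s, base_s * (2 ** attempt))
    return random.uniform(0, backoff)

# A client that got a 429 on attempt 2 sleeps somewhere in [0, 2.0] seconds.
time.sleep(retry_delay_s(2))
```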
💡 Key Takeaways
• Head-of-line blocking when B/r exceeds timeouts wastes capacity: r = 200 rps, B = 2,000, timeout = 3 seconds gives a 10-second max delay where most queued requests fail. Fix: B ≤ 200 × 3 = 600
• Garbage-collection (GC) pauses and timer jitter accumulate backlog that releases as micro-bursts; drain at the target rate over the accumulated time rather than catching up instantly, to maintain smoothness
• Thundering herds from synchronized 429 retries amplify load; include Retry-After headers in rejections, use exponential backoff with jitter, and pace retry attempts with client-side leaky buckets
• Priority inversion in shared buckets lets low-priority batch jobs starve urgent requests; use weighted per-class buckets with strict priority scheduling to protect critical paths
• Multi-stage composition in production: a token bucket in front allows benign micro-bursts (batch flushes), while a leaky bucket behind enforces smooth downstream protection with bounded delay (see the sketch after this list)
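A compact sketch of that two-stage composition, under assumed illustrative classes (not a specific library): a token bucket admits requests with a small burst allowance, and a leaky-bucket shaper smooths whatever is admitted before it reaches the backend.

```python
import time

class TokenBucket:
    """Stage 1: admit requests, tolerating bursts up to `burst` tokens."""
    def __init__(self, rate_rps: float, burst: float):
        self.rate, self.burst = rate_rps, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # reject rather than queue at this stage

class LeakyBucketShaper:
    """Stage 2: space admitted requests evenly so the backend sees a
    smooth stream even when stage 1 lets a micro-burst through."""
    def __init__(self, rate_rps: float):
        self.interval = 1.0 / rate_rps
        self.next_slot = time.monotonic()

    def wait(self) -> None:
        now = time.monotonic()
        self.next_slot = max(self.next_slot, now)
        time.sleep(self.next_slot - now)   # bounded if B is sized per above
        self.next_slot += self.interval

admitter, shaper = TokenBucket(500, burst=50), LeakyBucketShaper(500)

def handle(request) -> int:
    if not admitter.allow():
        return 429          # reject with Retry-After guidance, as above
    shaper.wait()           # in production, enqueue with cap B instead of blocking
    return 200              # stand-in for forwarding downstream
```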
📌 Examples
Unbounded memory growth: shapers with large B across many keys exhaust memory; bound the active bucket set with Least-Recently-Used (LRU) eviction and TTLs, and choose B from strict worst-case latency budgets, not theoretical maximums
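A minimal sketch of the bounded bucket set, assuming an OrderedDict-based LRU with a TTL; `make_bucket` stands in for whatever per-key shaper state is created on first use:

```python
import time
from collections import OrderedDict

class BucketMap:
    """Per-key bucket store capped by LRU eviction plus a TTL, so millions
    of idle keys cannot pin shaper state in memory indefinitely."""

    def __init__(self, max_keys: int, ttl_s: float):
        self.max_keys, self.ttl_s = max_keys, ttl_s
        self._entries: OrderedDict = OrderedDict()  # key -> (bucket, last_used)

    def get(self, key, make_bucket):
        now = time.monotonic()
        if key in self._entries:
            bucket, _ = self._entries.pop(key)
        else:
            bucket = make_bucket()
        self._entries[key] = (bucket, now)          # move to MRU position
        # Evict expired entries, then least-recently-used ones over the cap.
        while self._entries:
            _key, (_bucket, last_used) = next(iter(self._entries.items()))
            if now - last_used > self.ttl_s or len(self._entries) > self.max_keys:
                self._entries.popitem(last=False)   # drop the oldest entry
            else:
                break
        return bucket
```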
Client-side leaky bucket pacing retries and log uploads at 100 rps: space sends 10 milliseconds apart to avoid synchronized bursts when many clients wake simultaneously after incident resolution
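A matching client-side pacer sketch (names illustrative): at 100 rps each send waits for a slot 10 milliseconds after the previous one, so clients that wake together trickle instead of bursting:

```python
import time

class ClientPacer:
    """Client-side leaky bucket: space outbound calls by a fixed interval."""
    def __init__(self, rate_rps: float):
        self.interval = 1.0 / rate_rps     # 100 rps -> 10 ms spacing
        self.next_slot = time.monotonic()

    def pace(self) -> None:
        now = time.monotonic()
        self.next_slot = max(self.next_slot, now)
        time.sleep(self.next_slot - now)   # never negative: slot is clamped to now
        self.next_slot += self.interval

# Illustrative usage: drain a backlog of queued uploads after an incident
# without contributing to a synchronized burst.
pacer = ClientPacer(100)
for record in ["log-1", "log-2", "log-3"]:
    pacer.pace()                           # at most one send per 10 ms
    print("upload", record)                # stand-in for the real send call
```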