
What is the Leaky Bucket Algorithm?

The Leaky Bucket algorithm is a traffic-shaping pattern that transforms bursty, irregular request arrivals into a smooth, constant-rate outflow. Think of it as a literal bucket with a hole in the bottom: requests arrive at variable speeds and collect in the bucket, but they drain out through the hole at a fixed rate r (requests per second). If the bucket fills to capacity B, additional incoming requests overflow and are either dropped or rejected immediately.

This differs fundamentally from Token Bucket, which allows controlled bursts: Token Bucket accumulates tokens over time and lets you spend them all at once, bursting up to the bucket size. Leaky Bucket instead enforces near-constant spacing by either queueing requests (shaper mode) or dropping early arrivals (policer mode).

The maximum queueing delay is bounded by B/r. For example, with r = 100 requests per second and B = 500, the worst-case added latency is 5 seconds before requests start being dropped.

Uber open-sourced a Leaky Bucket style rate limiter (go.uber.org/ratelimit) that spaces operations evenly. When a team caps outbound Remote Procedure Calls (RPCs) at 500 queries per second (QPS), each request is spaced roughly 2 milliseconds apart. With a local buffer of 1,000 in-flight operations, worst-case queueing delay stays under 2 seconds. This prevents thundering-herd retries during brownouts by converting retry storms into smooth, manageable trickles that downstream services can absorb without tail-latency spikes.
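To make the mechanics concrete, here is a minimal single-process sketch of the policer mode in Go. The `Bucket` type and `Allow` method are illustrative names, not from any library, and the sketch omits the locking and shared state a production limiter would need:

```go
package main

import (
	"fmt"
	"time"
)

// Bucket drains at `rate` requests/second and holds at most `capacity`
// outstanding requests (the B and r from the text above).
type Bucket struct {
	capacity float64   // B: bucket size
	rate     float64   // r: drain rate in requests per second
	level    float64   // current "water level"
	last     time.Time // last time we accounted for leakage
}

// Allow admits one request if the bucket has room after draining what
// leaked out since the last call; otherwise it rejects immediately.
func (b *Bucket) Allow(now time.Time) bool {
	// Drain: the level falls by r units per elapsed second.
	b.level -= now.Sub(b.last).Seconds() * b.rate
	if b.level < 0 {
		b.level = 0
	}
	b.last = now

	if b.level+1 > b.capacity {
		return false // overflow: drop or reject the request
	}
	b.level++
	return true
}

func main() {
	b := &Bucket{capacity: 5, rate: 2, last: time.Now()} // B=5, r=2 rps
	for i := 0; i < 8; i++ {
		fmt.Println(i, b.Allow(time.Now())) // first 5 admitted, rest dropped
	}
}
```

Shaper mode is the same bookkeeping, except a request that would overflow is queued (e.g. on a buffered channel) and released as the level drains, which is exactly what produces the B/r delay bound.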
💡 Key Takeaways
Leaky Bucket enforces a constant rate r (requests per second) by either queueing excess traffic (shaper) or dropping it immediately (policer), unlike Token Bucket, which allows controlled bursts
Maximum queueing delay is bounded by B/r, where B is bucket capacity; example: r = 100 rps and B = 500 gives a maximum 5-second delay before drops occur
Uber's production limiter at 500 QPS spaces requests 2 ms apart with a 1,000-request buffer, keeping worst-case queueing under 2 seconds to prevent downstream overload (see the spacing sketch after this list)
Choose Leaky Bucket when you must protect fragile downstream services from bursts (databases, caches, inference services); choose Token Bucket when short bursts improve utilization
Critical constraint: set B ≤ r × (minimum client timeout) to avoid requests timing out while queued; if the client timeout is 3 seconds and r = 200 rps, then B must be ≤ 600
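The spacing behavior in the Uber takeaway can be reproduced directly with the open-source library. A short usage sketch, assuming the go.uber.org/ratelimit module is available; the 500 QPS figure mirrors the example above:

```go
package main

import (
	"fmt"
	"time"

	"go.uber.org/ratelimit"
)

func main() {
	// 500 operations per second → successive Take() calls return ~2ms apart.
	rl := ratelimit.New(500)

	prev := rl.Take() // first call returns immediately
	for i := 1; i <= 5; i++ {
		now := rl.Take() // blocks just long enough to keep even spacing
		fmt.Printf("request %d, gap %v\n", i, now.Sub(prev))
		prev = now
	}
}
```

Because Take() blocks instead of rejecting, this is the shaper mode: a retry storm upstream becomes an even ~2 ms trickle downstream.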
📌 Examples
NGINX edge configuration (the limit_req module implements a leaky bucket): 5 requests per second per IP with a B=25 burst buffer gives a maximum 5-second added delay (B/r = 25/5) before shedding bot traffic, while keeping legitimate users' p95 latency stable
Stripe's Redis-backed GCRA (Generic Cell Rate Algorithm) implementation: enforces 10 to 1,000 requests per second per API key with sub-millisecond latency per check, handling hundreds of thousands of rate decisions per second on a single Redis shard (a minimal single-node sketch follows this list)
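GCRA is usually implemented with virtual scheduling: instead of tracking a water level, it tracks the theoretical arrival time (TAT) of the next conforming request. Below is a hedged single-node sketch in Go; Stripe's version executes equivalent logic atomically inside Redis, and the NewGCRA/Allow names and burst parameter here are illustrative:

```go
package main

import (
	"fmt"
	"time"
)

// GCRA via virtual scheduling: one timestamp of state per key.
type GCRA struct {
	interval  time.Duration // T = 1/r, the emission interval
	tolerance time.Duration // τ: how far ahead of schedule we tolerate
	tat       time.Time     // theoretical arrival time of the next conforming request
}

func NewGCRA(ratePerSec, burst int) *GCRA {
	t := time.Second / time.Duration(ratePerSec)
	return &GCRA{interval: t, tolerance: time.Duration(burst) * t, tat: time.Now()}
}

// Allow returns true if the request conforms to the rate contract.
func (g *GCRA) Allow(now time.Time) bool {
	if g.tat.Before(now) {
		g.tat = now // bucket was idle: restart the schedule from now
	}
	if g.tat.Sub(now) > g.tolerance {
		return false // too far ahead of schedule: reject
	}
	g.tat = g.tat.Add(g.interval) // book the next emission slot
	return true
}

func main() {
	g := NewGCRA(1000, 10) // 1,000 rps with a 10-request tolerance
	admitted := 0
	for i := 0; i < 100; i++ {
		if g.Allow(time.Now()) {
			admitted++
		}
	}
	fmt.Println("admitted:", admitted) // ≈10 of the 100 back-to-back calls
}
```

One timestamp per key is what makes GCRA cheap in Redis: each decision is a read-compare-write on a single value, which is why sub-millisecond checks at high volume fit on one shard.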