Rate Limiting › Leaky Bucket Algorithm · Easy · ⏱️ ~2 min

What is the Leaky Bucket Algorithm?

Definition
The Leaky Bucket is a rate limiting algorithm that smooths bursty traffic into a steady flow. Requests enter a queue (the bucket), which processes them at a constant rate (the leak). When the bucket is full, new requests are rejected.
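The queue-plus-constant-drain mechanism can be sketched in a few lines. This is a minimal in-memory illustration (class and method names are my own, not from any particular library); the caller is assumed to invoke `leak_once` on a timer every `1/leak_rate` seconds.

```python
from collections import deque

class LeakyBucket:
    """Minimal leaky-bucket shaper: queue requests, drain at a fixed rate."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # max queued requests (bucket size)
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = deque()

    def submit(self, request):
        """Accept a request into the bucket, or reject it if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False            # bucket overflow: reject
        self.queue.append(request)
        return True

    def leak_once(self):
        """Drain one request; call this every 1/leak_rate seconds."""
        if self.queue:
            return self.queue.popleft()
        return None                 # bucket empty, nothing to process
```

Note that `submit` never blocks: a request is either queued or rejected instantly, and the steady drain happens entirely on the timer side.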

The Kitchen Sink Analogy

Imagine your kitchen sink. Water comes in through the faucet (requests arrive), collects in the basin (the bucket), and drains at a fixed rate of 1 liter/sec. Pour in 3 liters in one second and 2 liters remain waiting to drain. Keep pouring faster than 1 liter/sec and eventually the basin overflows (requests get rejected).

The Key Insight: Traffic Smoothing

No matter how chaotically water arrives, it always leaves through the drain at the same steady rate. Leaky Bucket transforms spiky input (1,000 requests in 100ms, then nothing) into smooth output (100 requests per 100ms, steady). Downstream systems see constant flow regardless of upstream chaos.

Token Bucket vs Leaky Bucket

Token Bucket asks: "Do you have permission to proceed right now?" and lets bursts through immediately. A user with 500 tokens sends 500 requests in 10ms. Leaky Bucket asks: "Can I smooth this?" and queues for steady release. Those 500 requests at 100/sec leak rate take 5 seconds. Token Bucket prioritizes responsiveness; Leaky Bucket prioritizes downstream protection.
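The contrast shows up directly in the arithmetic. A rough sketch of how each algorithm handles the 500-request burst from the paragraph above (function names are illustrative, not from a real library):

```python
def token_bucket_admitted(tokens, burst):
    """Token bucket: every request holding an available token passes immediately."""
    return min(tokens, burst)

def leaky_bucket_drain_seconds(burst, leak_rate):
    """Leaky bucket: a queued burst drains at the fixed leak rate."""
    return burst / leak_rate

# 500 requests against 500 tokens: all admitted in one instant.
print(token_bucket_admitted(500, 500))       # 500
# The same 500 requests at a 100/sec leak rate: spread over 5 seconds.
print(leaky_bucket_drain_seconds(500, 100))  # 5.0
```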

When to Reach for Leaky Bucket

Use it when downstream has nonlinear degradation. Databases often exhibit this: at 1,000 QPS latency is 10ms; at 1,500 QPS it jumps to 100ms; at 2,000 QPS queries pile up and recovery takes 5 to 10 minutes. A 2x spike lasting just 1 second can trigger a 10 minute outage. Leaky Bucket prevents that spike from ever reaching the database.

The Trade-off: Latency for Stability

Requests entering the queue experience delay. With a 100-item queue draining at 100/sec, a request at the back waits 1 second before processing. For user-facing APIs, this may be unacceptable; for background jobs or database writes, the added latency is a worthwhile price for stability.

💡 Key Takeaways
- Leaky Bucket enforces a constant rate r (requests per second) by either queueing excess traffic (shaper) or dropping it immediately (policer), unlike Token Bucket, which allows controlled bursts
- Maximum queueing delay is bounded by B/r, where B is bucket capacity; example: r=100 rps and B=500 gives a max 5-second delay before drops occur
- Uber's production limiter at 500 QPS spaces requests 2ms apart with a 1,000-request buffer, keeping worst-case queueing under 2 seconds to prevent downstream overload
- Choose Leaky Bucket when you must protect fragile downstream services from bursts (databases, caches, inference services); choose Token Bucket when short bursts improve utilization
- Critical constraint: set B ≤ r × (minimum timeout) to avoid requests timing out while queued; if the client timeout is 3 seconds and r=200 rps, then B must be ≤ 600
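The two bounds above reduce to a couple of lines of arithmetic. The sketch below simply restates them (helper names are mine, chosen for illustration):

```python
def max_queue_delay_s(capacity_b, rate_r):
    """Worst-case queueing delay B/r: a request entering a full bucket."""
    return capacity_b / rate_r

def max_safe_capacity(rate_r, client_timeout_s):
    """Largest bucket B such that no queued request outlives the client timeout."""
    return int(rate_r * client_timeout_s)

print(max_queue_delay_s(500, 100))  # 5.0  (r=100 rps, B=500)
print(max_safe_capacity(200, 3))    # 600  (r=200 rps, 3 s client timeout)
```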
📌 Interview Tips
1. NGINX edge configuration: 5 requests per second per IP with a B=25 buffer gives a max 5-second added delay (B/r = 25/5) before shedding bot traffic, while keeping legitimate user p95 latency stable
2. Stripe Redis GCRA (Generic Cell Rate Algorithm) implementation: enforces 10 to 1,000 requests per second per API key with sub-millisecond latency per check, handling hundreds of thousands of rate decisions per second on a single Redis shard
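GCRA achieves that speed by replacing the explicit queue with a single stored timestamp, the theoretical arrival time (TAT), making each rate decision O(1) state. A minimal in-memory sketch, assuming a tolerance of (burst − 1) emission intervals (this is an illustration of the general algorithm, not Stripe's actual implementation):

```python
class GCRA:
    """Generic Cell Rate Algorithm: leaky bucket as a meter, no real queue.
    State is a single timestamp, the theoretical arrival time (TAT)."""

    def __init__(self, rate, burst):
        self.interval = 1.0 / rate                     # T: seconds between conforming requests
        self.tolerance = (burst - 1) * self.interval   # tau: allowed scheduling slack
        self.tat = 0.0                                 # theoretical arrival time

    def allow(self, now):
        """Return True if a request arriving at `now` conforms to the rate."""
        if now < self.tat - self.tolerance:
            return False                               # arrived too early: reject
        self.tat = max(now, self.tat) + self.interval  # schedule next conforming slot
        return True
```

In the Redis version this read-compare-write of TAT is done atomically (e.g. in a Lua script), which is why a single shard can serve so many decisions per second.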