
Sizing Parameters: Leak Rate r and Bucket Capacity B

The Two Parameters

Every Leaky Bucket has two parameters. The leak rate r determines how many requests leave per second (throughput). The bucket capacity B determines how many can queue before overflow (burst tolerance). Get r wrong and you overload downstream or waste capacity. Get B wrong and you reject too many requests or create unacceptable latency.
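The interplay of the two parameters can be sketched as a minimal limiter. This is an illustrative implementation, not any particular library's API; the class and field names are assumptions that simply mirror r and B from the text:

```python
import time

class LeakyBucket:
    """Minimal leaky-bucket sketch: a queue drained at r requests/sec,
    holding at most B queued requests before overflow."""

    def __init__(self, r: float, B: int):
        self.r = r            # leak rate (requests/sec) -> throughput
        self.B = B            # bucket capacity -> burst tolerance
        self.level = 0.0      # current queue depth
        self.last = time.monotonic()

    def try_add(self) -> bool:
        """Admit one request, or reject if the bucket would overflow."""
        now = time.monotonic()
        # Leak out whatever drained since the last check.
        self.level = max(0.0, self.level - (now - self.last) * self.r)
        self.last = now
        if self.level + 1 > self.B:
            return False      # overflow: reject
        self.level += 1
        return True
```

With a slow leak (r = 1/sec) and B = 2, two back-to-back requests are admitted and the third is rejected until the bucket drains.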

Sizing the Leak Rate (r)

The leak rate r should match what downstream can sustainably handle. If PostgreSQL handles 5,000 queries/sec at healthy latency, size r against that number. If your ML model processes 10 inferences/sec, r cannot exceed 10. Do not set r based on what you wish downstream could handle. Measure actual sustainable throughput, then set r to roughly 80% of the measured figure for headroom (4,000 in the PostgreSQL example).
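The sizing rule is a one-liner; the function name and the default headroom factor below are illustrative, following the 80% rule of thumb from the text:

```python
def size_leak_rate(measured_capacity: float, headroom: float = 0.80) -> float:
    """Set r to a fraction of measured sustainable downstream throughput,
    leaving the remainder as headroom for variance and background load."""
    return measured_capacity * headroom

# Text example: PostgreSQL sustains 5,000 queries/sec at healthy latency.
r = size_leak_rate(5_000)   # 4000.0 requests/sec with 20% headroom
```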

Sizing the Bucket Capacity (B)

Bucket capacity B determines max queue depth. The formula max wait = B / r tells you worst-case latency. With B = 500 and r = 100/sec, a request at the back waits 5 seconds. For user-facing APIs, keep max wait under 1 second. For background jobs, 10 to 60 seconds might be acceptable. For ML inference, match B to batch processing time.
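The max wait formula can be applied in both directions: computing worst-case latency from a given B, or deriving the largest B that fits a latency budget. Both helpers below are illustrative names, not a real API:

```python
def max_wait_seconds(B: int, r: float) -> float:
    """Worst-case queueing delay: a request admitted at the back of a
    full bucket waits B / r seconds to leak out."""
    return B / r

def capacity_for_latency(r: float, wait_budget_s: float) -> int:
    """Largest B that keeps the worst-case wait within a latency budget."""
    return int(r * wait_budget_s)

# Text example: B = 500, r = 100/sec -> 5 s worst-case wait.
print(max_wait_seconds(500, 100))        # 5.0
# A 1-second user-facing budget at r = 100/sec allows B = 100.
print(capacity_for_latency(100, 1.0))    # 100
```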

The Burst Absorption Trade-off

Larger B absorbs bigger bursts but increases worst-case latency. If traffic spikes from 80/sec to 200/sec for 5 seconds with r = 100/sec, you need B = (200 - 100) × 5 = 500 to avoid rejection. But the worst-case wait is then 5 seconds. If users cannot tolerate this, a smaller B with some rejections might be better.
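The burst calculation above generalizes to: capacity must cover the excess arrival rate times the spike duration. A sketch, with an illustrative function name:

```python
def capacity_for_burst(spike_rate: float, r: float, duration_s: float) -> float:
    """B needed to absorb a spike above the leak rate without rejections:
    (arrivals above r) accumulate in the bucket for the spike's duration."""
    return max(0.0, (spike_rate - r) * duration_s)

# Text example: 200/sec spike for 5 s against r = 100/sec.
B = capacity_for_burst(200, 100, 5)   # 500.0
# The trade-off: that B implies a 500 / 100 = 5 s worst-case wait.
```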

Production Tuning Process

Start with r at 80% of downstream capacity and B at 1 second worth of r. Monitor rejection rate (under 1%), queue depth (rarely hit B), and downstream latency (stay healthy). If rejection is high during legitimate spikes, increase B. If queue is always near B, increase r or add capacity.
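The monitoring loop above can be condensed into a toy decision rule. The function name, the exact cutoffs (1% rejections, queue depth within 90% of B), and the return strings are all illustrative assumptions, not a production policy:

```python
def tuning_advice(rejection_rate: float, avg_queue_depth: float, B: int) -> str:
    """Toy tuning rule: queue persistently near B means the leak is the
    bottleneck; high rejections during legitimate spikes mean B is too small."""
    if avg_queue_depth > 0.9 * B:
        return "increase r or add downstream capacity"
    if rejection_rate > 0.01:
        return "increase B to absorb legitimate spikes"
    return "configuration looks healthy"
```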

💡 Key Takeaways
Set leak rate r to 70 to 90 percent of sustainable backend capacity to absorb variance; example: 1,000 QPS database capacity means r = 700 to 900 rps leaves headroom for spikes and background load
Bucket capacity B must satisfy B ≤ r × (minimum timeout) or requests timeout while queued; r=500 rps with 2 second timeout requires B ≤ 1,000
Maximum queueing delay is B/r; NGINX production configs use r=5 rps per IP with B=25 giving 5 second max delay to shed bot traffic while protecting origin p95 latency
Multi-tenant systems need per-principal buckets with weighted fair scheduling to prevent one tenant monopolizing the leak and starving others of their allocated throughput
Measure under realistic key distributions: hot keys serialize updates and reduce throughput; if each limiter node handles 200k checks per second, a 5 node shard sustains roughly 1M checks per second with redundancy
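The timeout constraint from the takeaways (B ≤ r × timeout) is easy to check mechanically; `validate_capacity` below is a hypothetical helper, using the numbers from the takeaways and tips:

```python
def validate_capacity(r: float, B: int, timeout_s: float) -> bool:
    """Check B <= r * timeout: any request queued deeper than r * timeout
    will time out before it can leak out, wasting capacity on doomed work."""
    return B <= r * timeout_s

# Safe: r = 500 rps with a 2 s timeout allows B up to 1,000.
print(validate_capacity(500, 1_000, 2.0))    # True
# Unsafe config from the tips: r = 200 rps, B = 2,000, 3 s timeout.
print(validate_capacity(200, 2_000, 3.0))    # False (B must be <= 600)
```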
📌 Interview Tips
1. Uber client library pacing 500 QPS means 2 millisecond spacing between requests; a 1,000-request in-flight buffer caps worst-case queueing at 2 seconds, preventing downstream tail latency spikes during retry storms
2. Unsafe configuration: r=200 rps, B=2,000, timeout=3 seconds gives a max delay of 10 seconds. Most queued requests fail, wasting capacity and amplifying retries. Correct sizing: B ≤ 600