Rate Limiting › Leaky Bucket Algorithm · Medium · ⏱️ ~3 min

Sizing Parameters: Leak Rate r and Bucket Capacity B

Choosing the right leak rate r and bucket capacity B determines whether your Leaky Bucket protects downstream systems or creates new problems.

The leak rate r (units per second) sets the steady throughput delivered to your backend. Set r to 70 to 90 percent of true sustainable capacity so the bucket can absorb variance without saturating the backend. For example, if your database can handle 1,000 queries per second (QPS) at p99 latency under 50 milliseconds, set r = 700 to 800 QPS to leave headroom for background tasks, maintenance operations, and measurement error.

Bucket capacity B bounds both queue length and maximum added latency through the formula max delay = B/r. This is your most critical constraint: B must satisfy B ≤ r × (minimum timeout across your request path). If clients time out after 2 seconds and you target r = 500 QPS, then B cannot exceed 1,000, or requests will time out while waiting in the queue. When B/r exceeds downstream timeouts, queued work fails and wastes capacity while amplifying retries, a worse outcome than simply dropping requests immediately.

NGINX edge deployments typically allow 1 to 10 requests per second per IP address for untrusted traffic, with small burst buffers of 10 to 50 requests. With r = 5 rps and B = 25, the maximum added delay before shedding is B/r = 5 seconds. This protects origin services from bot surges while keeping legitimate users' p95 latency stable.

For multi-tenant systems, use per-principal queues with weighted fair scheduling to prevent one tenant from monopolizing the leak and starving others.
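The r and B parameters above can be sketched as a minimal leaky-bucket admission check. This is an illustrative sketch, not any specific library's API; the class and method names are invented, and r and B follow the text's notation:

```python
import time

class LeakyBucket:
    """Minimal leaky-bucket sketch: a virtual queue drained at r units/sec,
    holding at most B units. Illustrative names, not a production library."""

    def __init__(self, r: float, B: float):
        self.r = r                 # leak rate, units per second
        self.B = B                 # bucket capacity, bounds queue length
        self.level = 0.0           # current queue depth in units
        self.last = time.monotonic()

    def try_add(self, units: float = 1.0) -> bool:
        now = time.monotonic()
        # Drain the bucket at rate r for the time elapsed since last check.
        self.level = max(0.0, self.level - (now - self.last) * self.r)
        self.last = now
        if self.level + units > self.B:
            return False           # bucket full: shed the request
        self.level += units
        return True

    def max_added_delay(self) -> float:
        # Worst-case queueing delay for an admitted request: B / r.
        return self.B / self.r

bucket = LeakyBucket(r=5, B=25)    # NGINX-style per-IP numbers from the text
print(bucket.max_added_delay())    # 5.0 seconds, matching B/r above
```

Note the drain-on-check design: no background thread is needed, because each admission decision first leaks the elapsed time's worth of capacity.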
💡 Key Takeaways
- Set leak rate r to 70 to 90 percent of sustainable backend capacity to absorb variance; example: 1,000 QPS of database capacity means r = 700 to 800 QPS leaves headroom for spikes and background load
- Bucket capacity B must satisfy B ≤ r × (minimum timeout) or requests time out while queued; r = 500 QPS with a 2-second timeout requires B ≤ 1,000
- Maximum queueing delay is B/r; NGINX production configs use r = 5 rps per IP with B = 25, giving a 5-second max delay to shed bot traffic while protecting origin p95 latency
- Multi-tenant systems need per-principal buckets with weighted fair scheduling to prevent one tenant monopolizing the leak and starving others of their allocated throughput
- Measure under realistic key distributions: hot keys serialize updates and reduce throughput; if each limiter node handles 200k checks per second, a 5-node shard sustains roughly 1M checks per second with redundancy
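The multi-tenant takeaway above can be sketched as a single leak shared across per-tenant queues via weighted round-robin draining. This is a hypothetical sketch; the function name, tenant labels, and weight scheme are invented for illustration:

```python
from collections import deque

def drain_weighted(queues: dict[str, deque], weights: dict[str, int],
                   budget: int) -> list[str]:
    """One leak cycle: dequeue up to weights[tenant] items per tenant each
    round, until `budget` total items have leaked or all queues are empty.
    Illustrative weighted-fair draining, not from any specific library."""
    leaked = []
    while budget > 0 and any(queues.values()):
        for tenant, q in queues.items():
            # Each tenant gets at most its weight per round, so a deep
            # queue cannot monopolize the leak and starve the others.
            for _ in range(min(weights[tenant], budget, len(q))):
                leaked.append(q.popleft())
                budget -= 1
            if budget == 0:
                break
    return leaked

queues = {"a": deque(["a1"] * 10), "b": deque(["b1"] * 2)}
weights = {"a": 2, "b": 1}
out = drain_weighted(queues, weights, budget=6)
# tenant b still drains at its weighted share even though a's queue is deeper
```

With weights 2:1 and a budget of 6, tenant a leaks 4 items and tenant b leaks 2, so b makes progress every cycle regardless of a's backlog.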
📌 Examples
- Uber client library pacing 500 QPS means 2-millisecond spacing between requests; a 1,000-request in-flight buffer caps worst-case queueing at 2 seconds, preventing downstream tail-latency spikes during retry storms
- Unsafe configuration: r = 200 rps, B = 2,000, timeout = 3 seconds gives a max delay of 10 seconds. Most queued requests fail, wasting capacity and amplifying retries. Correct sizing: B ≤ 600
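The sizing rule behind the unsafe-configuration example can be checked mechanically. A small sketch, with function names invented for illustration:

```python
def max_added_delay(r: float, B: float) -> float:
    # Worst-case queueing delay an admitted request can accumulate.
    return B / r

def max_safe_capacity(r: float, timeout_s: float) -> float:
    # B must satisfy B <= r * (minimum timeout) so no queued request
    # waits longer than callers will tolerate.
    return r * timeout_s

# Unsafe configuration from the example above:
r, B, timeout = 200, 2_000, 3.0
print(max_added_delay(r, B))          # 10.0 s, far beyond the 3 s timeout
print(max_safe_capacity(r, timeout))  # 600.0, the corrected bound B <= 600
```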