Sizing Parameters: Leak Rate r and Bucket Capacity B
The Two Parameters
Every Leaky Bucket has two parameters: the leak rate r, which determines how many requests leave per second (throughput), and the bucket capacity B, which determines how many requests can queue before overflow (burst tolerance). Get r wrong and you either overload downstream or waste capacity. Get B wrong and you either reject too many requests or create unacceptable latency.
Sizing the Leak Rate (r)
The leak rate r should match what downstream can sustainably handle. If PostgreSQL handles 5,000 queries/sec at healthy latency, size r from that number. If your ML model processes 10 inferences/sec, size r from 10. Do not set r based on what you wish downstream could handle; measure actual sustainable throughput, then set r to 80% of measured capacity to leave headroom.
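The sizing rule above is simple enough to express directly. A minimal sketch, where the 80% headroom factor and the measured 5,000 queries/sec figure come from the text and the function name is my own:

```python
HEADROOM = 0.8  # run at 80% of measured sustainable capacity

def leak_rate(measured_sustainable_qps: float) -> float:
    """Leak rate r: a fraction of what downstream was *measured* to sustain,
    never what you wish it could sustain."""
    return measured_sustainable_qps * HEADROOM

# PostgreSQL measured at 5,000 queries/sec at healthy latency:
r = leak_rate(5000)  # -> 4000.0 requests/sec
```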
Sizing the Bucket Capacity (B)
Bucket capacity B determines the maximum queue depth. The formula max wait = B / r gives the worst-case latency. With B = 500 and r = 100/sec, a request at the back of the queue waits 5 seconds. For user-facing APIs, keep the max wait under 1 second. For background jobs, 10 to 60 seconds may be acceptable. For ML inference, match B to the batch processing time.
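The max wait formula can also be run in reverse to size B from a latency budget. A small sketch; the function names are mine, and the numbers are the B = 500, r = 100/sec example from the text:

```python
def max_wait_seconds(bucket_capacity: float, leak_rate: float) -> float:
    """Worst-case delay for a request at the back of a full bucket: B / r."""
    return bucket_capacity / leak_rate

def capacity_for_latency(max_wait: float, leak_rate: float) -> int:
    """Largest B that keeps the worst-case wait within max_wait seconds."""
    return int(max_wait * leak_rate)

# B = 500, r = 100/sec: the last queued request waits 5 seconds.
print(max_wait_seconds(500, 100))      # -> 5.0
# User-facing API budget of 1 second at r = 100/sec caps B at 100.
print(capacity_for_latency(1.0, 100))  # -> 100
```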
The Burst Absorption Trade-off
Larger B absorbs bigger bursts but increases worst-case latency. If traffic spikes from 80/sec to 200/sec for 5 seconds with r = 100/sec, you need B = (200 - 100) × 5 = 500 to avoid rejections. But the worst-case wait is then B / r = 5 seconds. If users cannot tolerate that, a smaller B with some rejections may be the better trade.
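The burst calculation above (excess inflow rate times burst duration) can be sketched as a one-liner. The function name is mine; the numbers match the 200/sec spike example:

```python
def capacity_for_burst(burst_rate: float, leak_rate: float,
                       burst_seconds: float) -> float:
    """B needed to absorb a burst with zero rejections:
    requests arriving above the leak rate pile up for the whole burst."""
    return max(0.0, (burst_rate - leak_rate) * burst_seconds)

# Spike to 200/sec against r = 100/sec for 5 seconds:
B = capacity_for_burst(200, 100, 5)  # -> 500.0
# ...at the cost of a worst-case wait of B / r = 5 seconds.
```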
Production Tuning Process
Start with r at 80% of downstream capacity and B at one second's worth of r. Monitor the rejection rate (keep it under 1%), queue depth (it should rarely hit B), and downstream latency (it should stay healthy). If rejections are high during legitimate spikes, increase B. If the queue is always near B, increase r or add downstream capacity.