Sizing and Tuning: Choosing r and b with Real Numbers
Working Backward from Downstream Capacity
Never choose token bucket parameters by intuition. Start with what your system can actually handle. If your database comfortably sustains 8,000 queries/sec and tolerates 1-second bursts of 2,000 extra requests, your token bucket should reflect that: r = 8,000 tokens/sec, b = 2,000 tokens. The formula max requests = r × T + b bounds how many requests can be admitted in any window of T seconds: over 1 second, at most 8,000 × 1 + 2,000 = 10,000. An empty bucket refills completely in b / r = 0.25 seconds, so a user who has been idle for at least that long (10 seconds is more than enough) can immediately burst 2,000 requests, then sustain 8,000/sec.
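The bound above can be checked with a minimal token bucket sketch. The class and function names here (TokenBucket, allow, max_requests) are illustrative, not taken from any particular library, and time is passed in explicitly so the behavior is deterministic rather than tied to a wall clock.

```python
class TokenBucket:
    """Token bucket with refill rate r (tokens/sec) and capacity b (tokens)."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # assumption: the bucket starts full
        self.last = now

    def allow(self, now, cost=1):
        # Refill for the time elapsed since the last call, capped at b.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def max_requests(rate, capacity, window_seconds):
    """Upper bound on requests admitted in any window: r * T + b."""
    return rate * window_seconds + capacity


def refill_seconds(rate, capacity):
    """Time for an empty bucket to become full again: b / r."""
    return capacity / rate
```

With r = 8,000 and b = 2,000, an instantaneous burst of 5,000 attempts admits only the first 2,000, max_requests(8000, 2000, 1) gives 10,000, and refill_seconds(8000, 2000) gives 0.25.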
The Burst Sizing Rule
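Production experience converges on a consistent guideline: set b to between 0.5 and 1.0 seconds' worth of r. With r = 1,000/sec, that means b = 500 to 1,000. This allows natural request clustering without enabling dangerous spikes. Going beyond 2 seconds' worth of r rarely helps legitimate users but increases the risk of downstream overload. Measured downstream tolerance takes precedence over the guideline: the database example above uses b = 2,000, which is only 0.25 seconds' worth of r = 8,000.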
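As a quick arithmetic aid, the guideline can be written as two small helpers. The function names and default bounds below are assumptions chosen to mirror the 0.5 to 1.0 second range in the text.

```python
def burst_range(rate, low_seconds=0.5, high_seconds=1.0):
    """Suggested (min, max) bucket capacity for a given refill rate."""
    return int(rate * low_seconds), int(rate * high_seconds)


def burst_seconds(rate, capacity):
    """How many seconds' worth of r a given capacity b represents."""
    return capacity / rate
```

For r = 1,000 this yields a range of 500 to 1,000 tokens, and a capacity of 2,500 at the same rate works out to 2.5 seconds' worth, past the 2-second mark the text warns about.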
Weighted Tokens for Variable Request Costs
Not all requests impose equal cost. A complex analytics query might stress your database 5× more than a simple lookup. Assign weights: a light query costs 1 token, a heavy query costs 5 tokens. With r = 1,000 tokens/sec, you can sustain 1,000 light queries per second, or 200 heavy queries per second, or any mix that adds up to 1,000 tokens (for example, 500 light plus 100 heavy). The bucket naturally throttles expensive operations harder while being generous with cheap ones.
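The weighting arithmetic can be sketched as follows. The COSTS table reuses the 1-token and 5-token weights from the text; the function names and the idea of describing traffic as a mix of fractions are illustrative assumptions.

```python
COSTS = {"light": 1, "heavy": 5}  # token cost per request kind, from the text


def sustained_requests_per_sec(token_rate, mix):
    """Sustainable requests/sec for a traffic mix.

    mix maps request kind -> fraction of traffic; fractions should sum to 1.
    """
    avg_cost = sum(COSTS[kind] * share for kind, share in mix.items())
    return token_rate / avg_cost


def admit(tokens, kinds):
    """Spend a fixed token balance on requests in order; no refill modeled."""
    admitted = []
    for kind in kinds:
        cost = COSTS[kind]
        if tokens >= cost:
            tokens -= cost
            admitted.append(kind)
    return admitted, tokens
```

At 1,000 tokens/sec, an all-light mix sustains 1,000 requests/sec and an all-heavy mix sustains 200. A balance of 10 tokens admits two heavy queries and then rejects everything until tokens are refilled.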
Multi-Instance Budget Division
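When running multiple frontend servers, decide how to divide the global limit. With 4 servers and global r = 8,000, a static split allocates r = 2,000 per server; divide b the same way, otherwise the combined burst grows to 4 × b. A static split is only accurate when traffic is spread evenly: a busy server rejects requests while an idle server's share goes unused. To enforce the global limit under uneven traffic, use distributed buckets with leases: each server requests tokens from a central store in chunks (for example, 200 at a time) and spends them locally, cutting round trips to the store by up to 200× compared with one round trip per request.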
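The leasing idea can be illustrated with an in-memory sketch. CentralStore stands in for whatever shared store a real deployment would use, and both class names are hypothetical. Refill is deliberately left out: the store simply holds one second's global budget, so only the coordination savings are shown.

```python
class CentralStore:
    """In-memory stand-in for a shared store holding the global budget."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.round_trips = 0  # counts calls that would cross the network

    def lease(self, chunk):
        self.round_trips += 1
        granted = min(chunk, self.tokens)
        self.tokens -= granted
        return granted


class LeasingLimiter:
    """Per-server limiter that spends locally leased tokens."""

    def __init__(self, store, chunk=200):
        self.store = store
        self.chunk = chunk
        self.local = 0

    def allow(self):
        # Only contact the store when the local lease is exhausted.
        if self.local == 0:
            self.local = self.store.lease(self.chunk)
        if self.local > 0:
            self.local -= 1
            return True
        return False
```

With a budget of 8,000 tokens and four servers receiving 5,000, 1,000, 1,000, and 1,000 requests, all 8,000 requests are admitted using 40 round trips instead of 8,000. Two limitations of this sketch are worth noting: once the store is empty, every denied request still costs a round trip, and tokens leased by a server that then goes quiet are stranded until they are spent.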
Production Tuning Process
Start conservative with a lower r and a smaller b, then monitor rejection rates. If you see frequent 429 (Too Many Requests) responses during normal usage, increase b for burst tolerance. If downstream shows strain, decrease b. Healthy systems reject under 1% of requests under normal load. Monitor token levels: a bucket that sits persistently near zero means sustained demand exceeds r, which a larger b will not fix. Alert on sustained deny rates above a threshold chosen in the 1 to 5% range.
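The monitoring rules can be sketched as two small checks. The function names, the 5% default threshold, and the choice of three consecutive windows as the meaning of "sustained" are all assumptions for illustration; the text only fixes the 1 to 5% range.

```python
def deny_rate(allowed, denied):
    """Fraction of requests rejected in one measurement window."""
    total = allowed + denied
    return denied / total if total else 0.0


def sustained_alert(window_rates, threshold=0.05, consecutive=3):
    """True if the deny rate exceeds threshold for N windows in a row."""
    run = 0
    for rate in window_rates:
        run = run + 1 if rate > threshold else 0
        if run >= consecutive:
            return True
    return False
```

A window with 9,950 allowed and 50 denied requests has a deny rate of 0.5%, inside the healthy range. Requiring several consecutive windows above the threshold keeps a single short spike from paging anyone.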