Token Bucket: Burst Control for Rate Limiting
Windows vs Buckets: Different Mental Models
Window-based rate limiting asks: "How many requests in the last N seconds?" Token bucket and leaky bucket ask: "Do you have capacity right now?" Both achieve rate limiting, but with different trade-offs. Knowing when to use each model lets you design a rate limiter that matches your actual requirements.
Window Approach: Count Based
Track a count over time. The mental model is simple: "max 100 requests per minute" is easy to explain to users and easy to verify. Implementation is straightforward: increment a counter, compare it to the limit, and reset it when the window ends. This works well for user-facing rate limits where clarity matters. The downside is twofold. First, bursts at window boundaries: a client can spend a full quota at the end of one window and another full quota at the start of the next, so up to twice the limit can arrive in a short span. Second, the window length determines granularity: a 1 minute window cannot distinguish between 100 requests spread evenly and 100 requests in 1 second.
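The counter described above can be sketched in a few lines. This is a minimal illustration rather than a production implementation: the class name FixedWindowCounter, the allow method, and the choice to pass the current time in as a plain number of seconds are all assumptions made here so the boundary burst is easy to reproduce.

```python
from dataclasses import dataclass


@dataclass
class FixedWindowCounter:
    """Hypothetical fixed-window limiter: at most `limit` requests per window."""
    limit: int
    window_seconds: float
    window_id: int = -1   # index of the window the counter currently belongs to
    count: int = 0        # requests admitted in that window

    def allow(self, now: float) -> bool:
        # Windows are aligned to multiples of window_seconds (an assumption).
        current = int(now // self.window_seconds)
        if current != self.window_id:
            # A new window has started: reset the counter.
            self.window_id = current
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


# Boundary burst: 100 requests just before the window ends,
# then 100 more just after the next one begins.
limiter = FixedWindowCounter(limit=100, window_seconds=60)
late = sum(limiter.allow(59.9) for _ in range(100))
early = sum(limiter.allow(60.0) for _ in range(100))
print(late + early)  # 200 requests admitted within 0.1 seconds
```

Both batches are legal under "100 per minute", yet together they deliver twice the limit almost instantly, which is the boundary effect the window model cannot see.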
Bucket Approach: Capacity Based
Track available capacity that refills over time. A token bucket starts full and depletes as requests consume tokens. A leaky bucket fills as requests arrive and drains at a fixed rate. Both provide burst control that windows lack. With a token bucket (r = 100/sec, b = 500), a user can burst 500 requests immediately, but then must wait for refill and is held to the sustained rate of 100 requests per second. This is more nuanced than a flat limit such as "100 per minute" and harder to explain to users, but it provides better protection for downstream systems that are burst-sensitive.
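A short sketch makes the r and b parameters concrete. It assumes a lazily refilled bucket (tokens are topped up when a request arrives, not by a background timer) and a caller-supplied clock in seconds; the names TokenBucket, allow, rate, and capacity are illustrative choices, not taken from any particular library.

```python
class TokenBucket:
    """Hypothetical token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate                # r: sustained refill rate
        self.capacity = capacity        # b: maximum burst size
        self.tokens = float(capacity)   # the bucket starts full
        self.last = now                 # time of the last refill

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Lazy refill: credit tokens for the time elapsed since the last call.
        if now > self.last:
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# r = 100/sec, b = 500, as in the example above.
bucket = TokenBucket(rate=100.0, capacity=500.0)
burst = sum(bucket.allow(now=0.0) for _ in range(600))
print(burst)   # 500: the full bucket is spent, the remaining 100 are rejected
after = sum(bucket.allow(now=1.0) for _ in range(150))
print(after)   # 100: one second of refill at r = 100/sec
```

Passing the time in explicitly keeps the sketch deterministic; a real deployment would more likely read a monotonic clock inside allow.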
Combining Both Approaches
Real systems often layer multiple algorithms. Example: a per-second token bucket (r = 10, b = 20) for burst control plus a daily fixed-window counter (1,000/day) for quota management. The token bucket smooths traffic second by second, while the daily window ensures fair allocation over longer periods. A request must pass both checks, and each layer protects against a different failure: the bucket against short spikes, the window against sustained overuse.
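One way to combine the two layers is sketched below, using the same numbers (r = 10, b = 20, 1,000/day). The class LayeredLimiter and its fields are hypothetical, days are assumed to be aligned to multiples of 86,400 seconds on the caller's clock, and both layers are checked before either is charged so that a rejection by one does not waste capacity in the other.

```python
SECONDS_PER_DAY = 86_400


class LayeredLimiter:
    """Hypothetical limiter: token bucket for bursts plus a daily quota window."""

    def __init__(self, rate: float = 10.0, burst: float = 20.0,
                 daily_quota: int = 1000, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.daily_quota = daily_quota
        self.tokens = float(burst)             # bucket starts full
        self.last = now                        # time of the last refill
        self.day = int(now // SECONDS_PER_DAY)
        self.used_today = 0

    def allow(self, now: float) -> bool:
        # Layer 1 upkeep: lazily refill the token bucket.
        if now > self.last:
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
        # Layer 2 upkeep: reset the daily counter when a new day begins.
        day = int(now // SECONDS_PER_DAY)
        if day != self.day:
            self.day = day
            self.used_today = 0
        # Check both layers first, then charge both together.
        if self.tokens < 1.0 or self.used_today >= self.daily_quota:
            return False
        self.tokens -= 1.0
        self.used_today += 1
        return True


limiter = LayeredLimiter()
admitted = sum(limiter.allow(0.0) for _ in range(25))
print(admitted)  # 20: the bucket caps the initial burst, the quota is untouched
```

Charging the layers together is a design choice in this sketch; a stricter accounting policy might count rejected requests against the daily quota as well.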
Decision Framework
Use windows when: user-facing limits need a clear explanation ("100 requests per hour"), billing or quota tracking requires exact per-window counts, or compliance requires verifiable time-bounded limits. Use buckets when: downstream systems need burst protection, traffic smoothing matters more than exact counts, or you need finer-grained control over burst size versus sustained rate. Often the answer is both: buckets for traffic shaping, windows for accounting.