Token Bucket: Burst Control for Rate Limiting
Token bucket is an alternative to windowed counters that models rate limiting as a bucket with a maximum capacity that refills at a steady rate. Each request consumes one or more tokens; if tokens are available, the request proceeds and tokens are deducted. If the bucket is empty, the request is denied or delayed. The bucket refills continuously at the configured rate (for example, 100 tokens per second), and any excess tokens beyond capacity are discarded.
This approach naturally handles both steady state rate and burst tolerance. Set capacity to allow a reasonable burst (say, 500 tokens) and refill rate to your desired sustained throughput (100 tokens per second). A client can burst up to 500 requests instantly if the bucket is full, then sustain 100 requests per second indefinitely. This is far more user friendly than fixed windows, which either allow boundary gaming or harshly cut off bursts. Amazon API Gateway uses token bucket throttling with published steady state limits in the thousands of requests per second and burst capacities in the low thousands.
Implementation requires tracking two values per key: current token count and last refill timestamp. On each request, compute elapsed time since the last refill, add (elapsed × refill_rate) tokens capped at capacity, update the timestamp, then check and decrement tokens. Updates are O(1) and state is minimal (16 bytes per key for count and timestamp). Unlike sliding windows, there is no need for per-window counters or circular buffers.
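A minimal single-key sketch of that update in Python (the `TokenBucket` name and injectable `clock` parameter are illustrative choices, not from any particular library):

```python
import time


class TokenBucket:
    """Token bucket for one key: up to `capacity` burst tokens,
    refilled continuously at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)
        self.clock = clock              # injectable for deterministic testing
        self.tokens = float(capacity)   # start full so clients can burst immediately
        self.last_update = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.last_update
        # Refill proportionally to elapsed time, capped at capacity;
        # excess tokens beyond capacity are discarded by the min().
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_update = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The only per-key state is `tokens` and `last_update`, two 8-byte values, which is where the 16-bytes-per-key figure comes from.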
The tradeoff is that token bucket does not provide hard guarantees about request counts within any specific time window. A client might send 500 requests in one second (draining the burst), wait 5 seconds (the bucket refills 500 tokens at 100 per second), then burst another 500. That is 1,000 requests in 6 seconds, averaging about 166 per second even though the sustained rate is 100 per second. If you need strict window based accounting (for billing or quotas), token bucket is not suitable. For user facing API throttling where you want to allow reasonable bursts and smooth traffic, token bucket is often the best choice.
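That burst arithmetic can be checked with a small self-contained simulation (a sketch; the `simulate` helper and its event format are illustrative):

```python
def simulate(events, capacity=500.0, refill_rate=100.0):
    """Count accepted requests given (timestamp_seconds, request_count) events."""
    tokens, last, accepted = capacity, 0.0, 0
    for now, count in events:
        # Refill for the time elapsed since the previous event, capped at capacity.
        tokens = min(capacity, tokens + (now - last) * refill_rate)
        last = now
        for _ in range(count):
            if tokens >= 1:
                tokens -= 1
                accepted += 1
    return accepted

# 500 requests at t=0 drain the full bucket; by t=5s it has refilled 500 tokens,
# so a second burst of 500 is also fully accepted: 1,000 requests total.
print(simulate([(0.0, 500), (5.0, 500)]))  # → 1000
```

No 6-second window ever sees fewer than the sustained rate would suggest, which is exactly why token bucket cannot back strict per-window quotas.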
💡 Key Takeaways
•Combines steady rate and burst capacity in one mechanism. Amazon API Gateway typical limits are thousands of requests per second sustained with burst capacity in the low thousands, far more flexible than fixed windows.
•State per key is just 16 bytes (8 byte token count, 8 byte timestamp). With 10 million keys, only 160 MB versus gigabytes for sliding log. Update operations are O(1).
•Does not enforce strict per window quotas. A client with 500 token capacity and 100 per second refill can send 500 requests, wait 5 seconds, send 500 more (1,000 in 6 seconds, 166 per second average).
•Better user experience than hard window cutoffs. Clients can absorb transient spikes without hitting limits, reducing false positive throttling and retry storms.
•Distributed enforcement can use local buckets with periodic synchronization. Allow each node a fraction of global capacity, aggregate every 100 to 500 milliseconds, accepting bounded staleness for lower latency.
•Thundering herd risk if all clients synchronize refills. Add client side jitter to retry logic and stagger bucket refill schedules across keys to smooth load.
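The last two takeaways can be combined in a small sketch (hedged: the equal per-node split, jitter bounds, and function names are assumptions for illustration, not any system's defaults):

```python
import random

GLOBAL_CAPACITY = 2000.0   # fleet-wide burst tokens
GLOBAL_REFILL = 1000.0     # fleet-wide tokens per second


def local_bucket_params(num_nodes):
    """Give each node an equal fraction of the global budget.

    Nodes then enforce locally and reconcile against a shared store
    every few hundred milliseconds, accepting bounded staleness."""
    return GLOBAL_CAPACITY / num_nodes, GLOBAL_REFILL / num_nodes


def retry_delay(base_seconds, attempt, max_jitter=0.5):
    """Exponential backoff plus random jitter so throttled clients
    do not all retry (and refill) in lockstep."""
    return base_seconds * (2 ** attempt) + random.uniform(0, max_jitter)
```

Splitting capacity evenly is the simplest policy; skew-aware splits (weighting hot nodes) are possible but complicate the reconciliation step.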
📌 Examples
API Gateway configuration: capacity = 2,000 tokens, refill = 1,000 tokens per second. Client sends 2,000 requests at t=0 (bucket empty). At t=2s, bucket has 2,000 tokens again. Client can burst another 2,000 requests, sustaining 1,000 req/sec average.
Implementation in code: tokens = min(capacity, tokens + (now - last_update) * refill_rate). If tokens >= cost: tokens -= cost, allow request. Else: deny. Finally: last_update = now. All O(1) operations.
Reddit scale: 500,000 OAuth clients, 60 requests per minute sustained (1 per second), burst 10. State = 500,000 × 16 bytes = 8 MB for tokens and timestamps. Far smaller than sliding window counters at same scale.