
Rate Limiting Algorithms: Fixed Window, Sliding Window, and Token Bucket

Fixed window counters maintain one counter per key per time bucket, incrementing on each request and resetting at bucket boundaries. The approach is constant time and computationally cheap, requiring just one atomic increment per request. However, it suffers from boundary bursts: under a 100-requests-per-minute limit, a client can send 100 requests at 07:09:59 and another 100 at 07:10:00, achieving 200 requests in two seconds, twice the per-minute limit concentrated at the boundary.

Sliding window approaches smooth these edge effects. The exact variant keeps timestamped events and counts those within the trailing window, but at high queries per second (QPS) this creates memory pressure and expensive prune operations. The approximated sliding window keeps only two counters, for the current and previous buckets, and calculates effective usage as the current count plus the previous count multiplied by the overlap fraction. Internshala implemented this two-bucket design and analyzed 400 million requests, observing only a 0.003% error rate while storing just two numbers per key.

Token bucket tracks a token count and a last-refill timestamp. On each request, the system computes tokens to add based on elapsed time (clamping the total at bucket capacity), then consumes one token if available. This allows controlled bursts up to capacity while enforcing the long-term average rate. Criteo, operating across six data centers, favored token bucket when strong single-round-trip atomicity is available in the shared store, since it provides accurate burst control with constant-time performance under worst-case load.
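To make the refill-and-consume mechanics concrete, here is a minimal single-process token bucket sketch; the names TokenBucket and allow are illustrative, and a deployment like Criteo's would run the same check atomically in the shared store (for example, as a single scripted round trip) rather than in process memory.

```python
import time


class TokenBucket:
    """Single-process token bucket: capacity caps the burst, refill_rate the sustained average."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity            # maximum tokens (burst size)
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = capacity              # start full so an initial burst is allowed
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, clamping the total at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                # consume one token for this request
            return True
        return False


# Capacity 100, refill 10/s: a new client can burst 100 requests, then sustain 10 per second.
bucket = TokenBucket(capacity=100, refill_rate=10)
print(sum(bucket.allow() for _ in range(150)))  # roughly 100 allowed in the initial burst
```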
💡 Key Takeaways
Fixed window allows up to twice the limit at boundaries: 100 requests at 07:09:59 plus 100 at 07:10:00 yields 200 requests in two seconds under a 100-per-minute limit
Approximated sliding window with two buckets achieved a 0.003% error rate across 400 million requests at Internshala while storing only two integers per key, providing near-exact enforcement with minimal memory
Token bucket enables bursts up to the bucket capacity while enforcing the long-term average rate, making it ideal for APIs that need to absorb occasional traffic spikes without violating sustained throughput limits
Criteo evaluated the algorithms across six data centers under the worst 1% of load and recommended token bucket for strong-atomicity scenarios and the approximated sliding window when only atomic counters are available
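A minimal sketch of the two-counter approximation, assuming a single process and a single key (SlidingWindowCounter and allow are illustrative names): the effective count is the current count plus the previous count weighted by how much of the previous bucket still falls inside the trailing window. The usage line mirrors the worked example in the next section.

```python
class SlidingWindowCounter:
    """Two-counter sliding-window approximation for a single key."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self.current_bucket = None  # index of the bucket the counters refer to
        self.current = 0            # requests counted in the current bucket
        self.previous = 0           # requests counted in the bucket before it

    def allow(self, now: float) -> bool:
        bucket = int(now // self.window)
        if self.current_bucket is None:
            self.current_bucket = bucket
        elif bucket > self.current_bucket:
            # Roll forward: the old current bucket becomes previous (both reset if buckets were skipped).
            self.previous = self.current if bucket == self.current_bucket + 1 else 0
            self.current = 0
            self.current_bucket = bucket
        # Fraction of the previous bucket that still overlaps the trailing window.
        overlap = 1 - (now % self.window) / self.window
        effective = self.current + self.previous * overlap
        if effective < self.limit:
            self.current += 1
            return True
        return False


# Mirrors the worked example below: 30 s into a 60 s bucket, 45 + 80 * 0.5 = 85 < 100, so the request passes.
limiter = SlidingWindowCounter(limit=100, window_seconds=60)
limiter.current_bucket, limiter.current, limiter.previous = 10, 45, 80
print(limiter.allow(now=630.0))  # True
```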
📌 Examples
Token bucket with capacity 100 and a refill rate of 10 per second allows a client to immediately consume 100 tokens as a burst, then sustain 10 requests per second afterward. Refill calculation: tokens_to_add = (current_time - last_refill_time) * rate, with the resulting token count clamped to capacity.
Approximated sliding window at 07:10:30 with 60-second buckets: the current bucket (07:10:00 to 07:11:00) has 45 requests and the previous bucket (07:09:00 to 07:10:00) has 80 requests. The overlap fraction is 0.5 (30 seconds into the current bucket). Effective count = 45 + 80 × 0.5 = 85 requests.
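For comparison, a minimal fixed-window counter sketch reproduces the boundary burst described earlier: under a 100-per-minute limit, 100 requests at 07:09:59 and 100 at 07:10:00 land in different buckets and are all accepted. FixedWindowCounter and allow are illustrative names; in production the increment would be a single atomic operation in the shared store.

```python
from collections import defaultdict


class FixedWindowCounter:
    """One counter per (key, bucket index); counters reset implicitly at bucket boundaries."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (key, bucket index) -> request count

    def allow(self, key: str, now: float) -> bool:
        bucket = int(now // self.window)
        if self.counts[(key, bucket)] < self.limit:
            self.counts[(key, bucket)] += 1  # the single atomic increment in a shared store
            return True
        return False


# Boundary burst: 07:09:59 and 07:10:00 fall in different minute buckets,
# so 200 requests are accepted within two seconds despite the 100-per-minute limit.
limiter = FixedWindowCounter(limit=100)
t1, t2 = 7 * 3600 + 9 * 60 + 59, 7 * 3600 + 10 * 60
allowed = sum(limiter.allow("client", t1) for _ in range(100))
allowed += sum(limiter.allow("client", t2) for _ in range(100))
print(allowed)  # 200
```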