
Rate Limiting Algorithms: Fixed Window, Sliding Window, and Token Bucket

Algorithm Choice in Distributed Context

The algorithms from single-server rate limiting all work in distributed settings, but their coordination costs differ significantly. Choosing the wrong algorithm can add 50% latency overhead or let through twice your intended rate.

Fixed Window: Cheapest, Least Accurate

One Redis key per user per window: INCR user:123:minute:202401151430 with a TTL. One atomic operation, 0.5 ms round trip. Problem: boundary bursts. A user sends 100 requests at 14:29:59 and another 100 at 14:30:01, getting 200 through in 2 seconds. Good for: daily quotas and billing limits, where boundary bursts are acceptable.
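A minimal in-memory sketch of the pattern (a Python dict stands in for the Redis INCR-with-TTL key; the class and parameter names are illustrative, not from the original):

```python
class FixedWindowLimiter:
    """In-memory stand-in for the Redis INCR + TTL pattern."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (user, window_index) -> count, like user:123:minute:...

    def allow(self, user: str, now: float) -> bool:
        key = (user, int(now // self.window))  # one key per user per window
        self.counters[key] = self.counters.get(key, 0) + 1  # the INCR step
        return self.counters[key] <= self.limit


# Boundary burst: 100 requests just before the minute boundary and 100
# just after are all allowed -- twice the limit in about one second.
rl = FixedWindowLimiter(limit=100, window_seconds=60)
allowed = sum(rl.allow("user:123", now=t) for t in [59.5] * 100 + [60.5] * 100)
# allowed == 200
```

The real version also needs an EXPIRE on each key so stale windows are reclaimed; the dict here simply grows.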

Sliding Window Counter: Balanced Trade-off

Keep two fixed window counters (current and previous) and compute a weighted estimate. Two Redis reads per request (they can be pipelined). Formula: count = current + (previous × overlap_percentage), where overlap_percentage is the fraction of the previous window still covered by the sliding window. Cost: 1 ms. Eliminates the boundary burst problem. Good for: most API rate limiting where fairness matters.
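The formula can be written directly (a sketch; in production the two counters would come from the pipelined Redis reads):

```python
def sliding_window_count(current: int, previous: int,
                         elapsed_in_window: float, window: float) -> float:
    """count = current + previous * overlap, where overlap is the fraction
    of the previous fixed window still inside the sliding window."""
    overlap = 1.0 - elapsed_in_window / window
    return current + previous * overlap


# 30 s into a 60 s window: overlap is 0.5
# sliding_window_count(45, 80, 30, 60) -> 85.0
```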

Token Bucket: Best for Burst Control

Store the last refill time and current token count. Each request calculates the tokens refilled since the last check, subtracts its cost, and updates both values atomically via a Lua script. Cost: 0.5 to 1 ms. Provides fine-grained burst control. Good for: protecting fragile backends, API gateway rate limiting, burst capacity control.
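A single-process sketch of the token bucket state machine (in the distributed version this read-refill-subtract sequence runs inside one Lua script so it stays atomic; names here are illustrative):

```python
class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.rate = refill_rate      # tokens added per second
        self.tokens = capacity       # start full to permit an initial burst
        self.last_refill = 0.0

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill tokens accrued since the last check, clamped to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With capacity 100 and rate 10/s, a client can burst 100 requests immediately, then sustain 10 per second, matching the worked example in the interview tips below.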

Sliding Log: Most Accurate, Most Expensive

Store every request timestamp in a Redis sorted set. Each request: ZREMRANGEBYSCORE to expire old entries, ZCARD to count, ZADD to record. Cost: 1 to 3 ms plus O(n) memory, where n is the number of requests in the window. Perfect accuracy but high cost. Good for: billing, compliance, and low rate limits where accuracy justifies the expense.
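An in-memory sketch of the same three steps (a sorted Python list replaces the Redis sorted set; each comment names the command it mirrors):

```python
import bisect

class SlidingLogLimiter:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = []  # sorted request timestamps, one entry per request

    def allow(self, now: float) -> bool:
        # ZREMRANGEBYSCORE: drop timestamps that fell out of the window
        del self.log[:bisect.bisect_left(self.log, now - self.window)]
        # ZCARD: count the requests still inside the window
        if len(self.log) >= self.limit:
            return False
        # ZADD: record this request
        bisect.insort(self.log, now)
        return True
```

The O(n) memory cost is visible here: the log holds one timestamp per request in the window, which is why this only pays off at low rate limits.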

💡 Key Takeaways
Fixed window allows up to twice the limit at boundaries: 100 requests at 07:09:59 plus 100 at 07:10:00 equals 200 requests in two seconds for a 100 per minute limit
Approximated sliding window with two buckets achieved a 0.003% error rate across 400 million requests at Internshala while storing only two integers per key, providing near-accurate enforcement with minimal memory
Token bucket enables burst capacity up to bucket size while enforcing long term average rate, making it ideal for APIs that need to allow occasional traffic spikes without violating sustained throughput limits
Criteo evaluated algorithms across six data centers under worst 1% load and recommended token bucket for strong atomicity scenarios and approximated sliding window when only atomic counters are available
📌 Interview Tips
1. Token bucket with capacity 100 and refill rate 10 per second allows a client to immediately consume 100 tokens for a burst, then sustain 10 requests per second afterward. Refill calculation: tokens_to_add = (current_time − last_refill_time) × rate, clamped to capacity.
2. Approximated sliding window at 07:10:30 with 60 second buckets: the current bucket (07:10:00 to 07:11:00) has 45 requests, the previous bucket (07:09:00 to 07:10:00) has 80 requests. The overlap fraction is 0.5 (30 seconds into the current bucket). Effective count = 45 + 80 × 0.5 = 85 requests.
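Both worked examples reduce to one line of arithmetic each (the timestamps in the first line are illustrative values, not from the tips):

```python
# Tip 1: tokens refilled after 3 s at 10 tokens/s, clamped to capacity 100
# (illustrative times: current_time = 5.0, last_refill_time = 2.0)
tokens_to_add = min((5.0 - 2.0) * 10, 100.0)   # -> 30.0

# Tip 2: effective count 30 s into the current 60 s bucket (overlap 0.5)
effective_count = 45 + 80 * 0.5                # -> 85.0
```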