Sliding Window Counter: Trading Memory for Fairness
Sliding window counters subdivide the window into multiple smaller buckets to approximate a true rolling count without storing every individual timestamp. For a 60 second window, you might maintain 60 one second buckets in a circular buffer. As time advances, you continuously drop the oldest bucket, add a new current bucket, and maintain a running sum. This gives you a count over the most recent 60 seconds at any query time, not just at minute boundaries.
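As a minimal sketch (illustrative Python; class and method names are my own, and per key storage, eviction, and thread safety are omitted), the bucketed counter looks like this:

```python
import time

class SlidingWindowCounter:
    """Approximate rolling count over the last window_seconds using
    fixed width buckets in a circular buffer (a sketch, not production code)."""

    def __init__(self, window_seconds=60, bucket_width=1):
        self.bucket_width = bucket_width
        self.num_buckets = window_seconds // bucket_width
        self.buckets = [0] * self.num_buckets
        self.running_sum = 0
        self.last_tick = self._tick(time.time())

    def _tick(self, now):
        # Absolute, epoch aligned bucket number (not yet wrapped into the buffer).
        return int(now // self.bucket_width)

    def _advance(self, now):
        # Zero every bucket that has rotated out since the last update and
        # subtract it from the running sum; a long idle gap clears everything.
        tick = self._tick(now)
        elapsed = tick - self.last_tick
        for i in range(min(elapsed, self.num_buckets)):
            idx = (self.last_tick + 1 + i) % self.num_buckets
            self.running_sum -= self.buckets[idx]
            self.buckets[idx] = 0
        self.last_tick = max(tick, self.last_tick)

    def record(self, now=None):
        """Count one request and return the rolling total for the window."""
        now = time.time() if now is None else now
        self._advance(now)
        self.buckets[self._tick(now) % self.num_buckets] += 1
        self.running_sum += 1
        return self.running_sum
```

A limiter built on this would allow a request only while record() stays at or below its threshold; the per key map holding one such buffer per client is what drives the memory figures discussed next.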
The approximation error is bounded by the bucket granularity. If your buckets are 1 second wide and a client bursts 100 requests in the last 0.5 seconds of a bucket, your window might include or exclude all 100 depending on query timing, giving up to one bucket's worth of error. Finer buckets reduce error but cost more memory: 60 buckets per key versus a single counter for fixed windows. With 10 million active keys and 60 buckets at 2 bytes each, you need roughly 1.2 GB just for counters, plus metadata and hash structures pushing the total toward 2 to 3 GB per node.
Amazon Web Services (AWS) Web Application Firewall (WAF) uses a sliding 5 minute window per IP address with thresholds typically set at 1,000 to 2,000 requests per 5 minutes. This rolling approach prevents the boundary gaming that fixed windows allow, ensuring clients cannot burst 2,000 requests in seconds by splitting across a boundary. Detection and enforcement happen with subsecond latency at edge locations, but AWS explicitly documents that per IP state retention across millions of IPs is a scaling challenge requiring distributed aggregation.
Implementation requires careful time synchronization and bucket alignment across distributed enforcers. If nodes disagree on bucket boundaries by even 100 milliseconds, counters can diverge and either allow excess traffic or falsely throttle legitimate users. Use epoch aligned bucket starts and monotonic clocks, and consider enforcing at threshold times 0.95 to create a guard band that absorbs minor clock drift.
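A sketch of the alignment and guard band idea under these assumptions (the constant names and the 2,000 request threshold are illustrative, not a specific product's API):

```python
import time

BUCKET_WIDTH = 1.0    # seconds; must be identical on every enforcer node
THRESHOLD = 2000      # configured per key limit for the window
GUARD_BAND = 0.95     # enforce slightly below the limit to absorb clock drift

def bucket_start(now=None):
    """Epoch aligned bucket boundary: every node maps a given wall clock
    instant to the same bucket start, so counters line up across the fleet."""
    now = time.time() if now is None else now
    return int(now // BUCKET_WIDTH) * BUCKET_WIDTH

def should_throttle(rolling_sum):
    """Trip at 95% of the threshold so minor skew between nodes cannot
    let the aggregate count overshoot the true limit before any node reacts."""
    return rolling_sum >= THRESHOLD * GUARD_BAND
```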
💡 Key Takeaways
• Memory scales with buckets times keys. For 10 million keys, 60 one second buckets at 2 bytes each equals 1.2 GB raw counters, typically 2 to 3 GB with overhead. Compare to 160 MB for fixed windows.
• Approximation error is bounded by bucket width times peak rate. With 5 second buckets and 100 req/sec bursts, error can reach plus or minus 500 requests. Finer buckets cost more memory but improve accuracy.
• AWS WAF uses sliding 5 minute windows per IP with 1,000 to 2,000 request thresholds, enforcing at edge locations with subsecond latency. This prevents the boundary gaming that affects fixed windows.
• Update complexity is O(1) per request: increment the current bucket and the running sum. Bucket rotation is amortized O(1): each expired bucket is subtracted and zeroed once as time advances past it.
• Distributed deployment requires clock synchronization. Skew of 100 milliseconds can misalign buckets across nodes, causing divergent counts. Use epoch aligned boundaries and enforce at threshold times 0.95 as a guard band.
• Hot key cardinality explosions are a real failure mode. A botnet with 20 million unique IPs times 60 buckets times 2 bytes equals 2.4 GB just for counters, risking memory exhaustion without approximation techniques.
📌 Examples
AWS WAF configuration: window = 300 seconds (5 minutes), bucket = 5 seconds, 60 buckets per IP. Threshold = 2,000 requests. Client at IP 203.0.113.45 sends 1,950 requests spread over 5 minutes (allowed), then bursts 100 in one second. Rolling sum = 2,050, triggers mitigation.
Implementation: circular buffer of 60 counters, a head pointer, and a running_sum. On an event at time t: bucket_index = (t / bucket_width) mod 60. If bucket_index != head, advance head toward bucket_index one step at a time, zeroing each expired bucket and subtracting it from running_sum as you go (if a full window or more has elapsed, every bucket resets); then set head = bucket_index. Finally, buckets[bucket_index] += 1 and running_sum += 1.
Memory sizing for Reddit scale: assume 500,000 active API clients, a 60 second window, and 1 second buckets. 500,000 × 60 × 2 bytes = 60 MB of counters. With a hash table overhead factor of 2, the total is approximately 120 MB, manageable on a single node.
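The same sizing arithmetic as a small helper (the factor of 2 for hash table overhead is the assumption used above; real overhead depends on the store):

```python
def counter_memory_bytes(keys, buckets_per_key, bytes_per_counter=2, overhead_factor=2.0):
    """Raw counter footprint plus an assumed multiplier for hash table and metadata overhead."""
    raw = keys * buckets_per_key * bytes_per_counter
    return raw, int(raw * overhead_factor)

# Reddit scale example from the text: 500,000 clients, 60 one second buckets.
raw, total = counter_memory_bytes(500_000, 60)
print(f"raw: {raw / 1e6:.0f} MB, with overhead: {total / 1e6:.0f} MB")   # raw: 60 MB, with overhead: 120 MB

# The 10 million key case from the takeaways.
raw, total = counter_memory_bytes(10_000_000, 60)
print(f"raw: {raw / 1e9:.1f} GB, with overhead: {total / 1e9:.1f} GB")   # raw: 1.2 GB, with overhead: 2.4 GB
```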