Failure Modes: Hot Keys, Clock Skew, and State Explosions
Rate limiting and windowing systems fail in predictable ways under adversarial load and operational stress. Hot-key cardinality explosions occur when an attacker generates millions of unique keys (rotating IP addresses, distributed botnets, credential stuffing with millions of usernames). Sliding-window counters that allocate 60 buckets per key can consume gigabytes: 20 million IPs × 60 buckets × 2 bytes = 2.4 GB of raw counters, often 5 GB with hash-table overhead. Without approximation (exponential decay, count-min sketch), memory is exhausted and enforcement fails open, letting attacks through.
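A rough sketch of the trade-off, in Python: the first two lines reproduce the memory math, and the CountMinSketch (width and depth chosen arbitrarily here) keeps memory fixed at width × depth cells no matter how many unique keys an attacker generates, at the cost of occasionally overestimating a count.

```python
import hashlib

# Back-of-envelope memory for exact per-key counters (figures from the text).
keys, buckets, bytes_per_counter = 20_000_000, 60, 2
print(f"raw counters: {keys * buckets * bytes_per_counter / 1e9:.1f} GB")  # 2.4 GB

class CountMinSketch:
    """Fixed-memory approximate counter: may overestimate, never underestimates."""
    def __init__(self, width: int = 2**20, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key: str):
        # One hashed column per row; the salt makes the rows independent.
        for row in range(self.depth):
            h = hashlib.blake2b(key.encode(), salt=row.to_bytes(8, "little")).digest()
            yield row, int.from_bytes(h[:8], "little") % self.width

    def add(self, key: str, amount: int = 1) -> None:
        for row, col in self._cells(key):
            self.table[row][col] += amount

    def estimate(self, key: str) -> int:
        # The minimum across rows is the tightest upper bound on the true count.
        return min(self.table[row][col] for row, col in self._cells(key))

sketch = CountMinSketch()
sketch.add("203.0.113.7")
print(sketch.estimate("203.0.113.7"))  # 1 (an upper bound on the true count)
```

In a real limiter you would keep one sketch per time bucket, or apply exponential decay, so that old traffic ages out of the estimate.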
Clock skew and misaligned boundaries cause distributed enforcers to diverge. If two nodes disagree by 100 milliseconds on bucket boundaries, one node may count a request in bucket N while the other counts it in bucket N+1. Aggregated globally, this causes transient undercounts or overcounts, allowing some clients to exceed limits or incorrectly throttling others. Tumbling windows with synchronized resets create thundering herds: all clients see their quotas refill simultaneously and send requests at the same instant, creating step-function load that overwhelms backends. Mitigate with client-side jitter (a random 0 to 10 percent delay) and server-side staggered resets, as sketched below.
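A minimal sketch of both mitigations, assuming a 60-second fixed window; the function names and the per-key stagger (derived from a CRC32 of the key so every node computes the same offset) are illustrative, not a specific library's API.

```python
import random
import zlib

def jittered_delay(base_delay_s: float, jitter_fraction: float = 0.10) -> float:
    """Client side: stretch the wait by a random 0-10% so retries do not land at the same instant."""
    return base_delay_s * (1.0 + random.uniform(0.0, jitter_fraction))

def stagger_offset(key: str, window_s: float = 60.0) -> float:
    """Server side: a deterministic per-key offset (stable across nodes) that spreads resets over the window."""
    return (zlib.crc32(key.encode()) % 1000) / 1000.0 * window_s

def window_start(key: str, now: float, window_s: float = 60.0) -> float:
    """Start of the current fixed window for this key, shifted by its stagger offset."""
    offset = stagger_offset(key, window_s)
    return ((now - offset) // window_s) * window_s + offset
```

With the offset applied, each key's quota refills at a different point in the minute, so the backend sees a smeared ramp instead of a step.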
Late and out-of-order events in stream processing windows require watermarks and allowed lateness. If you set a 5-minute tumbling window with 30 seconds of allowed lateness, events arriving more than 30 seconds late are dropped, causing undercounts. Events within the lateness window trigger retractions and updates to already-emitted aggregates, requiring downstream dashboards and alerting to handle mutable results. Without idempotent sinks, this causes duplicate alerts or incorrect billing.
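A simplified sketch of that behavior, assuming a naive watermark that just tracks the maximum event time seen so far (real stream processors compute watermarks per source and partition); it shows the three outcomes: accumulate, retract/update, or drop.

```python
from collections import defaultdict

WINDOW_S = 300           # 5-minute tumbling windows
ALLOWED_LATENESS_S = 30  # windows stay mutable this long after the watermark passes them

class TumblingCounter:
    """Event-time tumbling-window counter with a watermark and allowed lateness."""
    def __init__(self):
        self.counts = defaultdict(int)   # window_start -> count
        self.emitted = set()             # windows whose first result has already been emitted
        self.watermark = float("-inf")   # naive watermark: max event time seen so far

    def on_event(self, event_time: float):
        self.watermark = max(self.watermark, event_time)
        start = (event_time // WINDOW_S) * WINDOW_S
        if self.watermark >= start + WINDOW_S + ALLOWED_LATENESS_S:
            return ("dropped", start, None)                    # too late: silent undercount
        self.counts[start] += 1
        if start in self.emitted:
            return ("retraction", start, self.counts[start])   # downstream must apply the update
        return ("accumulating", start, self.counts[start])

    def emit_closed_windows(self):
        """Emit first results for windows the watermark has passed."""
        for start in sorted(self.counts):
            if self.watermark >= start + WINDOW_S and start not in self.emitted:
                self.emitted.add(start)
                yield (start, self.counts[start])
```

For the window [0, 300): an event that arrives after the watermark passes 330 is dropped, while one that arrives when the watermark is between 300 and 330 retracts and updates the previously emitted count.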
Partition rebalancing and node failures without durable state cause counter resets. A client halfway through their quota experiences a rebalance; their counter resets to zero, allowing them to consume another full quota immediately. For strict correctness, persist state to distributed storage (DynamoDB, Cassandra) or use state transfer protocols during rebalances. This adds latency and complexity but is necessary for financial or compliance use cases.
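A minimal sketch of the check-and-increment pattern against durable state; DurableCounterStore and InMemoryStore are hypothetical stand-ins for an atomic increment in DynamoDB or Cassandra, not their actual client APIs.

```python
import time
from typing import Protocol

class DurableCounterStore(Protocol):
    """Stand-in interface for durable, atomic per-key counters."""
    def increment(self, key: str, window_start: int, amount: int = 1) -> int: ...

class InMemoryStore:
    """Toy implementation; in production the state lives outside the enforcer process."""
    def __init__(self):
        self._counts: dict[tuple[str, int], int] = {}

    def increment(self, key: str, window_start: int, amount: int = 1) -> int:
        k = (key, window_start)
        self._counts[k] = self._counts.get(k, 0) + amount
        return self._counts[k]

def allow_request(store: DurableCounterStore, key: str, limit: int, window_s: int = 60) -> bool:
    """Increment-then-check against durable state, so a crash or rebalance on the
    enforcer node does not silently reset the client's counter to zero."""
    window_start = int(time.time() // window_s) * window_s
    return store.increment(key, window_start, 1) <= limit
```

Every check is now a remote round trip, which is the latency and complexity cost mentioned above; batching or caching increments trades some strictness back for speed.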
💡 Key Takeaways
• Hot-key cardinality: 20 million unique IPs × 60 buckets × 2 bytes = 2.4 GB of raw counters; with hash-table overhead, 5 to 6 GB. Botnet attacks can exhaust memory, causing fail-open behavior that lets attacks through.
• Clock skew of 100 milliseconds across distributed nodes causes bucket misalignment: a request at a boundary is counted in different buckets by different enforcers, leading to aggregate errors of 10 to 20 percent in practice.
• Boundary spike in fixed windows: a client sends N requests at t = 59.9 s and N at t = 60.1 s. The server sees 2N requests in 200 milliseconds, but each window shows only N, within limit. This can briefly double the effective rate (see the sketch after this list).
• Late events in stream windows: with a 5-minute tumbling window and 30 seconds of allowed lateness, events more than 30 seconds late are dropped (undercounting), and events within the lateness window trigger retractions, requiring idempotent downstream consumers.
• State loss on rebalance: a client consumes 50 percent of its quota, the node crashes, and the counter is lost. The client resets to 0 and consumes another full quota. Without durable state, enforcement is best effort only.
• Thundering herd on refill: token buckets or fixed windows with synchronized resets cause all clients to send requests simultaneously, so the backend sees step-function load. Mitigate with jittered client retries and staggered server-side refills.
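To make the boundary-spike bullet concrete, here is a toy simulation with an assumed limit of 100 requests per 60-second fixed window; the timestamps straddle the window boundary by 100 ms on each side.

```python
def fixed_window_admits(timestamps, limit, window_s=60):
    """Return, per request, whether a fixed-window counter admits it."""
    counts = {}
    admitted = []
    for t in timestamps:
        w = int(t // window_s)              # window index for this timestamp
        counts[w] = counts.get(w, 0) + 1
        admitted.append(counts[w] <= limit)
    return admitted

limit = 100
burst = [59.9] * limit + [60.1] * limit     # N requests just before and just after the boundary
print(sum(fixed_window_admits(burst, limit)))  # 200: double the limit admitted within 200 ms
```

A sliding window over the same timestamps would see all 200 requests inside any 60-second span covering the boundary and reject the second burst.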
📌 Examples
AWS WAF scaling challenge: Track 50 million unique IP addresses over 5-minute sliding windows. Even with 10-second buckets (30 per window), 50,000,000 × 30 × 2 bytes = 3 GB of raw counters per edge location. Must use approximate counters or offload to distributed storage.
Clock skew scenario: Node A and Node B enforce per-user limits, and Node A's clock is 120 milliseconds ahead. For a request at true time 59.95 seconds, Node A assigns it to window 1 (t = 60.07 s on its clock) while Node B assigns it to window 0. The aggregator sees inconsistent counts and may allow or deny incorrectly.
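The divergence is easy to reproduce by treating each node's skew as an offset added to the true event time (60-second windows assumed):

```python
def window_index(true_time_s: float, skew_s: float, window_s: float = 60.0) -> int:
    """Window a node assigns a request to, given its local clock skew."""
    return int((true_time_s + skew_s) // window_s)

print(window_index(59.95, skew_s=0.12))  # Node A, 120 ms ahead -> window 1
print(window_index(59.95, skew_s=0.00))  # Node B, no skew      -> window 0
```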
Credential stuffing attack: An attacker tries 10 million username-password pairs. A sliding log stores up to 10 failed attempts per username over 10 minutes: 10,000,000 × 10 × 8 bytes = 800 MB just for rate-limit state. With 100 million usernames that grows to 8 GB, and overhead pushes it past memory limits.
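The same footprint math as a small helper, so the scaling with key cardinality is easy to recompute (8 bytes per stored timestamp assumed, overhead excluded):

```python
def sliding_log_bytes(keys: int, entries_per_key: int, bytes_per_entry: int = 8) -> int:
    """Raw memory for a sliding-log limiter that keeps one timestamp per recent attempt."""
    return keys * entries_per_key * bytes_per_entry

print(sliding_log_bytes(10_000_000, 10) / 1e6)   # 800.0 MB
print(sliding_log_bytes(100_000_000, 10) / 1e9)  # 8.0 GB
```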