
Failure Modes: Hotspot Keys, Clock Skew, and Race Conditions

Hotspot keys occur when a small set of identifiers (such as a popular tenant or viral content) creates write contention in the request store. Symptoms include rising p99 latency, timeouts, and, paradoxically, over-limit requests being allowed during retry races, where failed operations get retried and counted multiple times. At extreme scale, a single key receiving 100,000 requests per second can saturate a single storage partition. Mitigation strategies include per-key sharding using consistent hashing, local token caches with small time-to-live (TTL) values (each server caches 10 to 50 tokens locally for 100 milliseconds), and partitioned budgets where the global limit is split across servers.

Clock skew and time drift corrupt time-based algorithms. Sliding windows and token-bucket refill depend on accurate timestamps: if one server's clock runs 5 seconds fast, it refills tokens prematurely and allows excess traffic; if another runs slow, it unfairly denies requests. Mitigation requires clock synchronization via Network Time Protocol (NTP) or Precision Time Protocol (PTP), using the storage system's clock for refill calculations instead of application-server clocks, and adding small buffers to boundary calculations to tolerate minor drift.

Race conditions and double spend happen without atomic operations. Two concurrent requests each reading a count of 99 against a limit of 100 can both increment to 100 and both succeed, allowing 101 total requests. With weak consistency, or when reading from replicas to reduce latency, replication lag causes stale counter reads and temporary overruns. Criteo's evaluation across multiple data centers emphasized that accuracy degrades under the worst 1% of load when atomicity is weak. Solutions include atomic read-modify-write primitives (such as Redis INCR or compare-and-set operations), optimistic locking with retry, and treating duplicate requests as idempotent using request IDs.
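As a concrete illustration of the clock-skew mitigation, the sketch below refills a token bucket using the storage system's clock (the Redis TIME command via redis-py) instead of each application server's local clock. The key layout, rate, and capacity are assumptions for illustration, and the read-modify-write shown here would still need to be made atomic (for example, via a Lua script) to avoid the race conditions described above.

    # Minimal sketch (assumed parameters): token-bucket refill driven by the
    # store's clock, so skewed application-server clocks cannot refill early.
    import redis

    r = redis.Redis()
    RATE = 100.0    # tokens refilled per second (assumed)
    BURST = 100.0   # bucket capacity (assumed)

    def allow(key: str) -> bool:
        # Ask Redis for "now" so every server shares the same clock.
        seconds, micros = r.time()
        now = seconds + micros / 1_000_000

        state = r.hgetall(key)                        # {} on first use
        tokens = float(state.get(b"tokens", BURST))
        last = float(state.get(b"ts", now))

        # Refill based on time elapsed according to the store, capped at BURST.
        tokens = min(BURST, tokens + (now - last) * RATE)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0

        r.hset(key, mapping={"tokens": tokens, "ts": now})
        return allowed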
💡 Key Takeaways
Hotspot keys at 100,000 requests per second can saturate a single storage partition; a local token cache with a 100-millisecond TTL holding 10 to 50 tokens per server reduces store pressure by 90% while capping the worst-case overrun at the cache size
Clock skew of 5 seconds causes token buckets to refill prematurely or too late, unfairly admitting or denying traffic; using the storage system's timestamps for refill instead of application-server clocks eliminates this class of error
Race conditions without atomic operations allow double spend: two requests each reading a count of 99 both increment to 100, permitting 101 requests against a limit of 100
Replication lag when reading from followers causes temporary overruns; the two practical approaches are reading from the primary for hot keys or accepting a small error margin with eventual consistency
📌 Examples
A viral API key generates 200,000 requests per second globally. Without per-key sharding, this load concentrates on one storage partition. Splitting the key across 10 partitions (each handling 20,000 requests per second) and allocating 10% of the limit to each partition keeps every partition under capacity.
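A minimal sketch of that sharding scheme, assuming 10 shards and a fixed-window counter per shard; the key naming and window handling are illustrative, not prescribed by the example:

    # Spread one hot key's counter across SHARDS sub-keys so no single
    # storage partition absorbs all of its writes.
    import random
    import redis

    r = redis.Redis()
    SHARDS = 10  # assumed shard count

    def allow(api_key: str, global_limit: int, window_seconds: int = 1) -> bool:
        # Random shard choice spreads a single hot key's writes across partitions.
        shard = f"rl:{api_key}:{random.randrange(SHARDS)}"
        count = r.incr(shard)
        if count == 1:
            r.expire(shard, window_seconds)        # start the fixed window
        return count <= global_limit // SHARDS     # each shard enforces its slice

In a clustered store, the distinct shard suffixes hash to different slots, so the write load for the hot key lands on different partitions.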
The Redis INCR command provides an atomic increment that returns the new value. Pseudocode: new_count = INCR(rate_limit_key); if new_count > limit, DECR(rate_limit_key) and deny. This prevents double spend without requiring transactions or locking.
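A runnable version of that pseudocode, assuming redis-py and adding an expiry so counters do not live forever; the key name, limit, and window are illustrative:

    # INCR-then-DECR check: the increment is atomic, so two concurrent
    # requests can never both observe 99 and both pass a limit of 100.
    import redis

    r = redis.Redis()

    def try_acquire(rate_limit_key: str, limit: int, window_seconds: int = 60) -> bool:
        new_count = r.incr(rate_limit_key)            # atomic, returns the new value
        if new_count == 1:
            r.expire(rate_limit_key, window_seconds)  # bound the counter's lifetime
        if new_count > limit:
            r.decr(rate_limit_key)                    # hand the slot back and deny
            return False
        return True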