Rate Limiting • Token Bucket Algorithm
Failure Modes: Race Conditions, Clock Issues, and Thundering Herds
Token bucket implementations fail in subtle ways under production load, network partitions, and clock skew. Understanding these edge cases is critical for reliable enforcement.
Race conditions occur when concurrent requests read and modify bucket state without atomicity. Without atomic compare-and-swap or serialized scripts, two threads can both read tokens = 10, both decide to admit, and both write back a decrement, leaving tokens = 9 instead of the correct 8 after two admissions from 10. Under high concurrency, this overshoot can reach 2 to 10× the target rate. Solution: use atomic counters (e.g., AtomicInteger in Java, atomic operations in C++) or wrap Redis updates in Lua scripts that execute atomically. Distributed key-value (KV) stores must support a single atomic read-modify-write primitive.
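A minimal sketch of the atomic approach in Java, assuming a single-process bucket whose refill runs elsewhere; the class and method names are illustrative, not from any particular library:

```java
import java.util.concurrent.atomic.AtomicLong;

// Lock-free admission via compare-and-swap (CAS): the read-modify-write
// happens as one atomic step, so two threads can never both "win" the same token.
public class AtomicTokenBucket {
    private final long capacity;        // b: maximum tokens the bucket holds
    private final AtomicLong tokens;

    public AtomicTokenBucket(long capacity) {
        this.capacity = capacity;
        this.tokens = new AtomicLong(capacity);
    }

    /** Admit one request; never over-admits under concurrency. */
    public boolean tryAcquire() {
        while (true) {
            long current = tokens.get();
            if (current <= 0) {
                return false;                               // bucket empty: reject
            }
            // If two threads both read 10, only one CAS succeeds;
            // the loser retries and observes 9.
            if (tokens.compareAndSet(current, current - 1)) {
                return true;
            }
        }
    }

    /** Add tokens, capped at capacity (also an atomic read-modify-write). */
    public void refill(long amount) {
        tokens.updateAndGet(t -> Math.min(capacity, t + amount));
    }
}
```

For a bucket shared across processes in Redis, the same check-and-decrement would be wrapped in a single Lua script so the read, decision, and write execute atomically on the server.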
Clock issues manifest when wall-clock time drives refill calculations. Network Time Protocol (NTP) corrections can cause sudden jumps forward (minting tokens instantly) or backward (stalling refill). A 1-second NTP jump with r = 1,000/sec mints 1,000 spurious tokens. Solution: always use monotonic clocks (CLOCK_MONOTONIC on Linux, System.nanoTime in Java). In distributed settings, avoid comparing wall-clock time across nodes; compute refills locally from each node's monotonic time since its last check.
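A minimal sketch of continuous refill driven by a monotonic clock, assuming a single-node limiter; System.nanoTime() is monotonic on the JVM, so NTP corrections cannot mint or stall tokens. The synchronized method and field names are illustrative simplifications:

```java
// Refill computed from monotonic elapsed time (Δt), never from wall-clock time.
public class MonotonicTokenBucket {
    private final double ratePerSec;    // r: tokens added per second
    private final double capacity;      // b: maximum burst size
    private double tokens;
    private long lastCheckNanos;

    public MonotonicTokenBucket(double ratePerSec, double capacity) {
        this.ratePerSec = ratePerSec;
        this.capacity = capacity;
        this.tokens = capacity;
        this.lastCheckNanos = System.nanoTime();   // monotonic, immune to NTP steps
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        double elapsedSec = (now - lastCheckNanos) / 1_000_000_000.0;
        lastCheckNanos = now;
        // Continuous refill: tokens += r * Δt, capped at capacity b.
        tokens = Math.min(capacity, tokens + elapsedSec * ratePerSec);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Because Δt is measured on each node's own monotonic clock, no cross-node clock comparison is ever needed.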
Thundering herd at refill boundaries happens with discrete, tick-based refills. If you replenish tokens every 1 second at second boundaries, all requests queued during the empty period fire simultaneously at the next boundary, slamming downstream. With 10,000 clients waiting and b = 10,000, downstream sees a 10,000-request spike within milliseconds. Solution: use continuous refill (tokens += r·Δt on every check) instead of stepwise ticks, which spreads arrivals naturally; if you must use ticks, add per-instance jitter, as sketched below.
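If a tick-based scheduler is unavoidable, a random per-instance offset keeps a fleet of limiters from refilling on the same second boundary. A sketch under those assumptions (rate, capacity, and names are placeholders):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Tick-based refill with per-instance jitter: each instance starts its
// 1-second refill schedule at a random offset within the first second.
public class JitteredTickRefill {
    private static final long RATE_PER_TICK = 1_000;  // tokens added per 1 s tick
    private static final long CAPACITY = 1_000;       // b

    private final AtomicLong tokens = new AtomicLong(CAPACITY);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Jitter: random initial delay in [0, 1000) ms, so instances do not
        // all release queued clients at the same wall-clock boundary.
        long jitterMillis = ThreadLocalRandom.current().nextLong(1_000);
        scheduler.scheduleAtFixedRate(
                () -> tokens.updateAndGet(t -> Math.min(CAPACITY, t + RATE_PER_TICK)),
                jitterMillis, 1_000, TimeUnit.MILLISECONDS);
    }

    public boolean tryAcquire() {
        // Atomically take a token if one is available.
        long previous = tokens.getAndUpdate(t -> t > 0 ? t - 1 : t);
        return previous > 0;
    }
}
```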
Burst mis-sizing is common: setting b too large (e.g., b = 10,000 with r = 1,000/sec, allowing 10-second bursts) can overwhelm database connection pools or cause downstream queue blow-ups. Rule of thumb: size b to at most 0.5 to 1.0 seconds of r unless downstream explicitly tolerates larger spikes. AWS API Gateway defaults follow this pattern: burst is typically 1 to 2× the rate, rarely exceeding 2 seconds' worth.
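A small worked example of the sizing rule above; the helper name and the clamp to 0.5–1.0 seconds of r are illustrative choices, not an AWS or library API:

```java
// Size burst capacity b as a fraction of a second of the sustained rate r.
public final class BurstSizing {
    private BurstSizing() {}

    /** Recommended b for rate r (tokens/sec), clamped to 0.5-1.0 seconds of r. */
    public static long recommendedBurst(double ratePerSec, double burstSeconds) {
        double clamped = Math.max(0.5, Math.min(1.0, burstSeconds));
        return Math.round(ratePerSec * clamped);
    }

    public static void main(String[] args) {
        // r = 1,000/sec: b lands between 500 and 1,000 rather than 10,000
        // (which would permit a 10-second spike downstream).
        System.out.println(recommendedBurst(1_000, 0.5));  // 500
        System.out.println(recommendedBurst(1_000, 1.0));  // 1000
    }
}
```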
💡 Key Takeaways
•Race conditions without atomic operations cause 2 to 10× overshoot under concurrency. Use atomic compare-and-swap or Redis Lua scripts to ensure read-modify-write atomicity. Measure admit rate in production; if it exceeds r by >10%, suspect races.
•Wall clock based refill creates spurious tokens during Network Time Protocol (NTP) jumps. A 1 second forward jump with r = 1,000/sec mints 1,000 extra tokens instantly. Always use monotonic clocks (CLOCK_MONOTONIC, System.nanoTime) and compute Δt locally per node.
•Thundering herd from discrete tick refills causes synchronized bursts at second boundaries. With 10,000 waiting clients and b = 10,000, downstream receives 10,000 requests in milliseconds. Use continuous refill (r·Δt per check) to spread arrivals naturally.
•Over-large burst capacity overwhelms downstream. Setting b = 10,000 with r = 1,000/sec allows a 10 second spike that can exhaust database connection pools (typically 50 to 200 connections). Size b to 0.5 to 1.0 seconds of r as default.
•Hot-key partitions during network splits mint tokens independently per shard, causing short-term over-admission. With 4 shards that each mint a full b = 500 locally during a partition, aggregate admission can reach 2,000 tokens (4× b) until the split heals. Monitor partition events and emit alerts.
📌 Examples
Production incident: a multi-threaded service without atomic updates hit 15,000 RPS against an r = 5,000/sec target due to race conditions. Fix: switched to AtomicLong in Java; the actual rate dropped to 5,100 RPS (about 2% overshoot, attributable to measurement noise).
An NTP correction moved the clock forward 2 seconds on a rate-limiter host with r = 10,000/sec, instantly minting 20,000 spurious tokens; the resulting burst overwhelmed the downstream database and caused a 30-second outage. Fix: migrated to CLOCK_MONOTONIC; subsequent NTP jumps had zero impact.
Discrete 1-second tick refill caused synchronized 5,000-request bursts at each whole-second boundary (xx:00.000), saturating the load balancer's connection table (50,000 max). Switching to continuous refill smoothed arrivals to a near-uniform distribution with p99 inter-arrival time under 2 milliseconds.