Failure Modes: Race Conditions, Clock Issues, and Thundering Herds
Why Token Buckets Fail
The algorithm is simple, but implementation details create subtle bugs that only surface under production load. These failures can let through 2 to 10x your intended rate or artificially deny legitimate traffic. Here are the four most common ways token buckets break in production.
Race Conditions
Two requests check the bucket simultaneously: both read 10 tokens, both decide to proceed, and both write back 9. You admitted 2 requests but spent only 1 token. Under high concurrency, this "double spending" can let through 2 to 10x your intended rate. The fix: make check-and-decrement atomic. In code, use compare-and-swap or atomic counters. In Redis, wrap the check and decrement in a Lua script so it executes as one indivisible operation. Never do "read tokens, then decide, then write tokens" as separate steps. The Lua script pattern (note that GET returns a string, or false for a missing key, so it must go through tonumber): local tokens = tonumber(redis.call('GET', KEYS[1])) or 0; if tokens > 0 then redis.call('DECR', KEYS[1]); return 1 end; return 0
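The race disappears once the read-check-write sequence runs as a single critical section. A minimal Python sketch of the idea (names are illustrative; a threading.Lock stands in for compare-and-swap, which Python does not expose directly):

```python
import threading

class AtomicBucket:
    """Token bucket whose check-and-decrement is one indivisible step."""

    def __init__(self, tokens: int):
        self._tokens = tokens
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Read, decide, and write under one lock: no other thread can
        # interleave between the check and the decrement.
        with self._lock:
            if self._tokens > 0:
                self._tokens -= 1
                return True
            return False

bucket = AtomicBucket(tokens=10)
admitted = []

def worker():
    if bucket.try_acquire():
        admitted.append(1)

threads = [threading.Thread(target=worker) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(admitted))  # 10: exactly the token count, never more
```

Without the lock, two workers could both pass the `> 0` check before either decrements, which is precisely the double-spend described above.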
Clock Jumps
Token buckets refill based on elapsed time. If you use wall-clock time and the system clock jumps forward by 1 second (an NTP correction), you just minted 1,000 free tokens instantly (at r = 1,000/sec). A backward jump freezes refills, creating artificial denials. The fix: always use a monotonic clock, which only moves forward, at a steady rate, regardless of NTP adjustments. In code, that means CLOCK_MONOTONIC on Linux or System.nanoTime() in Java, never wall-clock time.
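In Python the same rule means time.monotonic() rather than time.time(). A minimal sketch of a refill loop pinned to the monotonic clock (class and field names are illustrative):

```python
import time

class MonotonicBucket:
    """Refills from time.monotonic(), which NTP corrections cannot move."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # tokens per second (r)
        self.capacity = capacity      # burst size (b)
        self.tokens = capacity
        self.last = time.monotonic()  # never time.time(): wall clock can jump

    def try_acquire(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last     # always >= 0 with a monotonic clock
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = MonotonicBucket(rate=1000.0, capacity=1000.0)
print(bucket.try_acquire())  # True: the bucket starts full
```

With time.time() instead, a 1-second NTP step forward would make `elapsed` jump and mint r extra tokens; a step backward would make it negative and stall refills entirely.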
Thundering Herd at Refill
If you refill tokens in discrete ticks (dump 1,000 tokens every second at :00), all waiting requests fire simultaneously when the tick hits. Downstream sees a 1,000-request spike every second instead of a smooth 1,000/sec flow. The fix: use continuous refill. On each request, calculate tokens = min(b, current + r times (time since last check)), capping at the burst size b. This spreads arrivals naturally. If you must use ticks, add random jitter so not all buckets refill at exactly the same moment.
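When ticks can't be avoided, a small random offset per bucket keeps refills from aligning. A minimal sketch of the jitter calculation (function name and 10% default are illustrative choices, not a standard):

```python
import random

def next_refill_delay(base_interval: float, jitter_frac: float = 0.1) -> float:
    """Interval until this bucket's next tick, spread by +/- jitter_frac.

    With base_interval=1.0 and jitter_frac=0.1, each bucket refills
    every 0.9 to 1.1 seconds, so buckets drift apart over time instead
    of all firing at :00.
    """
    jitter = random.uniform(-jitter_frac, jitter_frac) * base_interval
    return base_interval + jitter

delays = [next_refill_delay(1.0) for _ in range(5)]
print(delays)  # e.g. [1.03, 0.95, 1.08, 0.91, 1.02]
```

Because each bucket draws its own delay, the per-second spike decorrelates across buckets even though each one still refills in discrete steps.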
Burst Size Mistakes
Setting burst capacity too high feels generous but creates problems. If b equals 10,000 with r equals 1,000/sec, a user can dump 10 seconds' worth of traffic instantly, overwhelming the database connection pool. Rule of thumb: set b to 0.5 to 1.0 seconds' worth of r unless you have verified that downstream can absorb larger spikes.
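The rule of thumb reduces to simple arithmetic. A sketch (the helper name is illustrative):

```python
def recommended_burst(rate_per_sec: float, seconds_of_burst: float = 0.5) -> int:
    """Rule of thumb: b = 0.5 to 1.0 seconds' worth of r."""
    return int(rate_per_sec * seconds_of_burst)

r = 1000.0
print(recommended_burst(r))       # 500: half a second of traffic
print(recommended_burst(r, 1.0))  # 1000: a full second, the upper bound
print(int(r * 10))                # 10000: the risky setting from the text
```

Anything above the 1.0-second mark is a deliberate capacity decision, not a default, and should be backed by load tests against the downstream pool.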