
Distributed Leaky Bucket: Coordination and Consistency

Implementing Leaky Bucket in a distributed system introduces coordination challenges. Per-process or per-node buckets are cheap and provide sub-millisecond decision latency, but they do not enforce global fairness: if you run N independent rate limiter instances, each allowing rate r, the aggregate system admits up to N × r requests, violating your global limit. For example, 10 gateway instances each enforcing 100 requests per second per user can collectively allow 1,000 rps from a single user, overloading downstream services. Two patterns solve this.

First, use consistent hashing or deterministic principal sharding so that a given identity (user ID, API key, IP address) maps to exactly one rate-limiting node, creating a single writer for that principal's bucket. This preserves local speed (no cross-node coordination per request) while enforcing the global limit. If a limiter node handles 200k checks per second and you have 5 nodes with replication, the cluster sustains roughly 1M checks per second under realistic key distributions. Handle node failures with a circuit breaker that either fails closed (protect downstream, reject all) or fails open (protect availability, allow all), chosen per endpoint criticality.

Second, use a strongly consistent shared store such as Redis for bucket state. Stripe's GCRA implementation stores a Theoretical Arrival Time (TAT) per API key in Redis and updates it with atomic read-modify-write operations. A single Redis shard handles hundreds of thousands of rate checks per second with sub-millisecond latency per check when co-located with the decision logic. Set key time-to-live (TTL) values so idle principals expire and reclaim memory; TAT state naturally decays via expiry. The trade-off is extra network hops and storage contention in exchange for perfect global consistency. Measure carefully: clock skew or inconsistent time sources across nodes in distributed GCRA can produce unfair decisions unless you use a single authoritative time source and only compare time deltas within a configured tolerance.
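To make the first pattern concrete, here is a minimal Python sketch of principal sharding over a consistent-hash ring, with an in-memory GCRA-style bucket per principal on the owning node. The node names, virtual-node count, and burst parameter are illustrative assumptions, not values prescribed by the text.

```python
# Sketch: route each principal to exactly one limiter node (single writer),
# then run a local leaky-bucket-as-meter check on that node.
import bisect
import hashlib
import time


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class LimiterRing:
    """Maps each principal to exactly one limiter node via consistent hashing."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, principal: str) -> str:
        idx = bisect.bisect(self._keys, _hash(principal)) % len(self._ring)
        return self._ring[idx][1]


class LocalGCRA:
    """Per-node bucket state: one theoretical arrival time (TAT) per principal."""

    def __init__(self, rate_per_sec: float, burst: int = 1):
        self.interval = 1.0 / rate_per_sec       # emission interval (1/r)
        self.tolerance = self.interval * burst   # allowed burst window
        self.tat = {}                            # principal -> TAT

    def allow(self, principal: str) -> bool:
        now = time.monotonic()
        tat = max(self.tat.get(principal, now), now)
        if tat - now > self.tolerance:
            return False                         # bucket "full": reject
        self.tat[principal] = tat + self.interval  # drain at rate r
        return True


# Usage: the gateway hashes the principal and forwards the check to its owner.
ring = LimiterRing(["limiter-1", "limiter-2", "limiter-3", "limiter-4", "limiter-5"])
print(ring.node_for("api_key_123"))  # always the same node for this key
```

Because every request for a given key lands on the same node, the bucket update needs no cross-node coordination; a node failure only moves the keys that node owned to its ring neighbors.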
💡 Key Takeaways
Per-node buckets allow N × r aggregate throughput across N nodes, violating global limits; 10 gateways each at 100 rps per user can collectively allow 1,000 rps from one user
Consistent hashing maps each principal (user, API key) to exactly one limiter node for single-writer semantics, maintaining local speed while enforcing global limits without cross-node coordination
A shared Redis store provides perfect global consistency at the cost of network hops and storage contention; the Stripe pattern handles 100k+ checks per second per shard with sub-millisecond latency using atomic updates
Clock skew in distributed GCRA causes unfair rate decisions; use a single authoritative time source at the limiter tier and compare only time deltas within a configured tolerance to handle jitter
Circuit breakers handle limiter backend failures: fail closed (protect downstream, reject all) for critical paths or fail open (protect availability, allow all) for less critical endpoints, based on business requirements (a minimal sketch follows this list)
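A hedged sketch of the fail-open/fail-closed choice, assuming a hypothetical backend client that raises LimiterUnavailable when the limiter store is unreachable; the failure threshold and cooldown are arbitrary placeholders, not values from the text.

```python
# Sketch: wrap limiter checks in a circuit breaker whose failure policy
# (fail open vs. fail closed) is chosen per endpoint criticality.
import time


class LimiterUnavailable(Exception):
    """Raised by the (hypothetical) limiter backend client on outage."""


class LimiterCircuitBreaker:
    def __init__(self, backend, *, fail_open: bool,
                 max_failures: int = 5, cooldown_sec: float = 10.0):
        self.backend = backend
        self.fail_open = fail_open      # True: allow on outage; False: reject
        self.max_failures = max_failures
        self.cooldown_sec = cooldown_sec
        self.failures = 0
        self.opened_at = None           # time the breaker tripped

    def allow(self, principal: str) -> bool:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_sec:
                return self.fail_open   # breaker open: apply the policy
            self.opened_at = None       # half-open: retry the backend
            self.failures = 0
        try:
            decision = self.backend.allow(principal)
            self.failures = 0
            return decision
        except LimiterUnavailable:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return self.fail_open


# Example policy: a critical write path fails closed, a read path fails open.
# writes = LimiterCircuitBreaker(backend, fail_open=False)
# reads  = LimiterCircuitBreaker(backend, fail_open=True)
```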
📌 Examples
5-node limiter cluster with consistent hashing: each node handles 200k checks per second; the cluster sustains approximately 1M checks per second with replication and a realistic key distribution including hot keys
Redis GCRA stores TAT (Theoretical Arrival Time) with key TTLs; idle principals naturally decay and reclaim memory, while active keys get an atomic read-modify-write that advances TAT by the emission interval 1/r per allowed request (a Redis-backed sketch follows these examples)
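A sketch of that Redis-backed pattern: TAT per key with a TTL, advanced atomically by 1/r inside a Lua script. The key prefix, rate, burst, and TTL are illustrative assumptions, not Stripe's actual values; it uses Redis server TIME as the single time source and assumes redis-py plus a Redis version (5+) that replicates scripts by effects.

```python
# Sketch: atomic GCRA check in Redis; TAT expires via TTL for idle principals.
import redis

GCRA_LUA = """
local t = redis.call('TIME')
local now = t[1] * 1000000 + t[2]          -- server time in microseconds
local interval = tonumber(ARGV[1])         -- emission interval 1/r (us)
local tolerance = tonumber(ARGV[2])        -- burst allowance (us)
local ttl = tonumber(ARGV[3])              -- idle-key expiry (seconds)
local tat = tonumber(redis.call('GET', KEYS[1]))
if tat == nil or tat < now then tat = now end
if tat - now > tolerance then
    return 0                               -- over limit: reject
end
redis.call('SET', KEYS[1], string.format('%.0f', tat + interval), 'EX', ttl)
return 1                                   -- allow and advance TAT by 1/r
"""

r = redis.Redis()
gcra = r.register_script(GCRA_LUA)


def allow(api_key: str, rate_per_sec: float,
          burst: int = 10, ttl_sec: int = 3600) -> bool:
    interval_us = int(1_000_000 / rate_per_sec)
    return gcra(keys=[f"tat:{api_key}"],
                args=[interval_us, interval_us * burst, ttl_sec]) == 1
```

Because the read, compare, and write all happen inside one script, concurrent checks for the same key cannot interleave, which is the atomic read-modify-write property the example relies on.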