
Distributed Leaky Bucket: Coordination and Consistency

Implementing Leaky Bucket in a distributed system introduces coordination challenges. Per-process or per-node buckets are cheap and provide sub-millisecond decision latency, but they do not enforce global fairness: if you run N independent rate limiter instances, each allowing rate r, the aggregate system admits up to N × r requests, violating your global limit. For example, 10 gateway instances each enforcing 100 requests per second per user can collectively allow 1,000 rps from a single user, overloading downstream services. Two patterns solve this.

First, use consistent hashing or deterministic principal sharding so that a given identity (user ID, API key, IP address) maps to exactly one rate-limiting node, creating a single writer for that principal's bucket. This preserves local speed (no cross-node coordination per request) while enforcing the global limit. If a limiter node handles 200k checks per second and you have 5 nodes with replication, the cluster sustains roughly 1M checks per second under realistic key distributions. Handle node failures with a circuit breaker that either fails closed (protect downstream, reject all) or fails open (protect availability, allow all), chosen per endpoint criticality.

Second, use a strongly consistent shared store such as Redis for bucket state. Stripe's GCRA implementation stores a Theoretical Arrival Time (TAT) per API key in Redis and updates it with atomic read-modify-write operations. A single Redis shard handles hundreds of thousands of rate checks per second with sub-millisecond latency per check when co-located with the decision logic. Set key time-to-live (TTL) values so idle principals expire and reclaim memory; TAT state naturally decays via expiry. The trade-off is extra network hops and storage contention in exchange for perfect global consistency. Measure carefully: clock skew or inconsistent time sources across nodes in distributed GCRA can produce unfair decisions unless you use a single authoritative time source and only compare time deltas within a configured tolerance.
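To make the first pattern concrete, here is a minimal Python sketch of principal sharding over a consistent-hash ring, with an in-memory GCRA-style bucket per principal on the owning node. The node names, virtual-node count, and burst parameter are illustrative assumptions, not values prescribed by the text.

```python
# Sketch: route each principal to exactly one limiter node (single writer),
# then run a local leaky-bucket-as-meter check on that node.
import bisect
import hashlib
import time


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class LimiterRing:
    """Maps each principal to exactly one limiter node via consistent hashing."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, principal: str) -> str:
        idx = bisect.bisect(self._keys, _hash(principal)) % len(self._ring)
        return self._ring[idx][1]


class LocalGCRA:
    """Per-node bucket state: one theoretical arrival time (TAT) per principal."""

    def __init__(self, rate_per_sec: float, burst: int = 1):
        self.interval = 1.0 / rate_per_sec       # emission interval (1/r)
        self.tolerance = self.interval * burst   # allowed burst window
        self.tat = {}                            # principal -> TAT

    def allow(self, principal: str) -> bool:
        now = time.monotonic()
        tat = max(self.tat.get(principal, now), now)
        if tat - now > self.tolerance:
            return False                         # bucket "full": reject
        self.tat[principal] = tat + self.interval  # drain at rate r
        return True


# Usage: the gateway hashes the principal and forwards the check to its owner.
ring = LimiterRing(["limiter-1", "limiter-2", "limiter-3", "limiter-4", "limiter-5"])
print(ring.node_for("api_key_123"))  # always the same node for this key
```

Because every request for a given key lands on the same node, the bucket update needs no cross-node coordination; a node failure only moves the keys that node owned to its ring neighbors.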
💡 Key Takeaways
Per-node buckets allow N × r aggregate throughput across N nodes, violating global limits; 10 gateways each at 100 rps per user can collectively allow 1,000 rps from one user
Consistent hashing maps each principal (user, API key) to exactly one limiter node for single-writer semantics, maintaining local speed while enforcing global limits without cross-node coordination
A shared Redis store provides perfect global consistency at the cost of network hops and storage contention; the Stripe pattern handles 100k+ checks per second per shard with sub-millisecond latency using atomic updates
Clock skew in distributed GCRA causes unfair rate decisions; use a single authoritative time source at the limiter tier and compare only time deltas within a configured tolerance to handle jitter
Circuit breakers handle limiter backend failures: fail closed (protect downstream, reject all) for critical paths or fail open (protect availability, allow all) for less critical endpoints, based on business requirements (a minimal sketch follows this list)
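A hedged sketch of the fail-open/fail-closed choice, assuming a hypothetical backend client that raises LimiterUnavailable when the limiter store is unreachable; the failure threshold and cooldown are arbitrary placeholders, not values from the text.

```python
# Sketch: wrap limiter checks in a circuit breaker whose failure policy
# (fail open vs. fail closed) is chosen per endpoint criticality.
import time


class LimiterUnavailable(Exception):
    """Raised by the (hypothetical) limiter backend client on outage."""


class LimiterCircuitBreaker:
    def __init__(self, backend, *, fail_open: bool,
                 max_failures: int = 5, cooldown_sec: float = 10.0):
        self.backend = backend
        self.fail_open = fail_open      # True: allow on outage; False: reject
        self.max_failures = max_failures
        self.cooldown_sec = cooldown_sec
        self.failures = 0
        self.opened_at = None           # time the breaker tripped

    def allow(self, principal: str) -> bool:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_sec:
                return self.fail_open   # breaker open: apply the policy
            self.opened_at = None       # half-open: retry the backend
            self.failures = 0
        try:
            decision = self.backend.allow(principal)
            self.failures = 0
            return decision
        except LimiterUnavailable:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return self.fail_open


# Example policy: a critical write path fails closed, a read path fails open.
# writes = LimiterCircuitBreaker(backend, fail_open=False)
# reads  = LimiterCircuitBreaker(backend, fail_open=True)
```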
📌 Examples
5-node limiter cluster with consistent hashing: each node handles 200k checks per second; the cluster sustains approximately 1M checks per second with replication and a realistic key distribution including hot keys
Redis GCRA stores TAT (Theoretical Arrival Time) with key TTLs; idle principals naturally decay and reclaim memory, while active keys get an atomic read-modify-write that advances TAT by the emission interval 1/r per allowed request (a Redis-backed sketch follows these examples)
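A sketch of that Redis-backed pattern: TAT per key with a TTL, advanced atomically by 1/r inside a Lua script. The key prefix, rate, burst, and TTL are illustrative assumptions, not Stripe's actual values; it uses Redis server TIME as the single time source and assumes redis-py plus a Redis version (5+) that replicates scripts by effects.

```python
# Sketch: atomic GCRA check in Redis; TAT expires via TTL for idle principals.
import redis

GCRA_LUA = """
local t = redis.call('TIME')
local now = t[1] * 1000000 + t[2]          -- server time in microseconds
local interval = tonumber(ARGV[1])         -- emission interval 1/r (us)
local tolerance = tonumber(ARGV[2])        -- burst allowance (us)
local ttl = tonumber(ARGV[3])              -- idle-key expiry (seconds)
local tat = tonumber(redis.call('GET', KEYS[1]))
if tat == nil or tat < now then tat = now end
if tat - now > tolerance then
    return 0                               -- over limit: reject
end
redis.call('SET', KEYS[1], string.format('%.0f', tat + interval), 'EX', ttl)
return 1                                   -- allow and advance TAT by 1/r
"""

r = redis.Redis()
gcra = r.register_script(GCRA_LUA)


def allow(api_key: str, rate_per_sec: float,
          burst: int = 10, ttl_sec: int = 3600) -> bool:
    interval_us = int(1_000_000 / rate_per_sec)
    return gcra(keys=[f"tat:{api_key}"],
                args=[interval_us, interval_us * burst, ttl_sec]) == 1
```

Because the read, compare, and write all happen inside one script, concurrent checks for the same key cannot interleave, which is the atomic read-modify-write property the example relies on.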