Rate Limiting • Leaky Bucket AlgorithmMedium⏱️ ~3 min
Shaper vs Policer: Two Modes of Leaky Bucket
Leaky Bucket operates in two distinct modes with fundamentally different trade-offs. Shaper mode buffers incoming requests in a queue (the bucket) and drains them at a constant rate r. If upstream load is spiky but the long run average stays at or below r, the shaper guarantees smooth output. Any burst beyond capacity B experiences delay bounded by B/r, after which overflow is dropped. This trades bounded latency for fewer rejections, making it ideal when clients can tolerate some queueing but you want to minimize failed requests.
Policer mode enforces rate limits by dropping requests that arrive too early without buffering them. The canonical policer implementation is the Generic Cell Rate Algorithm (GCRA), which tracks a Theoretical Arrival Time (TAT) that advances at fixed intervals of 1/r. Each arrival is checked: allow if current time t ≥ TAT minus tolerance τ, then update TAT = max(TAT, t) + 1/r. Otherwise reject immediately. This provides instant feedback with minimal added latency but results in more drops during bursts.
Stripe popularized GCRA implemented in Redis for per API key enforcement. A single Redis shard handles hundreds of thousands of rate checks per second with sub millisecond latency using atomic updates and key Time To Live (TTLs). Typical limits range from 10 to 1,000 requests per second per principal with small early arrival tolerance to handle clock jitter. This policer approach is widely adopted in gateways and Content Delivery Networks (CDNs) for globally consistent identity based rate control because it keeps decision latency predictable and low.
💡 Key Takeaways
•Shaper mode queues requests and drains at rate r, trading bounded latency (max B/r) for fewer drops; use when clients tolerate queueing but you want to minimize failed requests
•Policer mode (GCRA) drops requests immediately without buffering, giving instant feedback with minimal latency but more rejections during bursts; ideal for fast failure and explicit backpressure
•Critical constraint for shapers: B/r must be less than minimum client timeout or requests will timeout while queued; example: 3 second timeout with r=200 rps requires B ≤ 600
•Stripe's Redis GCRA policer handles hundreds of thousands of checks per second per shard with sub millisecond latency, enforcing 10 to 1,000 rps limits per API key globally
•Policers can trigger synchronized retry storms if clients immediately retry drops; mitigate with jittered exponential backoff and retry after headers communicating when to retry
📌 Examples
ATM networking standardized GCRA to enforce peak cell rates with tight jitter bounds at tens of thousands of cells per second, minimizing bufferbloat in switching fabrics using constant spacing properties
Production shaper configuration: r=500 rps, B=2000 gives max 4 second queueing; if upstream timeout is 3 seconds, most queued requests fail, wasting capacity and amplifying retries. Correct sizing: B ≤ 1500