
Shaper vs Policer: Two Modes of Leaky Bucket

Two Interpretations of the Same Algorithm

The Leaky Bucket algorithm has two distinct implementations that behave very differently. Understanding which you are using (or building) is critical. The names come from network traffic management but apply directly to API rate limiting. Getting this wrong means your rate limiter either blocks legitimate traffic or fails to protect downstream services.

Policer Mode: Drop on Overflow

Check whether adding this request would exceed bucket capacity. If yes, reject immediately with 429. If no, add it to the bucket. The bucket drains continuously at rate r. This mode is for enforcement: stop bad traffic without delaying good traffic. Response time is constant (a sub-millisecond check), but excess requests are lost. Use it when downstream cannot queue (stateless microservices, CDN origins) and clients can retry. Example: an API gateway protecting a stateless service. Requests beyond 1,000/sec get 429, while successful requests see 2 ms latency regardless of traffic.
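A minimal sketch of the policer check described above, in Python. The class name LeakyBucketPolicer, the allow method, and the explicit now timestamp are assumptions made for this illustration, not the API of any particular library. It is single-threaded; a real gateway would need locking or an atomic shared store.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucketPolicer:
    """Leaky bucket used as a meter: overflow is rejected, nothing is queued."""
    rate: float          # drain rate r, in requests per second
    capacity: float      # bucket capacity, in requests
    level: float = 0.0   # current fill level
    last: float = 0.0    # timestamp of the previous check, in seconds

    def allow(self, now: float) -> bool:
        # Drain the bucket for the time elapsed since the previous check.
        elapsed = max(0.0, now - self.last)
        self.level = max(0.0, self.level - elapsed * self.rate)
        self.last = now
        # Reject if this request would overflow (the caller returns 429).
        if self.level + 1.0 > self.capacity:
            return False
        self.level += 1.0
        return True


# Hypothetical limits: 1,000 requests/sec with room for a burst of 50.
policer = LeakyBucketPolicer(rate=1000.0, capacity=50.0)
decisions = [policer.allow(now=0.0) for _ in range(60)]
accepted = sum(decisions)  # 50 accepted, the remaining 10 rejected
```

The check is a few arithmetic operations and never waits, which is why policer latency stays flat under load.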

Shaper Mode: Queue and Delay

Accept all requests into a queue until it fills. Process the queue at constant rate r. Reject only when the queue is completely full. This mode is for smoothing: absorb bursts and release them steadily. Response time varies with queue depth, but fewer requests are lost. Use it when downstream benefits from steady flow (databases, ML inference) and clients can tolerate delay. Example: an ML inference queue draining at 10 requests/sec. A burst of 50 requests queues up; each waits between 0 and about 5 seconds, and all of them complete as long as the queue can hold the burst.
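The queueing behavior can be sketched as follows, again in Python. The names LeakyBucketShaper and submit are hypothetical, and the sketch only computes how long each request would wait; an actual shaper would hold the request (or sleep) for that long. It assumes a single thread and caller-supplied timestamps.

```python
from collections import deque
from typing import Optional


class LeakyBucketShaper:
    """Leaky bucket used as a queue: requests are spaced 1/r apart."""

    def __init__(self, rate: float, queue_size: int) -> None:
        self.interval = 1.0 / rate          # spacing between releases
        self.queue_size = queue_size        # max requests waiting at once
        self.pending = deque()              # release times still in the future
        self.last_release = float("-inf")   # release time of the latest request

    def submit(self, now: float) -> Optional[float]:
        """Return the wait in seconds, or None if the queue is full."""
        # Requests whose release time has passed are no longer waiting.
        while self.pending and self.pending[0] <= now:
            self.pending.popleft()
        if len(self.pending) >= self.queue_size:
            return None
        # Release one interval after the previous request, or now if idle.
        release = max(now, self.last_release + self.interval)
        self.last_release = release
        self.pending.append(release)
        return release - now


# The burst from the example: 50 requests at once, drained at 10 per second.
shaper = LeakyBucketShaper(rate=10.0, queue_size=50)
waits = [shaper.submit(now=0.0) for _ in range(50)]
```

Here the first request is released immediately and the last one waits about 4.9 seconds, matching the 0 to 5 second range in the example.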

Choosing Between Modes

Ask: What happens to rejected requests? If clients can retry cheaply, policer mode with proper Retry-After headers works well. If clients cannot retry or retries are expensive, shaper mode preserves work. Ask: What latency is acceptable? If users expect sub-100 ms responses, shaper mode with a deep queue is wrong. If background jobs can wait 10 seconds, shaper mode lets more of them complete instead of being rejected.

Hybrid Approach

Many production systems combine both. Use a small queue (shaper behavior for minor bursts) with a strict maximum wait time (policer behavior for sustained overload). If queue wait would exceed 500ms, reject immediately with Retry-After rather than making users wait indefinitely. This captures benefits of both: smooth small bursts, fail fast on sustained overload.
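One way to sketch the hybrid rule in Python is to track the earliest time the next request could be released and reject whenever the implied wait exceeds the cap. HybridLimiter, admit, and max_wait are illustrative names, and the returned retry hint assumes no other requests arrive in the meantime.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HybridLimiter:
    rate: float              # release rate r, in requests per second
    max_wait: float          # longest queueing delay we accept, in seconds
    next_free: float = 0.0   # earliest release time for the next request

    def admit(self, now: float) -> Tuple[bool, float]:
        """Return (True, wait) if queued, or (False, retry_after) if rejected."""
        release = max(now, self.next_free)
        wait = release - now
        if wait > self.max_wait:
            # Sustained overload: fail fast instead of queueing further.
            # After this many seconds the backlog is back within the cap.
            return False, wait - self.max_wait
        # Minor burst: queue the request behind the ones already scheduled.
        self.next_free = release + 1.0 / self.rate
        return True, wait


# Hypothetical settings: 100 requests/sec with the 500 ms cap from the text.
limiter = HybridLimiter(rate=100.0, max_wait=0.5)
accepted, seconds = limiter.admit(now=0.0)
```

HTTP's Retry-After header carries whole seconds, so a handler built on this would round the hint up before sending it.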

💡 Key Takeaways
Shaper mode queues requests and drains at rate r, trading bounded latency (max B/r) for fewer drops; use when clients tolerate queueing but you want to minimize failed requests
Policer mode (for example, GCRA) drops requests immediately without buffering, giving instant feedback with minimal latency but more rejections during bursts; ideal for fast failure and explicit backpressure
Critical constraint for shapers: B/r must not exceed the shortest client timeout, or requests will time out while queued; example: a 3 second timeout with r=200 rps requires B ≤ 600
Stripe's Redis GCRA policer handles hundreds of thousands of checks per second per shard with sub-millisecond latency, enforcing 10 to 1,000 rps limits per API key globally
Policers can trigger synchronized retry storms if clients immediately retry dropped requests; mitigate with jittered exponential backoff and Retry-After headers that tell clients when to retry
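The retry-storm mitigation from the last takeaway could look like this on the client side. This is a sketch under assumptions: the function name retry_delay and the base and cap defaults are invented for illustration, and the jitter is drawn uniformly up to an exponentially growing ceiling.

```python
import random


def retry_delay(attempt: int, retry_after: float = 0.0,
                base: float = 0.1, cap: float = 10.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # The ceiling doubles with each attempt until it reaches the cap.
    ceiling = min(cap, base * 2 ** attempt)
    # Random jitter spreads clients out so they do not retry in lockstep.
    jitter = random.uniform(0.0, ceiling)
    # Never retry sooner than the server asked for via Retry-After.
    return retry_after + jitter


# A client that received 429 with Retry-After: 1 on its third attempt.
delay = retry_delay(attempt=2, retry_after=1.0)
```

Because every client draws its own jitter, requests rejected in the same instant come back spread over a window rather than all at once.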
📌 Interview Tips
1. ATM networking standardized GCRA to enforce peak cell rates with tight jitter bounds at tens of thousands of cells per second, minimizing bufferbloat in switching fabrics using constant spacing properties
2. Production shaper configuration: r=500 rps, B=2000 gives a maximum queueing delay of 4 seconds; if the upstream timeout is 3 seconds, requests that join behind more than 1,500 others time out before they are served, wasting capacity and amplifying retries. Correct sizing: B ≤ 1500
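The sizing arithmetic used in these examples is small enough to check in code. The helper names below are made up for this sketch; they apply the B/r bound from the takeaways.

```python
import math


def max_queue_wait(rate: float, queue_size: int) -> float:
    """Worst-case queueing delay B / r, in seconds."""
    return queue_size / rate


def largest_safe_queue(rate: float, client_timeout: float) -> int:
    """Largest B whose worst-case wait still fits within the timeout."""
    return math.floor(rate * client_timeout)


wait = max_queue_wait(rate=500.0, queue_size=2000)              # 4.0 seconds
safe_b = largest_safe_queue(rate=500.0, client_timeout=3.0)     # 1500
safe_b_200 = largest_safe_queue(rate=200.0, client_timeout=3.0) # 600
```

In practice the timeout budget also has to cover service time after the request leaves the queue, so a real configuration would likely size B somewhat below this bound.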