Shaper vs Policer: Two Modes of Leaky Bucket
Two Interpretations of the Same Algorithm
The Leaky Bucket algorithm can be implemented in two distinct ways that behave very differently, so it is critical to know which one you are using (or building). The names, policer and shaper, come from network traffic management but apply directly to API rate limiting. Choosing the wrong mode means your rate limiter either rejects traffic it could have served or adds delay that clients cannot tolerate.
Policer Mode: Drop on Overflow
Check whether adding this request would exceed the bucket capacity. If yes, reject immediately with 429. If no, add it to the bucket. The bucket drains continuously at rate r. This mode is for enforcement: stop excess traffic without delaying admitted traffic. The admission check itself takes constant time (sub-millisecond), so the limiter adds no queueing delay, but excess requests are lost. Use it when downstream cannot queue (stateless microservices, CDN origins) and clients can retry. Example: an API gateway protecting a stateless service. Requests beyond 1,000/sec get 429, while admitted requests see the same 2 ms latency regardless of traffic.
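The check described above can be sketched in a few lines. This is a minimal illustration, not a production limiter: the class name LeakyBucketPolicer, its parameters, and the injectable clock argument are assumptions made for this example, and every request is assumed to cost one unit of bucket space.

```python
import time


class LeakyBucketPolicer:
    """Leaky bucket used as a meter: admit or reject, never delay."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # drain rate r, in requests per second
        self.capacity = capacity  # bucket size, i.e. the tolerated burst
        self.level = 0.0          # current bucket fill
        self.clock = clock        # injectable so the logic can be tested
        self.last = clock()

    def allow(self):
        """Return True to admit the request, False to reject it (429)."""
        now = self.clock()
        # Drain the bucket for the time elapsed since the last check.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False  # overflow: reject immediately, nothing is queued
        self.level += 1
        return True
```

A caller would return 429 whenever allow() is False. Nothing is ever queued, so the only cost added to an admitted request is the check itself.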
Shaper Mode: Queue and Delay
Accept requests into a queue until it fills. Process the queue at constant rate r. Reject only when the queue is completely full. This mode is for smoothing: absorb bursts and release them steadily. Response time varies with queue depth, but fewer requests are lost. Use it when downstream benefits from steady flow (databases, ML inference) and clients can tolerate delay. Example: an ML inference queue draining at 10 requests/sec with room for at least 50 requests. A burst of 50 requests queues up; the first is served immediately, the last waits just under 5 seconds, and all of them complete.
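The queueing behavior can be sketched as a small scheduler that assigns each request a release time. This is an illustration under stated assumptions: the class name LeakyBucketShaper, the submit method, and passing the current time as an argument are all choices made for this example. A real shaper would also need a worker that actually holds each request until its release time.

```python
from collections import deque


class LeakyBucketShaper:
    """Leaky bucket used as a queue: delay requests to a constant rate."""

    def __init__(self, rate, queue_size):
        self.interval = 1.0 / rate    # spacing between releases, in seconds
        self.queue_size = queue_size  # max requests allowed to wait
        self.next_slot = 0.0          # earliest time of the next release
        self.waiting = deque()        # release times of queued requests

    def submit(self, now):
        """Return the delay in seconds, or None if the queue is full."""
        # Requests whose release time has passed have left the queue.
        while self.waiting and self.waiting[0] <= now:
            self.waiting.popleft()
        if len(self.waiting) >= self.queue_size:
            return None  # queue full: the only case where a shaper rejects
        release = max(now, self.next_slot)
        self.next_slot = release + self.interval
        self.waiting.append(release)
        return release - now


# The burst from the example: 50 requests arrive together at 10 requests/sec.
shaper = LeakyBucketShaper(rate=10, queue_size=50)
delays = [shaper.submit(now=0.0) for _ in range(50)]
```

In this run every request is accepted. The first has a delay of 0 and the last a delay of about 4.9 seconds, which is the variable response time that shaper mode trades for fewer rejections.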
Choosing Between Modes
Ask: What happens to rejected requests? If clients can retry cheaply, policer mode with proper Retry-After headers works well. If clients cannot retry or retries are expensive, shaper mode preserves work. Ask: What latency is acceptable? If users expect sub-100 ms responses, shaper mode with a deep queue is wrong, because queueing delay alone can exceed that budget. If background jobs can wait 10 seconds, shaper mode maximizes the number of requests that eventually complete.
Hybrid Approach
Many production systems combine both. Use a small queue (shaper behavior for minor bursts) with a strict maximum wait time (policer behavior for sustained overload). If the queue wait would exceed 500 ms, reject immediately with Retry-After rather than making users wait indefinitely. This captures the benefits of both: smooth small bursts, fail fast on sustained overload.
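One way to sketch the hybrid is to compute the wait a request would face and reject it when that wait exceeds the limit. The class name HybridLimiter, the tuple return values, and the way the Retry-After value is derived are assumptions for illustration. Note that the maximum wait also bounds the queue depth implicitly, at roughly rate times max_wait requests.

```python
import math


class HybridLimiter:
    """Shape small bursts, but reject once the wait would be too long."""

    def __init__(self, rate, max_wait):
        self.interval = 1.0 / rate  # spacing between releases, in seconds
        self.max_wait = max_wait    # longest delay we are willing to impose
        self.next_slot = 0.0        # earliest time of the next release

    def submit(self, now):
        """Return ("queued", delay) or ("rejected", retry_after_seconds)."""
        release = max(now, self.next_slot)
        delay = release - now
        if delay > self.max_wait:
            # Policer behavior: fail fast. Suggest retrying once enough of
            # the backlog has drained, rounded up to whole seconds because
            # Retry-After carries an integer number of seconds.
            return ("rejected", math.ceil(delay - self.max_wait))
        # Shaper behavior: accept the request and schedule it.
        self.next_slot = release + self.interval
        return ("queued", delay)


# Hypothetical settings: 4 requests/sec with a 500 ms maximum wait.
limiter = HybridLimiter(rate=4, max_wait=0.5)
results = [limiter.submit(now=0.0) for _ in range(4)]
```

With these settings the first three requests of the burst are queued with delays of 0, 0.25, and 0.5 seconds, and the fourth is rejected with a suggested retry after 1 second, since it would otherwise wait 0.75 seconds.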