When to Use Leaky Bucket: Trade-offs and Alternatives
When Leaky Bucket Wins
Choose leaky bucket when your downstream is fragile and cannot tolerate bursts at all. Databases, ML inference services, and search engines often have this property: they maintain complex internal state (locks, buffer pools, query plans) whose performance degrades sharply under sudden load. A 2x traffic spike lasting just one second sounds harmless, but for a database that spike can push p99 latency from 20ms to 500ms and take minutes to recover as lock contention cascades.
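A minimal sketch of this smoothing behavior, assuming an in-process limiter (the class and parameter names are illustrative, not from any library): requests queue up and drain at a fixed rate, so downstream never sees more than `rate` requests per second, no matter how bursty arrivals are.

```python
import time
from collections import deque


class LeakyBucket:
    """Illustrative leaky bucket: arrivals queue, drain at a fixed rate."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # drain rate, requests per second
        self.capacity = capacity    # max queued requests before rejecting
        self.queue = deque()
        self.last_leak = time.monotonic()

    def allow(self, request) -> bool:
        """Enqueue a request; False means the bucket overflowed."""
        self._leak()
        if len(self.queue) >= self.capacity:
            return False            # overflow: shed load instead of bursting
        self.queue.append(request)
        return True

    def _leak(self):
        """Drain the requests that have 'leaked out' since the last check."""
        now = time.monotonic()
        leaked = int((now - self.last_leak) * self.rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()  # in a real system: dispatch downstream
            self.last_leak = now
```

Whatever the arrival pattern, the database behind this limiter sees at most `rate` requests per second; excess load is either queued (bounded by `capacity`) or rejected.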
When Token Bucket is Better
User-facing APIs often need burst responsiveness. Loading a dashboard might trigger 50 API calls in 100ms. Token bucket lets this through instantly (if tokens are available), while leaky bucket queues the calls (at a drain rate of 100 requests per second, the last call waits 500ms) even if downstream could handle the burst. For services that scale horizontally and handle bursts gracefully, token bucket provides better user experience.
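A matching token-bucket sketch under the same assumptions (illustrative names, not a library API): tokens refill at `rate` per second up to `capacity`, and a burst passes instantly as long as tokens remain.

```python
import time


class TokenBucket:
    """Illustrative token bucket: refill continuously, spend per request."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # refill rate, tokens per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full so an initial burst passes
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available, refilling based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `capacity=50`, the dashboard's 50 calls all pass immediately; a leaky bucket at the same average rate would spread them out instead.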
Hybrid Patterns
Real systems often combine algorithms: token bucket at the edge (allowing bursts through the load balancers) with leaky bucket at the database tier (smoothing writes to PostgreSQL). Or use leaky bucket for writes, where steady flow matters, and token bucket for reads, where burst tolerance matters. The key insight: different parts of your system have different characteristics, so use the algorithm that matches each component.
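The read/write split might look like the following sketch, with both limiters inlined as closures and the rates chosen purely for illustration:

```python
import time
from collections import deque


def make_token_limiter(rate: float, capacity: float):
    """Bursty limiter for reads: tokens refill continuously."""
    state = {"tokens": capacity, "last": time.monotonic()}

    def allow() -> bool:
        now = time.monotonic()
        state["tokens"] = min(capacity,
                              state["tokens"] + (now - state["last"]) * rate)
        state["last"] = now
        if state["tokens"] >= 1:
            state["tokens"] -= 1
            return True
        return False
    return allow


def make_leaky_limiter(rate: float, capacity: int):
    """Smoothing limiter for writes: queued work drains at a fixed rate."""
    queue = deque()
    state = {"last": time.monotonic()}

    def allow() -> bool:
        now = time.monotonic()
        drained = int((now - state["last"]) * rate)
        if drained > 0:
            for _ in range(min(drained, len(queue))):
                queue.popleft()   # in a real system: dispatch to the database
            state["last"] = now
        if len(queue) >= capacity:
            return False          # overflow: reject rather than burst
        queue.append(now)
        return True
    return allow


# Illustrative split: reads may burst, writes are smoothed.
allow_read = make_token_limiter(rate=100, capacity=50)
allow_write = make_leaky_limiter(rate=1, capacity=10)


def handle(kind: str) -> bool:
    return allow_read() if kind == "read" else allow_write()
```

A read burst passes through immediately while a write burst is queued and drained steadily, matching the observation that the two paths have different characteristics.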
Decision Framework
Ask these questions. Can downstream handle bursts? If yes, token bucket; if no, leaky bucket. Is latency variance acceptable? If users tolerate 100ms to 2 seconds of variance, leaky bucket works; if they expect consistently sub-100ms responses, token bucket is better. Is traffic naturally bursty? If yes and downstream can handle it, do not artificially smooth it; if downstream cannot handle it, smoothing is necessary.
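The questions above collapse to a small decision procedure; this toy encoding is illustrative (the inputs and returned labels are assumptions, not an exhaustive policy):

```python
def choose_rate_limiter(downstream_handles_bursts: bool,
                        latency_variance_ok: bool) -> str:
    """Toy encoding of the decision framework above."""
    if downstream_handles_bursts:
        # No reason to artificially smooth traffic the backend tolerates.
        return "token bucket"
    if latency_variance_ok:
        # Fragile backend, and users tolerate queueing delay.
        return "leaky bucket"
    # Fragile backend but strict latency: smooth, and keep capacity
    # small so worst-case queueing delay stays bounded.
    return "leaky bucket with small capacity"
```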
Common Mistakes
Using leaky bucket for user-facing APIs when token bucket would work creates unnecessary latency. Using token bucket for fragile backends allows bursts that trigger cascading failures. Setting bucket capacity B too high creates unacceptable worst-case latency, since a request arriving to a full queue waits B divided by the drain rate. Not monitoring queue depth misses the early warning of a capacity mismatch. Ignoring head-of-line blocking lets one slow request delay many behind it. Always match the algorithm to the workload characteristics.
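The queue-depth check the text recommends can be sketched as below; the 0.5 warning ratio is an illustrative threshold, not a standard:

```python
def queue_health(queue_depth: int, capacity: int, drain_rate: float,
                 warn_ratio: float = 0.5):
    """Estimate queueing delay and flag a capacity mismatch early."""
    est_delay_s = queue_depth / drain_rate   # delay for the newest request
    worst_case_s = capacity / drain_rate     # delay when the bucket is full
    # Sustained depth above warn_ratio * capacity is an early sign that
    # arrivals are outpacing the drain rate.
    mismatch = queue_depth >= warn_ratio * capacity
    return est_delay_s, worst_case_s, mismatch
```

For example, a bucket with capacity 100 draining at 100 requests per second has a one-second worst-case wait, and sustained depth above 50 suggests the drain rate no longer matches the arrival rate.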