
When to Use Leaky Bucket: Trade-offs and Alternatives

Leaky Bucket shines when protecting stateful downstream systems from bursty traffic where even short bursts cause queuing collapse. Databases, caches, search indices, and Machine Learning (ML) inference services often exhibit sharp latency degradation under instantaneous load spikes because they maintain complex internal state (locks, buffer pools, query plans). A momentary 2x traffic spike to a database can push p99 latency from 20 milliseconds to 500 milliseconds and take minutes to recover. Leaky Bucket absorbs these spikes in controlled queues upstream, keeping downstream load steady and latencies predictable.

The algorithm is also ideal for smoothing egress from fan-in aggregators and background batchers. When hundreds of upstream services converge on a bottleneck tier, even independently random traffic creates bursty arrival patterns. A Leaky Bucket placed at aggregation points transforms this into predictable load that downstream tiers can provision against. Use it as well for enforcing strict pacing with external partners who have contractual Transactions Per Second (TPS) caps, where exceeding limits triggers penalties or blocks.

Prefer Token Bucket when short bursts are acceptable and throughput utilization is the priority. Token Bucket lets you spend accumulated tokens instantly for a burst up to the bucket size, making better use of momentary spare capacity. For example, a content delivery service might allow 1,000 requests per second (rps) on average but burst to 5,000 rps for 1 second to serve a popular asset efficiently. Choose concurrency limits (a cap on maximum in-flight requests) when the bottleneck is service capacity per concurrent request rather than arrival rate, such as connection pool exhaustion or worker thread saturation. Use sliding window counters for coarse-grained quotas over minutes or hours, where exact smoothness is unnecessary and you want simpler, stateless enforcement.
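The queue-absorbing behavior described above can be sketched as a minimal queue-based Leaky Bucket. This is an illustrative implementation, not a production one; the class and parameter names (`drain_rate_per_sec`, `capacity`, `try_enqueue`, `drain_one`) are assumptions for the example, and the caller is expected to invoke `drain_one` on a fixed timer so departures stay evenly spaced.

```python
import threading
from collections import deque

class LeakyBucket:
    """Queue-based Leaky Bucket sketch: bursts are absorbed into a
    bounded queue and drained at a fixed rate."""

    def __init__(self, drain_rate_per_sec: float, capacity: int):
        self.interval = 1.0 / drain_rate_per_sec  # seconds between departures
        self.capacity = capacity                  # max queued requests
        self.queue = deque()
        self.lock = threading.Lock()

    def try_enqueue(self, request) -> bool:
        """Accept a request if the bucket has room; shed it otherwise."""
        with self.lock:
            if len(self.queue) >= self.capacity:
                return False  # bucket overflow: reject the excess burst
            self.queue.append(request)
            return True

    def drain_one(self):
        """Pop the next queued request, or None if the queue is empty.
        Calling this once every `interval` seconds yields a constant
        downstream arrival rate regardless of how bursty ingress was."""
        with self.lock:
            return self.queue.popleft() if self.queue else None
```

A burst larger than `capacity` is rejected at the edge rather than passed through, which is exactly the trade the section describes: the downstream database sees a steady drain rate instead of the spike.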
💡 Key Takeaways
Use Leaky Bucket to protect stateful systems (databases, caches, search, ML inference) where bursts cause queuing collapse; a 2x spike can push database p99 from 20ms to 500ms with minutes of recovery time
Ideal for smoothing egress from fan-in aggregators where hundreds of independent upstreams create bursty arrival patterns that need transformation into predictable downstream load
Prefer Token Bucket when short bursts improve utilization; Content Delivery Network (CDN) allowing 1,000 rps average but 5,000 rps burst for 1 second serves popular assets efficiently without wasting spare capacity
Choose concurrency limits over rate limits when the bottleneck is per-request capacity (connection pools, worker threads) rather than arrival rate; capping maximum in-flight requests prevents resource exhaustion
Use sliding window counters for coarse quotas over minutes or hours; they provide simpler stateless enforcement when exact request spacing is unnecessary for downstream protection
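The concurrency-limit alternative from the takeaways above can be sketched with a semaphore. The names (`ConcurrencyLimiter`, `max_in_flight`) are illustrative assumptions; the point is that this caps how many requests are simultaneously in flight, not how fast they arrive.

```python
import threading

class ConcurrencyLimiter:
    """Sketch of an in-flight request cap: limits concurrent work
    (e.g. to protect a connection pool), not arrival rate."""

    def __init__(self, max_in_flight: int):
        self.sem = threading.Semaphore(max_in_flight)

    def try_acquire(self) -> bool:
        # Non-blocking: shed the request immediately if all slots are busy.
        return self.sem.acquire(blocking=False)

    def release(self) -> None:
        # Must be called when the request finishes, freeing a slot.
        self.sem.release()
```

Unlike Leaky Bucket, a slow downstream automatically throttles ingress here: slots only free up as requests complete, so admission tracks service capacity rather than a configured rate.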
📌 Examples
External partner with contractual 500 TPS cap: Leaky Bucket enforces strict pacing to avoid penalties or service blocks from exceeding agreed limits, with small tolerance for clock jitter
Background batch job uploading metrics at 1,000 rps: Token Bucket allows bursting when the network is available; Leaky Bucket wastes capacity by enforcing constant spacing when bursts do not harm downstream aggregators
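For contrast with the examples above, here is a minimal Token Bucket sketch showing how accumulated tokens enable the burst behavior Leaky Bucket forbids. The class name and parameters (`rate`, `burst`, the injectable `now` clock) are assumptions for illustration; the numbers mirror the CDN example: 1,000 rps refill with a 5,000-token bucket.

```python
import time

class TokenBucket:
    """Token Bucket sketch: tokens accrue at `rate` per second up to
    `burst`; each admitted request spends one token, so idle capacity
    can be saved and spent instantly as a burst."""

    def __init__(self, rate: float, burst: float, now=None):
        self.rate = rate              # tokens added per second
        self.burst = burst            # bucket size = max burst
        self.tokens = burst           # start full
        self.last = now if now is not None else time.monotonic()

    def allow(self, now=None) -> bool:
        now = now if now is not None else time.monotonic()
        # Refill for elapsed time, capped at bucket size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `rate=1000` and `burst=5000`, a full bucket admits 5,000 requests back-to-back before throttling to the 1,000 rps refill rate, whereas a Leaky Bucket at 1,000 rps would space every request 1 ms apart regardless of spare capacity.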