
Token Bucket Algorithm: Core Mechanics and Burst Control

Token Bucket is a rate limiting algorithm defined by two parameters: refill rate r (tokens per second) and capacity b (maximum tokens). Tokens accumulate continuously during idle periods, up to capacity b. Each incoming request consumes one or more tokens. If enough tokens are available, the request passes immediately with sub-microsecond overhead; if the bucket is empty, the request is rejected with a 429 response or, optionally, queued.

The key property is bounded burst admission: over any time interval T, the algorithm admits at most r·T + b requests (more precisely, r·T plus the tokens on hand at the start of the interval, which never exceed b). A client arriving after an idle period can therefore send up to b requests immediately as a burst, then sustain r requests per second indefinitely. For example, with r = 1,000 tokens/sec and b = 2,000 tokens, a cold start allows a 2,000-request spike followed by a 1,000 requests/sec steady state.

The refill calculation uses elapsed time: on each check, compute tokens = min(b, current_tokens + r·Δt), where Δt is the time since the last refill. Always use monotonic clocks so tokens do not jump or stall during NTP adjustments. A local in-process token bucket adds roughly 200 nanoseconds per check and can sustain over 5 million checks per second per CPU core, making its overhead negligible for most applications.

This differs fundamentally from Leaky Bucket, which paces output at a nearly constant rate and implicitly queues requests. Token Bucket prioritizes low latency and user-perceived snappiness by allowing bursts, making it ideal for user-facing APIs where downstream systems can absorb short spikes without degradation.
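A minimal sketch of these mechanics in Go (the Bucket type and Allow method are illustrative names, not from any particular library). Go's time.Now readings carry a monotonic component, so the elapsed-time subtraction is safe across NTP adjustments:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Bucket is a minimal token bucket: refill rate r tokens/sec, capacity b.
type Bucket struct {
	mu     sync.Mutex
	r      float64   // refill rate, tokens per second
	b      float64   // capacity, maximum tokens
	tokens float64   // current token count
	last   time.Time // time of last refill
}

// NewBucket starts full so a cold client can burst up to b immediately.
func NewBucket(r, b float64) *Bucket {
	return &Bucket{r: r, b: b, tokens: b, last: time.Now()}
}

// Allow consumes one token if available. time.Time values include a
// monotonic reading, so the Δt math is immune to wall-clock steps.
func (bk *Bucket) Allow() bool {
	bk.mu.Lock()
	defer bk.mu.Unlock()

	now := time.Now()
	dt := now.Sub(bk.last).Seconds()
	bk.last = now

	// tokens = min(b, current + r·Δt)
	bk.tokens += bk.r * dt
	if bk.tokens > bk.b {
		bk.tokens = bk.b
	}

	if bk.tokens >= 1 {
		bk.tokens--
		return true // admit; caller proceeds
	}
	return false // reject; caller returns 429 or queues
}

func main() {
	bk := NewBucket(1000, 2000)  // r = 1,000/sec, b = 2,000
	fmt.Println(bk.Allow())      // true: a cold bucket admits the burst
}
```

Starting the bucket full mirrors the cold-start behavior described above: the first b requests pass immediately, after which admissions pace at r per second.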
💡 Key Takeaways
Two parameters define behavior: refill rate r (tokens/sec) and capacity b (max tokens). Typical production values are r = 1,000 to 10,000 tokens/sec, with b sized to hold 0.5 to 1.0 seconds' worth of refill (b ≈ 0.5r to 1.0r), balancing burst tolerance against downstream protection.
Continuous refill using elapsed time prevents synchronized bursts at second boundaries. Calculate tokens as min(b, current + r·Δt) on every request using monotonic clocks to avoid NTP induced jumps or stalls.
Weighted token consumption allows capacity-aware limiting. A heavy query can consume w = 5 tokens while a light request costs w = 1, naturally throttling expensive operations more aggressively than cheap ones (see the limiter sketch after this list).
Local in-process buckets add under 200 nanoseconds per check and sustain over 5 million decisions per second per core. AWS API Gateway uses this approach with negligible overhead, returning 429 rejections within single-digit milliseconds.
Over any interval T, the bucket admits at most r·T + b requests: r·T from refill plus the tokens on hand at the start, which never exceed b. With r = 8,000/sec and b = 2,000, a 10 second window allows at most 82,000 requests (80,000 steady + 2,000 burst).
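The weighted consumption described above maps directly onto Go's golang.org/x/time/rate limiter, whose AllowN method charges n tokens per call. A minimal sketch, reusing the r = 1,000/sec and b = 2,000 figures from the text:

```go
package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// r = 1,000 tokens/sec, b = 2,000 tokens.
	lim := rate.NewLimiter(rate.Limit(1000), 2000)

	// A light request costs w = 1 token.
	fmt.Println(lim.AllowN(time.Now(), 1)) // true while tokens remain

	// A heavy query costs w = 5 tokens, so it drains the bucket
	// five times faster and is throttled more aggressively.
	fmt.Println(lim.AllowN(time.Now(), 5))
}
```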
📌 Examples
AWS API Gateway exposes rate and burst parameters that map directly to a token bucket. Typical account defaults are r = 10,000 requests/sec per Region with b in the several-thousand range, allowing cold clients to spike immediately and then sustain 10k RPS without 429 errors.
Kubernetes client-go uses a token bucket with QPS = 5 and burst = 10 per client instance. On a node with 50 controllers, the aggregate load naturally smooths to a few hundred RPS without centralized coordination, preventing API server overload (see the configuration sketch after these examples).
Envoy local rate limiting filters implement token buckets with r = 1,000 RPS, b = 200 for per-route limits. Decision overhead measures under 100 microseconds, and operators chain multiple buckets (per IP and per API key) to bound abuse while preserving legitimate bursts.
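A minimal sketch of the client-go tuning above: the QPS and Burst fields on rest.Config set the token bucket's rate and capacity (the kubeconfig path here is illustrative):

```go
package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Path is illustrative; in-cluster config works the same way.
	config, err := clientcmd.BuildConfigFromFlags("", "/home/user/.kube/config")
	if err != nil {
		panic(err)
	}

	// Token bucket parameters: sustain 5 requests/sec, burst up to 10.
	config.QPS = 5
	config.Burst = 10

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	_ = clientset // every API call through clientset is gated by the bucket
}
```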