Rate LimitingToken Bucket AlgorithmMedium⏱️ ~3 min

Hierarchical Token Buckets and Multi-Scope Fairness

The Noisy Neighbor Problem

A single bucket per API sounds simple, but what happens when one heavy user consumes all 10,000 tokens/sec of your global limit? Everyone else gets rejected with 429 errors. This is the "noisy neighbor" problem: one runaway script or abusive actor ruins experience for 10,000 legitimate users. The fundamental issue is that simple flat rate limiting has no fairness isolation between consumers.

Buckets Within Buckets

Hierarchical token buckets stack multiple limits at different scopes. Think of a building with rules at every level: building allows 10,000 visitors/hour total, each floor allows 2,000/hour, each office allows 100/hour. A visitor must have capacity at ALL levels to enter. One office throwing a huge party hits their 100/hour limit without affecting other offices. Each level provides isolation from the level below.

Three Tier Example

API with 1 million users serving 100,000 req/sec. Three bucket levels: Global bucket at 100,000 req/sec protects infrastructure from total overload. Per tenant bucket at 5,000 req/sec per customer ensures no single customer consumes more than 5 percent even with "unlimited" access. Per user bucket at 100 req/sec per individual ensures runaway script burns 100 tokens in 1 second and waits rather than starving colleagues.

Why Not Just Per User Limits

Per user limits alone have vulnerability: 1,000 bad actors each consuming 100 req/sec equals 100,000 req/sec, your entire capacity gone. The global limit caps total damage. Per tenant limits ensure one hacked enterprise account cannot exhaust global capacity either. This is defense in depth: each layer catches threats the layer below missed.

Implementation with Redis

Each request checks three keys: "global:{minute}", "tenant:{tenant_id}:{minute}", "user:{user_id}:{minute}". Use INCR with EXPIRE on each. If any bucket is exhausted, reject. With pipelining, all three checks complete in one Redis round trip (0.3 to 1 ms). Return the most restrictive bucket in 429 response headers so clients know which limit they hit. Track per tenant consumption for metering and auto upgrade suggestions when at 80 percent consistently.

💡 Key Takeaways
Noisy neighbor: one user consuming 10K req/sec global limit blocks everyone else with 429 errors
Three tier hierarchy: global (100K/sec) protects infrastructure, per tenant (5K/sec) provides fairness, per user (100/sec) isolates individuals
Defense in depth: 1000 bad actors at 100 req/sec each equals 100K, but global limit catches this; per tenant catches single compromised account
Redis implementation: check 3 keys with pipelining in one round trip (0.3 to 1 ms); reject if any bucket exhausted
Return most restrictive bucket in 429 headers; track per tenant consumption for metering and upgrade suggestions
📌 Interview Tips
1Walk through the three tiers: request checks global (100K), tenant (5K), user (100). All must have capacity. One exhausted tier rejects the request.
2Explain defense in depth: per user alone fails against coordinated attack (1000 times 100 equals 100K). Global catches this. Per tenant catches single compromised enterprise account.
← Back to Token Bucket Algorithm Overview
Hierarchical Token Buckets and Multi-Scope Fairness | Token Bucket Algorithm - System Overflow