Hierarchical Token Buckets and Multi-Scope Fairness
The Noisy Neighbor Problem
A single bucket per API sounds simple, but what happens when one heavy user consumes all 10,000 tokens/sec of your global limit? Everyone else gets rejected with 429 errors. This is the "noisy neighbor" problem: one runaway script or abusive actor ruins experience for 10,000 legitimate users. The fundamental issue is that simple flat rate limiting has no fairness isolation between consumers.
Buckets Within Buckets
Hierarchical token buckets stack multiple limits at different scopes. Think of a building with rules at every level: building allows 10,000 visitors/hour total, each floor allows 2,000/hour, each office allows 100/hour. A visitor must have capacity at ALL levels to enter. One office throwing a huge party hits their 100/hour limit without affecting other offices. Each level provides isolation from the level below.
Three Tier Example
API with 1 million users serving 100,000 req/sec. Three bucket levels: Global bucket at 100,000 req/sec protects infrastructure from total overload. Per tenant bucket at 5,000 req/sec per customer ensures no single customer consumes more than 5 percent even with "unlimited" access. Per user bucket at 100 req/sec per individual ensures runaway script burns 100 tokens in 1 second and waits rather than starving colleagues.
Why Not Just Per User Limits
Per user limits alone have vulnerability: 1,000 bad actors each consuming 100 req/sec equals 100,000 req/sec, your entire capacity gone. The global limit caps total damage. Per tenant limits ensure one hacked enterprise account cannot exhaust global capacity either. This is defense in depth: each layer catches threats the layer below missed.
Implementation with Redis
Each request checks three keys: "global:{minute}", "tenant:{tenant_id}:{minute}", "user:{user_id}:{minute}". Use INCR with EXPIRE on each. If any bucket is exhausted, reject. With pipelining, all three checks complete in one Redis round trip (0.3 to 1 ms). Return the most restrictive bucket in 429 response headers so clients know which limit they hit. Track per tenant consumption for metering and auto upgrade suggestions when at 80 percent consistently.