
Trade-offs: Accuracy vs Performance and Consistency vs Availability

The Central Trade-off

Distributed rate limiting forces a choice between accuracy (enforcing exact limits) and availability (system keeps working when coordination fails). You cannot have both perfectly. This mirrors the CAP theorem: in the presence of network partitions, you must choose between consistency and availability. Understanding this trade-off is essential for designing rate limiting that matches your actual requirements.

Strong Consistency: Accurate but Fragile

Every request checks the central store before proceeding. If Redis is down, all requests block or fail. You get exact enforcement: a limit of 1,000/sec means exactly 1,000/sec, never 1,001. Cost: Redis becomes a single point of failure; a one-second Redis outage means one second of complete API unavailability. Use when: billing accuracy is critical, compliance requires exact limits, or overshoot has severe consequences (security, cost).
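A minimal sketch of the strongly consistent path, with an in-memory class standing in for Redis (the `CentralStore` name and outage flag are illustrative, not a real client API):

```python
import threading

class CentralStore:
    """In-memory stand-in for a central store such as Redis. A lock makes
    check-and-increment one atomic step (the role a Lua script plays in Redis)."""
    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()
        self.available = True  # flip to False to simulate an outage

    def try_consume(self):
        if not self.available:
            raise ConnectionError("central store unreachable")
        with self.lock:
            if self.count < self.limit:
                self.count += 1
                return True
            return False

def handle_request(store):
    """Strongly consistent path: every request must reach the store.
    When the store is down we fail closed, trading availability for accuracy."""
    try:
        return store.try_consume()
    except ConnectionError:
        return False

store = CentralStore(limit=1000)
allowed = sum(handle_request(store) for _ in range(1200))
print(allowed)  # 1000 -- exact enforcement, never 1001

store.available = False  # simulate a one-second Redis outage
print(handle_request(store))  # False -- every request denied while the store is down
```

The fail-closed `except` branch is exactly the fragility described above: accuracy stays perfect, but the store outage becomes an API outage.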

Eventual Consistency: Available but Approximate

Servers maintain local counters with periodic sync to a central store. If Redis is down, local limits still apply. You get high availability but may overshoot: with 10 servers using local limits of 100/sec each, a sync delay might let through 1,100/sec instead of 1,000/sec. Use when: availability trumps accuracy, overshoot of 10 to 20% is acceptable, or rate limits exist for protection rather than billing.
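The overshoot can be made concrete with a small deterministic simulation: each server decides using its last-synced view of the global count plus its own pending count, and flushes to the shared counter every 10 requests. All class names here are illustrative, not from any real library:

```python
class GlobalCounter:
    """Shared store: the authoritative count, updated only at sync time."""
    def __init__(self, limit):
        self.limit = limit
        self.count = 0

class SyncingServer:
    """Admits requests against a possibly stale view of the global count."""
    def __init__(self, store, sync_every=10):
        self.store = store
        self.sync_every = sync_every
        self.pending = 0  # admitted locally, not yet flushed
        self.seen = 0     # global count as of the last sync

    def allow(self):
        if self.seen + self.pending >= self.store.limit:
            return False  # stale data: other servers' pending is invisible here
        self.pending += 1
        if self.pending >= self.sync_every:
            self.store.count += self.pending  # flush local admissions
            self.pending = 0
            self.seen = self.store.count      # refresh the stale view
        return True

store = GlobalCounter(limit=1000)
servers = [SyncingServer(store) for _ in range(10)]
allowed = sum(servers[i % 10].allow() for i in range(2000))
print(allowed)  # more than 1,000: stale views let every server over-admit a little
```

In this round-robin run the system admits roughly 9% over the limit, in line with the ~10% overshoot described above; shrinking `sync_every` shrinks the overshoot at the cost of more coordination traffic.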

Hybrid Approach: Token Leasing

Servers request batches of tokens from the central store. Server A takes a lease of 200 tokens, enforces it locally until the lease is depleted, then requests more. This reduces Redis operations from 10,000/sec to 50/sec. Trade-off: maximum overshoot is num_servers × lease_size. With 10 servers and 200-token leases, you might exceed the limit by 2,000 requests during a traffic spike before all servers exhaust their leases and check back.
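A sketch of leasing under the same assumptions (`TokenPool` stands in for Redis and counts coordination calls; the one-shot `exhausted` flag is a simplification of real retry/backoff logic):

```python
class TokenPool:
    """Central pool (stand-in for Redis). Tracks how many calls it serves."""
    def __init__(self, tokens):
        self.tokens = tokens
        self.calls = 0

    def lease(self, n):
        self.calls += 1
        granted = min(n, self.tokens)
        self.tokens -= granted
        return granted

class LeasingServer:
    def __init__(self, pool, lease_size=200):
        self.pool = pool
        self.lease_size = lease_size
        self.local = 0          # tokens remaining in the current lease
        self.exhausted = False  # stop calling the pool once it runs dry

    def allow(self):
        if self.local == 0 and not self.exhausted:
            self.local = self.pool.lease(self.lease_size)
            self.exhausted = self.local == 0
        if self.local == 0:
            return False
        self.local -= 1
        return True

pool = TokenPool(tokens=10_000)
servers = [LeasingServer(pool) for _ in range(10)]
allowed = sum(servers[i % 10].allow() for i in range(12_000))
print(allowed, pool.calls)  # 10000 60: ~200x fewer store calls than requests
```

The worst-case outstanding tokens here are 10 servers × 200 per lease = 2,000, which is exactly the overshoot bound from the text.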

Choosing Your Point on the Spectrum

For billing/compliance: strong consistency. For API protection: eventual consistency or leasing. For high-traffic APIs: token leasing with small leases gives good accuracy with a 100x reduction in coordination overhead. Monitor your actual overshoot percentage; if it stays under 5%, you have made the right trade-off.
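Monitoring that overshoot is simple arithmetic; a helper like this (name illustrative), fed with per-window admitted counts, tells you whether your chosen point on the spectrum is holding:

```python
def overshoot_pct(observed, limit):
    """Percentage by which admitted traffic exceeded the configured limit."""
    return max(0.0, (observed - limit) * 100.0 / limit)

print(overshoot_pct(1100, 1000))  # 10.0 -> too loose: shrink leases or sync faster
print(overshoot_pct(1040, 1000))  # 4.0  -> under the 5% threshold, trade-off holds
```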

💡 Key Takeaways
Token bucket requires an atomic read-modify-write to prevent double spend: without atomicity, two concurrent requests can both read 1 remaining token and both decrement it, letting 2 requests through
Fail-open strategies preserve availability during store outages but risk backend overload; fail-closed protects backends but creates false denials. Most systems fail open, with backend circuit breakers as a secondary defense
Centralizing cross-region enforcement in one region adds round-trip overhead (typically 50 to 200 milliseconds across continents), making per-region budgets with periodic reconciliation more practical for global systems
Approximated sliding windows double storage operations compared to fixed windows (two counter reads instead of one) but reduce boundary-burst error from up to 100% overshoot to under 1% when properly tuned
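The double-spend race in the first takeaway can be replayed deterministically. In a distributed deployment the atomic step is typically a Redis Lua script or MULTI/EXEC; in this sketch a threading lock stands in for that guarantee:

```python
import threading

class TokenBucket:
    def __init__(self, tokens):
        self.tokens = tokens
        self.lock = threading.Lock()

    def consume_atomic(self):
        # The lock makes check + decrement one indivisible step -- the
        # same guarantee a Redis Lua script provides across servers.
        with self.lock:
            if self.tokens > 0:
                self.tokens -= 1
                return True
            return False

# The race, replayed step by step: both requests read before either writes.
racy = TokenBucket(1)
a_sees_token = racy.tokens > 0  # request A reads: 1 token left
b_sees_token = racy.tokens > 0  # request B reads before A decrements
if a_sees_token:
    racy.tokens -= 1
if b_sees_token:
    racy.tokens -= 1
print(racy.tokens)  # -1: both requests spent the same last token

bucket = TokenBucket(1)
results = [bucket.consume_atomic(), bucket.consume_atomic()]
print(results)  # [True, False]: only one request gets the last token
```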
📌 Interview Tips
1. Amazon API Gateway enforces rate limits at edge locations (failing open on store unavailability) with backend throttling as secondary protection. This tolerates brief overruns during outages while preventing cascading failures to origin servers.
2. A global API with a 100-requests-per-minute limit can allocate 40 to a US region, 40 to a Europe region, and 20 to an Asia region as independent budgets. Worst-case global throughput is capped at 100 instead of the 300 possible if each region independently enforced the full limit, while avoiding cross-region synchronization latency on every request.