What is Distributed Rate Limiting and Why is it Needed?
The Concert Ticket Analogy
Imagine a concert with 1,000 tickets total. You set up 10 ticket booths to handle the crowd. If each booth independently tracks "up to 1,000 tickets sold," you massively oversell. The booths need to share state: "How many tickets have ALL booths sold combined?" That shared counter is what distributed rate limiting provides for API requests across your server fleet.
Why Local Rate Limiting Fails
With 10 servers and a global limit of 1,000 requests/sec, local limiting gives each server a 100/sec budget. If traffic is evenly distributed, this works. But if 80% of traffic hits one server (sticky sessions, a hot user, geographic concentration), that server rejects everything above 100/sec while the other nine sit on roughly 700/sec of unused capacity. Local limits waste capacity and frustrate users unevenly.
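The waste under skew can be checked with a few lines of arithmetic. This sketch uses the numbers from the paragraph above (10 servers, 1,000 req/s global limit, 80% skew); the split of the remaining traffic across nine servers is an illustrative assumption.

```python
# Sketch: 10 servers, global limit of 1000 req/s split evenly,
# so each server enforces a local 100 req/s cap.
servers = 10
per_server_limit = 100

# 80% of 1000 req/s hits server 0; the rest is spread over the
# other nine servers (assumed even split for illustration).
traffic = [800] + [200 // 9] * 9  # ~22 req/s each on servers 1-9

rejected = sum(max(0, t - per_server_limit) for t in traffic)
served = sum(min(t, per_server_limit) for t in traffic)
unused = servers * per_server_limit - served

print(rejected, unused)  # 700 req/s rejected while ~700 req/s of capacity idles
```

The hot server turns away 700 req/s even though the fleet as a whole is well under its 1,000 req/s budget, which is exactly the failure mode the next sections address.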
The Core Challenge
Distributed systems face the CAP theorem: during a network partition, a system must sacrifice either consistency or availability. For rate limiting, this means choosing between exact enforcement (strong consistency, which requires coordination on every request) and high availability (eventual consistency, which may overshoot limits during partitions). Most systems choose "good enough" accuracy with high availability.
Central Store Pattern
The most common approach uses Redis as a central counter store. Every request checks and updates Redis atomically. With 10,000 requests/sec across your API, that is 10,000 Redis operations per second. Single Redis handles 100K to 200K ops/sec, so this scales reasonably. But each request now adds 0.5 to 2 ms network latency for the Redis round trip.