Global Rate Limiting: Service-Wide Capacity Protection
The Last Line of Defense
Per-user and per-IP limits protect against individual bad actors. But what if everyone is legitimate and they all show up at once? Flash sales, viral content, breaking news: suddenly your service sees 10x normal traffic from ordinary users. Per-user limits do not help because each user is within their own limit; there are simply too many users. Global rate limiting protects your infrastructure from total overload.
Global Rate Limit Mechanics
Instead of tracking requests per identity, track total requests across the entire API or service with a single counter: "total API requests this second." When the counter reaches capacity (say 100,000/sec), reject additional requests regardless of who sends them. This is load shedding: deliberately dropping traffic to protect core functionality. Key format: ratelimit:global:{service}:{window}.
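A minimal sketch of the single-counter idea, assuming a fixed one-second window and an in-process dictionary standing in for whatever shared store a real deployment would use. The class name GlobalRateLimiter and its parameters are illustrative only.

```python
import time


class GlobalRateLimiter:
    """Fixed-window counter shared by every caller of one service."""

    def __init__(self, service, limit_per_window, window_seconds=1, clock=time.time):
        self.service = service
        self.limit = limit_per_window
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be controlled in tests
        self.counters = {}  # stand-in for a shared store (assumption)

    def _key(self):
        # Same key format as in the text: ratelimit:global:{service}:{window}
        window = int(self.clock() // self.window_seconds)
        return f"ratelimit:global:{self.service}:{window}"

    def allow(self):
        key = self._key()
        # Keep only the current window's counter so old windows are discarded.
        self.counters = {key: self.counters.get(key, 0)}
        if self.counters[key] >= self.limit:
            return False  # load shedding: this window's capacity is used up
        self.counters[key] += 1
        return True
```

Every caller shares the same counter, so the decision depends only on how many requests have already been admitted in the current window, never on who is asking.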
Setting the Global Limit
Base it on measured capacity, not wishful thinking. If your database handles 50,000 queries/sec before latency degrades, your global limit should be at or below that. Measure under load: run load tests to find the breaking point, then set the limit at about 80% of it to leave headroom. Account for downstream dependencies: if the payment service handles 1,000/sec, the global limit on the checkout endpoint cannot exceed that.
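The arithmetic can be written down as a small helper. This is only a sketch: the function name global_limit and its parameters are hypothetical, and it assumes downstream limits act as hard caps with no extra headroom applied to them.

```python
def global_limit(breaking_point_rps, downstream_limits_rps=(), headroom_percent=80):
    """Return a global limit in requests/sec.

    breaking_point_rps: load at which latency starts to degrade (from load tests).
    downstream_limits_rps: capacities of dependencies this endpoint calls.
    headroom_percent: share of the breaking point to actually use.
    """
    # Integer arithmetic avoids floating-point rounding surprises.
    own_limit = breaking_point_rps * headroom_percent // 100
    # The slowest dependency caps the whole request path.
    return min([own_limit, *downstream_limits_rps])


# Service limited only by a database that degrades at 50,000 queries/sec.
api_limit = global_limit(50_000)

# Checkout endpoint that also depends on a payment service handling 1,000/sec.
checkout_limit = global_limit(50_000, downstream_limits_rps=[1_000])
```

With the numbers from the text, api_limit comes out at 40,000/sec and checkout_limit at 1,000/sec, because the payment service is the tighter constraint.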
Fairness Concerns
Global limits are inherently unfair: first come, first served means early arrivals succeed while later arrivals fail. With capacity of 100,000/sec and demand of 110,000/sec, roughly 10,000 requests are rejected every second, and which ones depends only on arrival order, not on who sent them. Mitigation: combine with per-user limits so heavy users hit their individual ceiling before consuming a disproportionate share of global capacity. Premium users can be given a reservation: "reserve 10% of global capacity for enterprise tier."
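One way to picture the reservation is as two ceilings on the same counter. It is sketched here under the assumption that enterprise traffic may use any capacity while everyone else is cut off once only the reserved share remains. The class name, the tier label "enterprise", and reset_window are hypothetical.

```python
class TieredGlobalLimiter:
    """One global counter with a slice of capacity held back for one tier."""

    def __init__(self, limit, reserved_percent=10):
        self.limit = limit
        # Non-enterprise traffic may only use the unreserved part of capacity.
        self.shared_limit = limit - limit * reserved_percent // 100
        self.used = 0

    def allow(self, tier):
        ceiling = self.limit if tier == "enterprise" else self.shared_limit
        if self.used >= ceiling:
            return False
        self.used += 1
        return True

    def reset_window(self):
        # Call at each window boundary (for example, once per second).
        self.used = 0


limiter = TieredGlobalLimiter(limit=10, reserved_percent=10)
standard = [limiter.allow("standard") for _ in range(10)]  # the tenth is rejected
enterprise = limiter.allow("enterprise")  # still admitted from the reserved slice
```

In this variant enterprise requests also count against the shared portion, so the reservation guarantees a minimum rather than a separate pool. A per-user check would still run before this one so that a single heavy user cannot drain either portion.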
Implementation Location
Check global limits as early as possible: at the edge (CDN, load balancer), before requests reach your application servers. This protects the entire stack. If you use a distributed check (for example, Redis), make the increment and comparison a single atomic operation to avoid race conditions. Accept some imprecision: under extreme load, distributed counters might allow 105,000 requests instead of exactly 100,000. This is acceptable.
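A common way to get a single atomic operation in Redis is a short Lua script, since Redis runs each script without interleaving other clients' commands. The sketch below assumes a client object with a redis-py style eval(script, numkeys, *keys_and_args) method. The function name allow_request and the two-second expiry are illustrative choices, not taken from the text.

```python
# INCR, EXPIRE and the comparison run as one unit inside Redis, so two
# callers cannot both read "limit - 1" and then both be admitted.
CHECK_AND_COUNT = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if current > tonumber(ARGV[1]) then
  return 0
end
return 1
"""


def allow_request(client, service, window, limit, ttl_seconds=2):
    """Return True if this request fits within the global limit for `window`.

    client: any object exposing eval(script, numkeys, *keys_and_args),
            for example a redis-py client (assumption).
    window: window identifier, such as the current Unix second.
    """
    key = f"ratelimit:global:{service}:{window}"
    return client.eval(CHECK_AND_COUNT, 1, key, limit, ttl_seconds) == 1
```

Note that this script increments the counter even for rejected requests, which keeps it simple at the cost of the counter overshooting the limit during overload. The expiry only cleans up keys from past windows.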