Layered Rate Limiting Strategy: Combining Multiple Scopes
Combining All Three Scopes
Production systems do not choose one strategy. They layer all three scopes, with each layer catching different threat profiles. Think of it as defense in depth: if one layer fails to catch a problem, the next layer provides backup.
Layer 1: Global (Outermost)
Check first at the edge (CDN, load balancer). A simple counter answers one question: "Has the API exceeded 100,000 requests/sec?" If yes, return 503. No identity lookup, minimal computation. Catches: flash crowds, DDoS, cascading failures.
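A minimal sketch of such a global counter, assuming a fixed one-second window held in process memory. The class name GlobalCounter and the injectable clock are illustrative choices, not part of the strategy described above; a real edge deployment would typically keep this count in the CDN or load balancer itself.

```python
import time


class GlobalCounter:
    """Fixed one-second window counter for the whole API (in-memory sketch)."""

    def __init__(self, limit_per_sec, clock=time.monotonic):
        self.limit = limit_per_sec
        self.clock = clock      # injectable so the window can be tested
        self.window = None      # the whole-second bucket currently counted
        self.count = 0

    def allow(self):
        """Return True if the request fits in the current second."""
        current = int(self.clock())
        if current != self.window:
            # A new second has started: reset the counter.
            self.window = current
            self.count = 0
        if self.count >= self.limit:
            return False        # caller would answer with 503
        self.count += 1
        return True
```

Note that the check needs no identity lookup: it is one comparison and one increment per request, which is what keeps it cheap enough to run first.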
Layer 2: Per IP (Before Auth)
After the global check passes, check per-IP limits. For unauthenticated endpoints (login, signup), this is the primary protection. For authenticated endpoints, it is an additional abuse signal. Catches: credential stuffing, scraping, single-origin attacks. Key: ratelimit:ip:{ip}:{endpoint}:{window}.
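As a rough illustration of how the key format could be used, the sketch below builds the key from a timestamp and counts hits per key. It assumes a fixed window whose index is the timestamp divided by the window length, and a plain dict standing in for whatever shared store is actually used; ip_key and allow_ip are hypothetical helper names.

```python
from collections import defaultdict

# Stand-in for a shared counter store (assumption: a real system would
# use something shared across servers, with expiry on old windows).
counters = defaultdict(int)


def ip_key(ip, endpoint, now, window_seconds=1):
    """Build ratelimit:ip:{ip}:{endpoint}:{window} for a fixed window."""
    window = int(now // window_seconds)
    return f"ratelimit:ip:{ip}:{endpoint}:{window}"


def allow_ip(ip, endpoint, now, limit, window_seconds=1):
    """Count this request and report whether the IP is still within limit."""
    key = ip_key(ip, endpoint, now, window_seconds)
    counters[key] += 1
    return counters[key] <= limit
```

Including the endpoint in the key lets a sensitive route such as login carry a tighter limit than the rest of the API.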
Layer 3: Per User (After Auth)
After authentication, check per-user limits. This is your primary quota enforcement, with limits that differ by plan tier. Catches: runaway scripts, exceeded quotas, compromised accounts. Key: ratelimit:user:{user_id}:{window}.
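One way to sketch tiered per-user quotas is shown below. The hourly limits are borrowed from the example configuration in this section; the hourly fixed window, the dict-based store, and the names PLAN_LIMITS, user_key, and check_user are assumptions made for illustration.

```python
# Requests per hour by plan tier (values from the example configuration).
PLAN_LIMITS = {"free": 100, "pro": 10_000}
WINDOW_SECONDS = 3600  # assumption: quotas reset on fixed hourly windows


def user_key(user_id, now):
    """Build ratelimit:user:{user_id}:{window} for the current hour."""
    return f"ratelimit:user:{user_id}:{int(now // WINDOW_SECONDS)}"


def check_user(counters, user_id, plan, now):
    """Return (allowed, remaining) for this user in the current window.

    counters is any dict-like store mapping key -> request count.
    """
    limit = PLAN_LIMITS[plan]
    key = user_key(user_id, now)
    used = counters.get(key, 0)
    if used >= limit:
        return False, 0          # quota exhausted: caller answers 429
    counters[key] = used + 1
    return True, limit - counters[key]
```

Returning the remaining count alongside the decision makes it easy to populate a header such as X-RateLimit-Remaining.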
The Decision Flow
A request arrives. Check global: over the limit? Reject with 503. Check IP: suspicious or over its limit? Reject with 429. Authenticate. Check user: quota exceeded? Reject with 429. If all checks pass, process the request. Each rejection includes headers: Retry-After, X-RateLimit-Remaining.
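The flow can be sketched as a single function that runs the checks in order and stops at the first rejection. The four checks are passed in as callables so the sketch stays independent of any particular limiter; the function name decide, the fixed retry_after value, and the 401 response for failed authentication are assumptions not specified in the text.

```python
def decide(global_ok, ip_ok, authenticate, user_check, retry_after=1):
    """Run the layered checks in order; return (status, headers).

    global_ok()    -> bool
    ip_ok()        -> bool
    authenticate() -> user id, or None if authentication fails
    user_check(u)  -> (allowed: bool, remaining: int)
    """
    rejected = {"Retry-After": str(retry_after), "X-RateLimit-Remaining": "0"}

    if not global_ok():
        return 503, dict(rejected)   # whole API is over capacity
    if not ip_ok():
        return 429, dict(rejected)   # this IP is over its limit
    user = authenticate()
    if user is None:
        return 401, {}               # assumption: auth failure handling
    allowed, remaining = user_check(user)
    if not allowed:
        return 429, dict(rejected)   # this user is over quota
    return 200, {"X-RateLimit-Remaining": str(remaining)}
```

The ordering matters: the cheapest, least specific check runs first, so an overloaded system sheds traffic before spending any work on authentication.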
Example Configuration
E-commerce API. Global: 50K requests/sec. Per IP: 100/sec anonymous, 1K/sec authenticated. Per user: Free 100/hr, Pro 10K/hr. Each layer protects against a different failure mode.
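Expressed as data, this configuration might look like the sketch below. The dictionary layout and the helper names ip_limit and user_limit are illustrative assumptions; only the numbers come from the example.

```python
# Example e-commerce limits, one entry per layer.
LIMITS = {
    "global_per_sec": 50_000,
    "ip_per_sec": {"anonymous": 100, "authenticated": 1_000},
    "user_per_hour": {"free": 100, "pro": 10_000},
}


def ip_limit(authenticated):
    """Per-IP requests/sec, depending on whether the caller is logged in."""
    kind = "authenticated" if authenticated else "anonymous"
    return LIMITS["ip_per_sec"][kind]


def user_limit(plan):
    """Per-user requests/hour for a plan tier name such as 'free' or 'pro'."""
    return LIMITS["user_per_hour"][plan.lower()]
```

Keeping all three layers in one structure makes it easier to review the limits side by side when tuning them.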