Surviving Overload: Load Shedding Fundamentals
Why Rate Limiting Is Not Enough
Rate limiting enforces per-user quotas, such as 1,000 requests/minute/user, during normal operation. But it cannot handle scenarios where many users simultaneously make legitimate requests: a system rated for 10,000 RPS might face 50,000 RPS during a viral event. Every user stays within their individual quota, so the rate limiter passes every request, yet the system collapses under the aggregate load.
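The arithmetic behind this failure mode is worth making explicit. A minimal sketch, using the quota and capacity numbers from the text plus a hypothetical user count, shows how per-user limits place no bound on aggregate load:

```python
# Hypothetical numbers illustrating the aggregate-load problem:
# every user is within quota, yet the total far exceeds capacity.
PER_USER_LIMIT_RPM = 1000          # quota from the text: 1000 requests/minute/user
SYSTEM_CAPACITY_RPS = 10_000       # rated capacity from the text

def aggregate_rps(active_users: int, avg_rpm_per_user: float) -> float:
    """Total request rate when every user stays under the per-user quota."""
    assert avg_rpm_per_user <= PER_USER_LIMIT_RPM
    return active_users * avg_rpm_per_user / 60.0

# During a viral event, 500,000 users each send a modest 6 requests/minute.
load = aggregate_rps(active_users=500_000, avg_rpm_per_user=6)
print(load)                        # 50000.0 RPS: 5x the rated capacity
print(load > SYSTEM_CAPACITY_RPS)  # True, yet rate limiting rejected nothing
```

The user count and per-user rate are invented for illustration; the point is that the rate limiter has no view of the sum.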
The Overload Death Spiral
Without load shedding, overload creates a cascading failure. Response times climb from 50ms to 5,000ms. Clients time out and retry, doubling the load. Connection pools exhaust and memory fills with queued requests until zero requests succeed. A 2x overload without shedding can drive the failure rate to 100%; with shedding, you can maintain an 80-90% success rate.
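Retry amplification is the engine of the spiral. A toy model, with hypothetical parameters and the simplifying assumptions that each timed-out request is retried exactly once and that shed clients back off rather than retry, shows how offered load compounds:

```python
# Toy model of retry amplification (all parameters hypothetical).
# Each tick, offered load is the base traffic plus last tick's failures.
CAPACITY_RPS = 10_000

def offered_load(base_rps: float, ticks: int, shedding: bool) -> list[float]:
    loads, failed = [], 0.0
    for _ in range(ticks):
        load = base_rps + failed          # retries stack on top of new traffic
        loads.append(load)
        if shedding:
            # Excess requests get a fast rejection; assume clients back off,
            # so failures do not re-enter the queue next tick.
            failed = 0.0
        else:
            # Overload makes excess requests time out; they all retry.
            failed = max(0.0, load - CAPACITY_RPS)
    return loads

print(offered_load(20_000, 4, shedding=False))  # load compounds every tick
print(offered_load(20_000, 4, shedding=True))   # load stays bounded at 2x
```

In the no-shedding run the offered load grows by a full capacity's worth of retries per tick; with shedding it stays flat, which is what keeps the system inside its envelope.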
Shedding Triggers
Trigger shedding based on measurable signals: CPU above 80%, queue depth exceeding 1,000 requests, p99 latency (the 99th percentile, meaning 99% of requests complete faster) above 500ms, or memory above 85%. Detect overload at 70-80% of capacity, which leaves headroom to shed gracefully rather than crash.
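Combining those signals into a single decision can be sketched as follows; the thresholds come from the text, while the `OverloadSignals` container is a hypothetical stand-in for whatever metrics pipeline you already collect:

```python
from dataclasses import dataclass

@dataclass
class OverloadSignals:
    """Snapshot of the shedding signals named in the text."""
    cpu_percent: float
    queue_depth: int
    p99_latency_ms: float
    memory_percent: float

def should_shed(s: OverloadSignals) -> bool:
    """True when any signal crosses its shedding threshold."""
    return (
        s.cpu_percent > 80.0        # CPU above 80%
        or s.queue_depth > 1000     # queue depth over 1000 requests
        or s.p99_latency_ms > 500.0 # p99 latency past 500ms
        or s.memory_percent > 85.0  # memory above 85%
    )

print(should_shed(OverloadSignals(60.0, 200, 120.0, 50.0)))  # False: healthy
print(should_shed(OverloadSignals(85.0, 200, 120.0, 50.0)))  # True: CPU tripped
```

Using an OR of thresholds means any one exhausted resource starts shedding, which matches the goal of reacting before the first bottleneck saturates.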
Basic Shedding Decision
The simplest approach rejects requests whenever a threshold is exceeded, returning HTTP 503 Service Unavailable with a Retry-After header. More sophisticated approaches use priority, shedding low-value requests first while protecting critical operations.
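A minimal sketch of the priority-aware variant might look like this; the priority tiers, the two-level overload flag, and the 30-second Retry-After value are illustrative choices, not prescribed by the text:

```python
from enum import IntEnum

class Priority(IntEnum):
    CRITICAL = 0    # e.g. checkout, health checks (examples are hypothetical)
    NORMAL = 1
    BACKGROUND = 2  # e.g. analytics, prefetch

def handle(priority: Priority, overloaded: bool, severe: bool):
    """Return (status, headers): shed low-value traffic first under overload."""
    if overloaded:
        # Background work is shed as soon as overload begins; normal traffic
        # is only shed when overload is severe; critical never is.
        if priority == Priority.BACKGROUND or (severe and priority == Priority.NORMAL):
            return 503, {"Retry-After": "30"}
    return 200, {}

print(handle(Priority.BACKGROUND, overloaded=True, severe=False))  # sheds first
print(handle(Priority.CRITICAL, overloaded=True, severe=True))     # protected
```

Rejecting with a cheap 503 plus Retry-After is what makes shedding effective: the rejection costs far less than serving the request, and the header steers well-behaved clients away from the immediate retries that fuel the death spiral.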