
Surviving Overload: Load Shedding Fundamentals

Definition
Load Shedding is the deliberate rejection of incoming requests when a system approaches capacity, preventing total collapse by sacrificing some requests to serve others successfully.

Why Rate Limiting Is Not Enough

Rate limiting enforces per-user quotas like 1,000 requests/minute during normal operation. But it cannot handle scenarios where many users simultaneously make legitimate requests. A system rated for 10,000 RPS might face 50,000 RPS during a viral event. Per-user rate limits admit every request because no individual user exceeds their quota, yet the system collapses under the aggregate load.

The Overload Death Spiral

Without load shedding, overload creates cascading failure. Response times climb from 50ms to 5,000ms. Clients time out and retry, doubling the load. Connection pools exhaust and memory fills with queued requests. Eventually, zero requests succeed. A 2x overload without shedding can cause 100% failure; with shedding, you can maintain an 80-90% success rate.
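The retry amplification at the heart of the death spiral is simple arithmetic. A minimal sketch (the function name and parameters are illustrative, not from any particular library):

```python
def effective_load(offered_rps: float, timeout_fraction: float, retries: int) -> float:
    """Estimate the load the system actually sees once clients retry.

    Every timed-out request comes back `retries` more times on top of
    the original attempt, so the offered load is amplified.
    """
    return offered_rps * (1 + timeout_fraction * retries)

# If all 50,000 RPS time out and each client retries once,
# the system now faces 100,000 RPS.
print(effective_load(50_000, timeout_fraction=1.0, retries=1))
```

Note the feedback loop: the extra load makes more requests time out, which raises `timeout_fraction`, which raises the load again.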

⚠️ Key Trade-off: Load shedding means some users get errors. But the alternative is ALL users get errors. You trade individual failures for system availability.

Shedding Triggers

Trigger shedding based on measurable signals: CPU above 80%, queue depth exceeding 1000 requests, latency crossing p99 > 500ms (99th percentile, meaning 99% of requests are faster), or memory above 85%. Detect overload at 70-80% capacity, giving headroom to shed gracefully rather than crash.
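Combining these signals into a shedding decision can be as simple as an any-threshold-crossed check. A minimal sketch, using the thresholds from the text (all names and the `OverloadSignals` structure are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class OverloadSignals:
    cpu_percent: float      # current CPU utilization
    queue_depth: int        # requests waiting in the queue
    p99_latency_ms: float   # 99th-percentile response time
    memory_percent: float   # current memory utilization


# Thresholds from the text; tune for your own system.
CPU_LIMIT = 80.0
QUEUE_LIMIT = 1000
P99_LIMIT_MS = 500.0
MEMORY_LIMIT = 85.0


def should_shed(s: OverloadSignals) -> bool:
    """Shed load if any signal crosses its threshold."""
    return (
        s.cpu_percent > CPU_LIMIT
        or s.queue_depth > QUEUE_LIMIT
        or s.p99_latency_ms > P99_LIMIT_MS
        or s.memory_percent > MEMORY_LIMIT
    )
```

In production these signals would come from a metrics pipeline and be smoothed (e.g. over a sliding window) to avoid flapping on momentary spikes.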

Basic Shedding Decision

The simplest approach rejects requests when thresholds are exceeded, returning HTTP 503 Service Unavailable with a Retry-After header. More sophisticated approaches use priority, shedding low-value requests first while protecting critical operations.
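A priority-aware shedder can protect critical traffic longest by giving each priority tier its own load threshold. A minimal sketch (the tier names, thresholds, and `handle` signature are hypothetical):

```python
# Load factor at which each tier starts being shed (current load / capacity).
# Low-value traffic is sacrificed first; critical traffic is protected longest.
SHED_THRESHOLDS = {"low": 0.70, "normal": 0.85, "critical": 0.95}

RETRY_AFTER_SECONDS = 5


def handle(priority: str, load_factor: float) -> tuple[int, dict]:
    """Return an (HTTP status, headers) pair for one incoming request.

    Requests are shed (503 + Retry-After) once the system's load factor
    reaches the threshold for their priority tier.
    """
    if load_factor >= SHED_THRESHOLDS[priority]:
        return 503, {"Retry-After": str(RETRY_AFTER_SECONDS)}
    return 200, {}


# At 80% load, low-value requests are shed but critical ones still succeed.
print(handle("low", 0.80))       # (503, {'Retry-After': '5'})
print(handle("critical", 0.80))  # (200, {})
```

The Retry-After header matters: it tells well-behaved clients when to come back, spreading the retry wave instead of amplifying the overload.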

💡 Key Takeaways
Load shedding deliberately rejects requests to prevent total system collapse during overload
Without shedding, 2x overload can cause 100% failure; with shedding, maintain 80-90% success rate
Trigger shedding at 70-80% capacity thresholds (CPU, queue depth, latency, memory)
📌 Interview Tips
1. When discussing system capacity, mention load shedding as your defense against traffic spikes beyond rate limits
2. Explain the death spiral: slow responses trigger retries, retries add load, and added load slows responses further
3. Interviewers often ask about handling 10x traffic - load shedding is the answer when scaling cannot keep pace