Implementing Shedding in Practice
Where to Implement Shedding
Shed as early as possible in the request path. At the load balancer, shed before consuming app server resources. At the API gateway, shed before authentication overhead. The further upstream, the more resources saved. Load balancer rejection costs 0.1ms CPU; rejection after database queries costs 50ms plus database capacity.
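The cheap-rejection idea above can be sketched as an admission gate that runs before any expensive work (auth, database calls). This is a minimal illustration, not a production implementation; the in-flight-count threshold and class name are assumptions for the example.

```python
import threading

class ShedGate:
    """Admission gate checked at the very front of the request path.

    The check is a lock-protected counter comparison: it costs
    microseconds and touches no downstream resource, which is the
    point of shedding early. `max_in_flight` is an illustrative
    threshold to be tuned per service.
    """

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_enter(self) -> bool:
        """Return True if the request may proceed; False means shed now."""
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return False  # caller returns 503 immediately
            self._in_flight += 1
            return True

    def leave(self) -> None:
        """Call when the request finishes, successful or not."""
        with self._lock:
            self._in_flight -= 1
```

In use, the handler calls `try_enter()` first and returns the shed response on `False`, so a rejected request never reaches authentication or the database.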
Response Codes and Headers
Return 503 Service Unavailable with a Retry-After: 5 header for temporary overload; use 429 Too Many Requests when a specific client has exceeded its rate limit. Include a machine-readable JSON body: {"error": "overloaded", "retry_after": 5}.
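A sketch of building that shed response, with the Retry-After value randomized in the 3-7 second range so rejected clients do not all retry at once. The function name and return shape (status, headers, body) are illustrative, not tied to any particular framework.

```python
import json
import random

def shed_response(reason: str = "overloaded"):
    """Build a 503 shed response as (status, headers, body).

    Retry-After is jittered (3-7 s, illustrative range) so clients
    shed at the same instant do not retry in lockstep.
    """
    retry_after = random.randint(3, 7)
    headers = {
        "Retry-After": str(retry_after),
        "Content-Type": "application/json",
    }
    body = json.dumps({"error": reason, "retry_after": retry_after})
    return 503, headers, body
```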
Randomize the Retry-After value (e.g. 3-7 seconds) so shed clients do not retry in lockstep.
Client Retry Behavior
Clients must implement exponential backoff with jitter: first retry 1s ± 500ms, second 2s ± 1s, third 4s ± 2s. Cap at 30-60 seconds max delay, 3-5 attempts max. Without proper backoff, shedding causes retry storms that amplify load.
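The backoff schedule above (1s ± 500ms, 2s ± 1s, 4s ± 2s, capped) can be sketched as a small delay function; the parameter names and defaults are assumptions for the example.

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with +/-50% jitter, matching the schedule above.

    attempt 0 -> ~1s +/- 0.5s, attempt 1 -> ~2s +/- 1s, attempt 2 -> ~4s +/- 2s.
    The final delay is clamped to `cap` (30s here, illustrative).
    """
    nominal = min(base * (2 ** attempt), cap)
    jitter = nominal * 0.5
    return min(cap, nominal + random.uniform(-jitter, jitter))
```

A client would call this between attempts and give up after 3-5 tries; without the jitter term, every shed client retries at the same instant and re-creates the spike.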
Monitoring and Alerting
Track metrics: shed_requests_total, shed_rate, shed_reason. Alert at 1% shed rate (warn) and 5% (page). Shedding during known peaks is healthy; shedding during normal traffic indicates a capacity problem.
Testing Load Shedding
Load test at 2-3x expected peak. Verify: shedding activates at thresholds, priority differentiation works, response codes are correct, system stabilizes after load decreases, no resource leaks during sustained shedding.
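The "activates under overload, stabilizes after" invariant can be checked with a toy model before running a real load test. This deliberately ignores queueing and latency; capacity and load figures are illustrative.

```python
def shed_per_phase(capacity: int, offered: list[int]) -> list[int]:
    """Toy open-loop model: in each phase, requests beyond `capacity`
    are shed. Returns the shed count per phase, so a unit test can
    assert that shedding is zero at normal load, positive at 2-3x
    peak, and returns to zero once load drops.
    """
    return [max(0, load - capacity) for load in offered]

# Phases at 0.8x, 3x, then 0.9x of capacity (capacity = 100/phase):
# shedding should occur only in the overloaded middle phase.
```

A real load test replaces this model with actual traffic but asserts the same shape: correct response codes while shedding, higher-priority traffic admitted first, and no drift in memory or connection counts during sustained overload.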