
Implementing Shedding in Practice

Where to Implement Shedding

Shed as early as possible in the request path. At the load balancer, requests are shed before they consume app server resources; at the API gateway, before they incur authentication overhead. The further upstream the rejection, the more resources are saved. As a rough illustration, a rejection at the load balancer costs on the order of 0.1ms of CPU, while a rejection after database queries have run costs around 50ms plus database capacity.
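A minimal sketch of the "shed first" ordering, assuming a simple cap on in-flight requests as the overload signal. The names InFlightLimiter, handle_request, authenticate, and run_query are hypothetical and stand in for whatever your gateway or framework provides.

```python
import threading


class InFlightLimiter:
    """Admits a request only while fewer than max_in_flight are active."""

    def __init__(self, max_in_flight: int) -> None:
        self._max = max_in_flight
        self._current = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            if self._current >= self._max:
                return False  # at capacity: caller should shed
            self._current += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._current -= 1


def handle_request(limiter, authenticate, run_query):
    # Shed before any expensive step: a rejected request never reaches
    # authentication or the database.
    if not limiter.try_acquire():
        return 503, "overloaded"
    try:
        user = authenticate()
        return 200, run_query(user)
    finally:
        limiter.release()
```

The point of the ordering is that the rejection path touches only a counter and a lock, so its cost stays small no matter how expensive the admitted path is.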

Response Codes and Headers

Return 503 Service Unavailable with a Retry-After: 5 header for temporary overload. Use 429 Too Many Requests for rate limiting. Include a JSON body such as {"error": "overloaded", "retry_after": 5}.

✅ Best Practice: Include jitter in Retry-After values. If all clients retry at second 5, you create a thundering herd. Because the header carries a single value, pick a random number of seconds between 3 and 7 for each response instead of always returning 5.
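One way to build such a response is sketched below. The function name build_shed_response and its parameters are illustrative assumptions, not part of any particular framework; the status, header, and body shape follow the text above.

```python
import json
import random


def build_shed_response(min_retry_s=3, max_retry_s=7, rng=random):
    """Return (status, headers, body) for a shed request.

    Retry-After is a whole number of seconds drawn uniformly from
    [min_retry_s, max_retry_s] so clients do not all retry together.
    """
    retry_after = rng.randint(min_retry_s, max_retry_s)
    headers = {
        "Retry-After": str(retry_after),
        "Content-Type": "application/json",
    }
    body = json.dumps({"error": "overloaded", "retry_after": retry_after})
    return 503, headers, body
```

Keeping the header and the JSON field in sync avoids confusing clients that read one but not the other.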

Client Retry Behavior

Clients must implement exponential backoff with jitter: wait 1s ± 500ms before the first retry, 2s ± 1s before the second, and 4s ± 2s before the third. Cap the delay at 30-60 seconds and the number of attempts at 3-5. Without proper backoff, shedding causes retry storms that amplify load.
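The schedule above can be sketched as follows. This is an illustrative client loop, not a definitive implementation: backoff_delay, call_with_retries, and the send callable (assumed to return a status code and an optional Retry-After in seconds) are hypothetical names, and honoring the server's Retry-After as a lower bound on the delay is an added assumption.

```python
import random
import time


def backoff_delay(attempt, base_s=1.0, cap_s=30.0, rng=random):
    """Delay before retry number `attempt` (1-based): base * 2^(attempt-1), +/- 50%, capped."""
    nominal = min(cap_s, base_s * 2 ** (attempt - 1))
    jitter = rng.uniform(-0.5, 0.5) * nominal
    return min(cap_s, nominal + jitter)


def call_with_retries(send, max_attempts=4, sleep=time.sleep, rng=random):
    """Call send() until it stops returning 429/503 or attempts run out.

    send() is assumed to return (status, retry_after_seconds_or_None).
    Returns the last status seen.
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status, retry_after = send()
        if status not in (429, 503):
            return status
        if attempt == max_attempts:
            break  # give up instead of adding more load
        delay = backoff_delay(attempt, rng=rng)
        if retry_after is not None:
            # Assumption: never retry sooner than the server asked.
            delay = max(delay, float(retry_after))
        sleep(delay)
    return status
```

Passing sleep and rng as parameters keeps the loop testable without real waiting.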

Monitoring and Alerting

Track metrics: shed_requests_total, shed_rate, shed_reason. Alert when the shed rate reaches 1% (warn) and 5% (page). Shedding during known peaks is healthy; shedding during normal traffic indicates a capacity problem.
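A small in-process sketch of these metrics and thresholds, assuming shed_rate is defined as shed requests divided by total requests. ShedMetrics and alert_level are hypothetical names; a real deployment would normally export the counters to a metrics system and evaluate the rate over a time window there.

```python
from collections import Counter


class ShedMetrics:
    """Counts total and shed requests, with a per-reason breakdown."""

    def __init__(self) -> None:
        self.requests_total = 0
        self.shed_requests_total = 0
        self.shed_reason = Counter()

    def record(self, shed_reason=None) -> None:
        # shed_reason is None for admitted requests, else a short label
        # such as "queue_full" (labels here are illustrative).
        self.requests_total += 1
        if shed_reason is not None:
            self.shed_requests_total += 1
            self.shed_reason[shed_reason] += 1

    @property
    def shed_rate(self) -> float:
        if self.requests_total == 0:
            return 0.0
        return self.shed_requests_total / self.requests_total


def alert_level(shed_rate, warn_at=0.01, page_at=0.05):
    """Map a shed rate to 'ok', 'warn', or 'page' using the thresholds above."""
    if shed_rate >= page_at:
        return "page"
    if shed_rate >= warn_at:
        return "warn"
    return "ok"
```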

Testing Load Shedding

Load test at 2-3x expected peak. Verify that shedding activates at the configured thresholds, priority differentiation works, response codes are correct, the system stabilizes after load decreases, and no resources leak during sustained shedding.

💡 Key Takeaways
Shed as early as possible - load balancer rejection costs 0.1ms vs 50ms at database layer
Use 503 with Retry-After header; include jitter to prevent thundering herd on retries
Monitor shed_rate separately from errors; warn at 1%, page at 5%
📌 Interview Tips
1. Discuss where in your architecture you would implement shedding layers
2. Always mention Retry-After with jitter to show you understand retry amplification
3. Testing load shedding is often overlooked - mention verifying behavior under 2-3x peak load