
Degradation Anti-Patterns and Common Mistakes

Anti-Pattern: Fallback Calls Same Dependency

The primary path reads from replica A and the fallback path reads from replica B. Both replicas depend on the same primary database, so a failure of that database can break both paths at once. Effective fallbacks must use completely independent paths: different data stores, different network routes, different auth systems.
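
One way to catch this before an incident is to compare the transitive dependencies of the two paths. The following is a minimal sketch, assuming a hand-maintained dependency map; the names `DEPENDS_ON`, `shared_dependencies`, and `local_snapshot` are hypothetical and only illustrate the idea.

```python
def shared_dependencies(primary_path, fallback_path, depends_on):
    """Return the components that both paths transitively depend on."""
    def closure(start):
        seen, stack = set(), list(start)
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(depends_on.get(node, ()))
        return seen

    return closure(primary_path) & closure(fallback_path)


# Hypothetical dependency map: each component lists what it relies on.
DEPENDS_ON = {
    "replica_a": ["primary_db"],
    "replica_b": ["primary_db"],
    "local_snapshot": [],
}

# Replica-to-replica fallback shares the primary database.
overlap_bad = shared_dependencies(["replica_a"], ["replica_b"], DEPENDS_ON)
# A fallback backed by an independent snapshot shares nothing.
overlap_good = shared_dependencies(["replica_a"], ["local_snapshot"], DEPENDS_ON)

print(overlap_bad)   # {'primary_db'}
print(overlap_good)  # set()
```

A non-empty result signals that the fallback is likely to fail together with the primary path.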

Anti-Pattern: Untested Fallback Paths

Fallback code executes rarely, so bugs hide undetected. For example, a fallback might throw a null pointer exception when the cache is empty. Force fallback execution weekly: manually open circuit breakers and verify that the fallbacks work. Treat fallback paths as production code that requires full testing.

💡 Key Insight: The fallback you have never tested is the fallback that will fail during your next incident. Schedule automated fallback tests in CI.
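
An automated fallback test can be as small as the sketch below. It assumes a hypothetical `get_recommendations` function and a stand-in `CircuitOpenError`; neither comes from a real library. The test forces the primary to fail and checks the empty-cache case that the text warns about.

```python
class CircuitOpenError(Exception):
    """Raised by the hypothetical primary client when its breaker is open."""


def get_recommendations(user_id, primary, cache):
    """Try the primary; on failure serve cached data, or an empty list."""
    try:
        return primary(user_id)
    except CircuitOpenError:
        cached = cache.get(user_id)  # None when the cache is cold
        return cached if cached is not None else []


def forced_open_primary(user_id):
    # Simulates a manually opened circuit breaker.
    raise CircuitOpenError("breaker forced open for fallback test")


def test_fallback_with_empty_cache():
    # The rarely exercised case: primary down AND nothing cached.
    assert get_recommendations("u1", forced_open_primary, cache={}) == []


def test_fallback_with_warm_cache():
    cache = {"u1": ["a", "b"]}
    assert get_recommendations("u1", forced_open_primary, cache) == ["a", "b"]
```

Both functions follow the `test_` naming convention, so a runner such as pytest would pick them up in CI.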

Anti-Pattern: Cascading Degradation

Service A degrades from 50ms to 2000ms latency. Service B, which calls A, exhausts its thread pool while waiting on slow responses. Service C, which calls B, then suffers the same fate. Prevention: set timeouts at every service boundary. If B calls A with a 100ms timeout, B sees a clean, fast failure instead of resource exhaustion. Combine timeouts with circuit breakers and bulkheads (isolated resource pools that prevent one slow dependency from consuming all resources).
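
The timeout and bulkhead combination can be sketched with the standard library alone. This is an illustration under stated assumptions, not a production implementation: the `Bulkhead` class and `slow_service_a` are hypothetical, and `future.result(timeout=...)` only stops the caller from waiting, it does not cancel the worker thread. The slot therefore stays occupied until the slow call actually finishes, which is what lets the bulkhead reject new calls quickly.

```python
import concurrent.futures
import threading
import time


class BulkheadFullError(Exception):
    """Raised when every slot for this dependency is already in use."""


class Bulkhead:
    """Caps concurrent calls to one dependency and bounds the caller's wait."""

    def __init__(self, max_concurrent, timeout_s):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=max_concurrent)
        self._timeout_s = timeout_s

    def call(self, fn, *args):
        # Fail fast instead of queueing behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("dependency pool saturated")
        future = self._pool.submit(fn, *args)
        # Free the slot only when the underlying call really completes.
        future.add_done_callback(lambda _f: self._slots.release())
        return future.result(timeout=self._timeout_s)


def slow_service_a():
    time.sleep(0.5)  # stands in for a degraded dependency
    return "ok"


bulkhead = Bulkhead(max_concurrent=2, timeout_s=0.1)
outcomes = []
for _ in range(3):
    try:
        outcomes.append(bulkhead.call(slow_service_a))
    except concurrent.futures.TimeoutError:
        outcomes.append("timeout")
    except BulkheadFullError:
        outcomes.append("rejected")

print(outcomes)
```

In this run the first two calls time out after roughly 100ms each, and the third is rejected immediately because both slots are still held by the slow calls. The caller's own threads are never tied up for the full 500ms.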

Anti-Pattern: All-Or-Nothing

Systems that support only "working" or "disabled" miss degradation opportunities. Search, for example, might have four levels: full results with facets, text-only results, cached popular searches, and disabled. Feature flags should support full, limited, minimal, and disabled states.
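
The search levels described above could be modeled as an ordered enum with a selection function. This is a sketch only; the health inputs (`index_healthy`, `facet_service_healthy`, `cache_available`) are assumed signals that a real system would derive from its own health checks.

```python
from enum import IntEnum


class SearchLevel(IntEnum):
    """Ordered from least to most capable, so levels can be compared."""
    DISABLED = 0
    CACHED_POPULAR = 1
    TEXT_ONLY = 2
    FULL_WITH_FACETS = 3


def choose_level(index_healthy, facet_service_healthy, cache_available):
    """Pick the most capable level that the current health state allows."""
    if index_healthy and facet_service_healthy:
        return SearchLevel.FULL_WITH_FACETS
    if index_healthy:
        return SearchLevel.TEXT_ONLY
    if cache_available:
        return SearchLevel.CACHED_POPULAR
    return SearchLevel.DISABLED


# Facet service down: search keeps working without facets.
print(choose_level(True, False, True).name)   # TEXT_ONLY
# Index down but cache warm: serve cached popular searches.
print(choose_level(False, False, True).name)  # CACHED_POPULAR
```

Using `IntEnum` makes checks such as `level >= SearchLevel.TEXT_ONLY` possible when deciding which UI elements to render.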

Anti-Pattern: Ignoring Fallback Capacity

If the primary handles 10,000 RPS and fails, that traffic shifts to the fallback. Is the fallback sized for 10,000 RPS? Often not. Capacity-plan fallback infrastructure for the traffic expected during degradation.
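
The capacity question reduces to simple arithmetic. The sketch below uses the 10,000 RPS figure from the text; the 3,000 RPS fallback capacity and the `shifted_fraction` parameter are assumptions chosen for illustration.

```python
def fallback_shortfall_rps(primary_rps, fallback_capacity_rps, shifted_fraction=1.0):
    """Requests per second the fallback cannot absorb if the primary fails."""
    shifted_rps = primary_rps * shifted_fraction
    return max(0.0, shifted_rps - fallback_capacity_rps)


# Assumed example: fallback provisioned for only 3,000 RPS.
shortfall = fallback_shortfall_rps(10_000, 3_000)
print(shortfall)  # 7000.0
```

Any positive shortfall has to be covered by more fallback capacity or by deliberately shedding that load.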

Anti-Pattern: Silent Degradation

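Degradation without alerting means issues go unnoticed. Circuit opens? Alert. Fallback active for more than 5 minutes? Page. Track the primary success rate separately from the overall success rate, because successful fallbacks keep the overall rate high and can mask a failing primary.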
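
A minimal sketch of that bookkeeping follows. The `DegradationMonitor` class is hypothetical, and timestamps are passed in explicitly (in seconds) so the behavior is easy to test; a real system would more likely emit these values to its metrics and alerting stack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DegradationMonitor:
    page_after_s: float = 300.0  # the 5-minute threshold from the text
    primary_ok: int = 0
    fallback_ok: int = 0
    failed: int = 0
    fallback_since: Optional[float] = None  # when fallback serving began

    def record(self, outcome, now):
        """outcome is 'primary', 'fallback', or 'failed'; now is in seconds."""
        if outcome == "primary":
            self.primary_ok += 1
            self.fallback_since = None  # primary recovered
        elif outcome == "fallback":
            self.fallback_ok += 1
            if self.fallback_since is None:
                self.fallback_since = now
        else:
            self.failed += 1

    def _total(self):
        return self.primary_ok + self.fallback_ok + self.failed

    def primary_success_rate(self):
        return self.primary_ok / self._total() if self._total() else 1.0

    def overall_success_rate(self):
        served = self.primary_ok + self.fallback_ok
        return served / self._total() if self._total() else 1.0

    def should_page(self, now):
        """True once the fallback has been active longer than the threshold."""
        return self.fallback_since is not None and now - self.fallback_since > self.page_after_s


monitor = DegradationMonitor()
for _ in range(90):
    monitor.record("primary", now=0.0)
for _ in range(10):
    monitor.record("fallback", now=100.0)

print(monitor.overall_success_rate())  # 1.0, looks healthy
print(monitor.primary_success_rate())  # 0.9, reveals the degradation
print(monitor.should_page(now=401.0))  # True, fallback active for over 5 minutes
```

The gap between the two rates is the signal that an overall success metric hides.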

💡 Key Takeaways
Fallbacks must use completely independent paths - same dependency tree means simultaneous failure
Test fallbacks regularly - untested fallback code will fail during actual incidents
Capacity plan for fallback traffic - if primary handles 10K RPS, fallback needs 10K RPS capacity too
📌 Interview Tips
1. Explaining cascading degradation through thread pool exhaustion shows deep understanding of distributed systems
2. Describing multiple degradation levels (full, limited, minimal, disabled) instead of binary states demonstrates sophistication
3. Naming the silent degradation anti-pattern shows operational awareness beyond pure architecture