Degradation Anti-Patterns and Common Mistakes

Anti-Pattern: Fallback Calls Same Dependency
The primary path reads from replica A and the fallback path reads from replica B. Both replicas depend on the same upstream primary database, so when that primary fails, both paths fail together. Effective fallbacks must use completely independent paths: different data stores, different network routes, different auth systems.
Anti-Pattern: Untested Fallback Paths
Fallback code executes rarely, so its bugs stay hidden. A fallback might throw a null pointer exception when the cache is empty. Force fallback execution weekly: manually open circuit breakers and verify that the fallbacks work. Treat fallback paths as production code requiring full testing.
💡 Key Insight: The fallback you have never tested is the fallback that will fail during your next incident. Schedule automated fallback tests in CI.
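A minimal sketch of such a test, assuming a hypothetical `get_recommendations` function with a `force_fallback` switch (all names here are illustrative, not from any particular library). It exercises the exact case mentioned above: the fallback running against an empty cache.

```python
from typing import Callable, Dict, List

# Hypothetical static default served when both primary and cache have nothing.
DEFAULT_RECOMMENDATIONS = ["bestsellers"]


def get_recommendations(
    user_id: str,
    primary: Callable[[str], List[str]],
    cache: Dict[str, List[str]],
    force_fallback: bool = False,
) -> List[str]:
    """Return recommendations, degrading to the cache and then to a static default."""
    if not force_fallback:
        try:
            return primary(user_id)
        except Exception:
            pass  # fall through to the fallback path
    # Fallback path: the cache may be empty, so never assume a hit.
    cached = cache.get(user_id)
    if cached is None:
        return list(DEFAULT_RECOMMENDATIONS)
    return cached


def test_fallback_with_empty_cache() -> None:
    # Force the fallback even though the primary is healthy.
    result = get_recommendations(
        "u1", lambda _uid: ["personalized"], cache={}, force_fallback=True
    )
    assert result == ["bestsellers"]


def test_fallback_when_primary_raises() -> None:
    def broken_primary(_uid: str) -> List[str]:
        raise ConnectionError("primary unavailable")

    result = get_recommendations("u1", broken_primary, cache={"u1": ["cached"]})
    assert result == ["cached"]
```

The `force_fallback` switch plays the role of manually opening a circuit breaker: it lets the test reach the fallback code without waiting for a real outage.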
Anti-Pattern: Cascading Degradation
Service A degrades (latency rises from 50ms to 2000ms). B, calling A, exhausts its thread pool waiting on responses. C, calling B, then experiences the same. Prevention: timeouts at every boundary. If B calls A with a 100ms timeout, B experiences a clean failure, not resource exhaustion. Combine timeouts with circuit breakers and bulkheads (isolated resource pools that prevent one slow dependency from consuming all resources).
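One way to sketch the timeout-plus-bulkhead combination, using only the Python standard library. The `Bulkhead` class and `slow_dependency` function are hypothetical names for illustration. Note the stated limitation: a thread cannot be cancelled once running, so a real client should also set socket-level timeouts.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Bulkhead:
    """Caps concurrent calls to one dependency and bounds how long callers wait."""

    def __init__(self, max_concurrent: int, timeout_s: float):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)
        self._timeout_s = timeout_s

    def call(self, fn, fallback):
        # Reject immediately when every slot is busy instead of queueing.
        if not self._slots.acquire(blocking=False):
            return fallback()
        future = self._pool.submit(self._run, fn)
        try:
            # The caller waits at most timeout_s, however slow the dependency is.
            return future.result(timeout=self._timeout_s)
        except Exception:
            # Timeout or dependency error: fail cleanly to the fallback.
            return fallback()

    def _run(self, fn):
        try:
            return fn()
        finally:
            # The slot frees only when the call truly ends, so slow calls
            # keep occupying this pool and never spill into other pools.
            self._slots.release()


def slow_dependency() -> str:
    """Stand-in for a degraded service A."""
    time.sleep(0.3)
    return "fresh"
```

With `Bulkhead(max_concurrent=2, timeout_s=0.05)`, two slow calls each return the fallback after roughly 50ms, and a third call made while both are still running is rejected at once rather than queued.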
Anti-Pattern: All-Or-Nothing
Systems that support only "working" or "disabled" miss degradation opportunities. Search, for example, might have four levels: full with facets, text only, cached popular searches, and disabled. Feature flags should support full, limited, minimal, and disabled states.
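The four search levels could be modeled as follows. This is a sketch under assumptions: the `Level` enum, the `search` function, and the injected `text_search` and `facet_search` callables are hypothetical, and the mapping of "limited" to text-only and "minimal" to cached popular searches is one reasonable reading of the states above.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class Level(Enum):
    FULL = "full"          # text results plus facets
    LIMITED = "limited"    # text results only
    MINIMAL = "minimal"    # cached popular searches only
    DISABLED = "disabled"  # feature off


def search(
    query: str,
    level: Level,
    text_search: Callable[[str], List[str]],
    facet_search: Callable[[str], Dict[str, int]],
    popular_cache: Dict[str, List[str]],
) -> dict:
    """Serve the best response the current degradation level allows."""
    if level is Level.DISABLED:
        return {"results": [], "facets": None, "level": level.value}
    if level is Level.MINIMAL:
        # No backend calls at all: answer only from the precomputed cache.
        return {
            "results": popular_cache.get(query, []),
            "facets": None,
            "level": level.value,
        }
    results = text_search(query)
    # Facets are the expensive extra, so they are the first thing dropped.
    facets: Optional[Dict[str, int]] = (
        facet_search(query) if level is Level.FULL else None
    )
    return {"results": results, "facets": facets, "level": level.value}
```

Returning the level in the response lets the caller adjust the UI, for example by hiding the facet sidebar when facets are unavailable.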
Anti-Pattern: Ignoring Fallback Capacity
If the primary handles 10,000 RPS and fails, that traffic shifts to the fallback. Is the fallback sized for 10,000 RPS? Often not. Capacity-plan the fallback infrastructure for the traffic it will receive during degradation.
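The check is simple arithmetic. In the sketch below, the function name and the 4,000 RPS fallback capacity are assumptions chosen for illustration; only the 10,000 RPS figure comes from the text.

```python
def fallback_shortfall_rps(
    peak_primary_rps: float,
    fallback_capacity_rps: float,
    shifted_fraction: float = 1.0,
) -> float:
    """Return how many requests per second the fallback cannot absorb.

    shifted_fraction is the share of primary traffic expected to reach the
    fallback when the primary fails (1.0 means all of it).
    """
    expected_fallback_rps = peak_primary_rps * shifted_fraction
    return max(0.0, expected_fallback_rps - fallback_capacity_rps)


# Primary peaks at 10,000 RPS; the fallback is (hypothetically) sized for 4,000.
print(fallback_shortfall_rps(10_000, 4_000))  # prints 6000.0
```

A non-zero shortfall means the fallback needs more capacity, or some traffic must be shed, before the fallback can be relied on.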
Anti-Pattern: Silent Degradation
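Degradation without alerting means issues go unnoticed. Circuit opens? Alert. Fallback active more than 5 minutes? Page. Track primary success rate separately from overall success rate: when the fallback answers every request, the overall rate can look healthy while the primary is completely down.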
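A minimal sketch of that bookkeeping, assuming a hypothetical in-process `DegradationMonitor` (a real system would export these counters to its metrics and alerting stack). It defines "fallback active" as the time since the first fallback-served request with no successful primary response in between.

```python
import time
from typing import Callable, Optional


class DegradationMonitor:
    """Tracks primary vs overall success and how long the fallback has been active."""

    def __init__(
        self,
        page_after_s: float = 300.0,  # 5 minutes, as in the text
        clock: Callable[[], float] = time.monotonic,
    ):
        self.total = 0
        self.ok = 0          # requests that succeeded via any path
        self.primary_ok = 0  # requests that succeeded via the primary
        self._page_after_s = page_after_s
        self._clock = clock
        self._fallback_since: Optional[float] = None

    def record(self, served_by: str, success: bool) -> None:
        """served_by is 'primary' or 'fallback'."""
        self.total += 1
        if success:
            self.ok += 1
        if served_by == "primary" and success:
            self.primary_ok += 1
            self._fallback_since = None  # primary has recovered
        elif served_by == "fallback" and self._fallback_since is None:
            self._fallback_since = self._clock()

    def overall_success_rate(self) -> float:
        return self.ok / self.total if self.total else 1.0

    def primary_success_rate(self) -> float:
        return self.primary_ok / self.total if self.total else 1.0

    def should_page(self) -> bool:
        if self._fallback_since is None:
            return False
        return self._clock() - self._fallback_since > self._page_after_s
```

If one request is served by the primary and two by the fallback, all successfully, the overall success rate is 100% while the primary success rate is about 33%, which is exactly the gap a single combined metric hides.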

💡 Key Takeaways

✓ Fallbacks must use completely independent paths - same dependency tree means simultaneous failure

✓ Test fallbacks regularly - untested fallback code will fail during actual incidents

✓ Capacity plan for fallback traffic - if primary handles 10K RPS, fallback needs 10K RPS capacity too

📌 Interview Tips

1. Explaining cascading degradation through thread pool exhaustion, with timeouts as the prevention, shows deep understanding of distributed systems

2. Describing multiple degradation levels (full, limited, minimal, disabled) instead of binary states demonstrates sophistication

3. Naming the silent degradation anti-pattern shows operational awareness beyond pure architecture
