Degradation Anti-Patterns and Common Mistakes
Anti-Pattern: Fallback Calls Same Dependency
The primary path reads from replica A; the fallback reads from replica B. Both replicas depend on the same primary database, so a failure there can break both paths at once. Effective fallbacks must use completely independent paths: different data stores, different network routes, different auth systems.
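One way to catch a shared dependency is to model each path's dependencies as a graph and intersect their transitive closures. Anything in the intersection is a single point of failure for both paths. The sketch below assumes a hand-maintained dependency map. The component names (`primary-path`, `replica-a`, `primary-db`, and so on) and the helper functions are hypothetical. The direct dependencies look disjoint, but the transitive check surfaces the shared primary.

```python
from typing import Dict, Set

# Hypothetical dependency map: component -> components it directly depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "primary-path": {"replica-a"},
    "fallback-path": {"replica-b"},
    "replica-a": {"primary-db"},
    "replica-b": {"primary-db"},
    "primary-db": set(),
}


def transitive_deps(component: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Everything `component` needs, directly or indirectly (iterative DFS)."""
    seen: Set[str] = set()
    stack = list(graph.get(component, set()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, set()))
    return seen


def shared_failure_points(primary: str, fallback: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Components whose failure would break both the primary and the fallback path."""
    return transitive_deps(primary, graph) & transitive_deps(fallback, graph)


print(shared_failure_points("primary-path", "fallback-path", DEPENDS_ON))  # {'primary-db'}
```

A genuinely independent fallback should produce an empty intersection. In practice the map would also need to list network routes and auth systems, since those are shared dependencies too.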
Anti-Pattern: Untested Fallback Paths
Fallback code executes rarely, so bugs in it go undetected. A fallback might, for example, throw a null pointer exception when the cache is empty. Force fallback execution weekly: manually open circuit breakers and verify that the fallbacks work. Treat fallback paths as production code that requires full testing.
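A fallback drill can be automated as an ordinary test that forces the circuit open and exercises the fallback against an empty cache. The sketch below is a minimal illustration, not a production breaker. `CircuitBreaker`, `force_open`, and `cached_recommendations` are hypothetical names. In Python, the analogue of the null pointer bug is a `KeyError` from indexing a missing cache key.

```python
from typing import Callable, Dict, List


class CircuitBreaker:
    """Minimal hypothetical breaker that can be opened by hand for fallback drills."""

    def __init__(self) -> None:
        self._forced_open = False

    def force_open(self) -> None:
        self._forced_open = True

    def reset(self) -> None:
        self._forced_open = False

    def call(self, primary: Callable[[], List[str]], fallback: Callable[[], List[str]]) -> List[str]:
        if self._forced_open:
            return fallback()  # open circuit: skip the primary entirely
        try:
            return primary()
        except Exception:
            return fallback()


def cached_recommendations(cache: Dict[str, List[str]], user_id: str) -> List[str]:
    # Writing cache[user_id] here would raise KeyError on an empty cache,
    # the Python counterpart of the null pointer bug the text warns about.
    return cache.get(user_id, [])


def test_fallback_survives_empty_cache() -> None:
    breaker = CircuitBreaker()
    breaker.force_open()  # the drill: open the circuit manually
    result = breaker.call(
        primary=lambda: ["personalized-item"],
        fallback=lambda: cached_recommendations({}, "user-42"),
    )
    assert result == []  # the fallback ran and handled the empty cache


def test_fallback_uses_cache_when_warm() -> None:
    breaker = CircuitBreaker()
    breaker.force_open()
    cache = {"user-42": ["popular-item"]}
    result = breaker.call(
        primary=lambda: ["personalized-item"],
        fallback=lambda: cached_recommendations(cache, "user-42"),
    )
    assert result == ["popular-item"]


test_fallback_survives_empty_cache()
test_fallback_uses_cache_when_warm()
```

In a real project these functions would run under the regular test runner on every build, with the manual circuit opening reserved for the periodic drill against live infrastructure.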
Anti-Pattern: Cascading Degradation
Service A degrades: its latency rises from 50 ms to 2,000 ms. Service B, which calls A, exhausts its thread pool waiting on those calls. Service C, which calls B, then suffers the same fate. Prevention: timeouts at every boundary. If B calls A with a 100 ms timeout, a slow A produces a clean failure in B rather than resource exhaustion. Combine timeouts with circuit breakers and bulkheads (isolated resource pools that prevent one slow dependency from consuming all resources).
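The timeout-plus-bulkhead combination can be sketched with Python's standard `concurrent.futures`. Each dependency gets its own bounded pool and a per-call timeout. When every slot is busy, the call is rejected immediately instead of queueing. The `Bulkhead` class and its parameters are hypothetical; a real service would more likely rely on its RPC client's deadlines and a resilience library. Python cannot cancel a running thread, so a timed-out call keeps its worker until it finishes. That is exactly why the pool is dedicated and bounded: only A's slots are consumed, not B's whole thread pool.

```python
import concurrent.futures
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class Bulkhead:
    """Hypothetical per-dependency bulkhead: bounded concurrency plus a call timeout."""

    def __init__(self, max_concurrent: int, timeout_s: float) -> None:
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=max_concurrent)
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._timeout_s = timeout_s

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        # Reject immediately when every slot is busy instead of queueing behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            return fallback()
        future = self._pool.submit(fn)
        # Free the slot only when the call really finishes, even if we stopped waiting earlier.
        future.add_done_callback(lambda _f: self._slots.release())
        try:
            return future.result(timeout=self._timeout_s)
        except concurrent.futures.TimeoutError:
            # Clean, bounded failure instead of waiting as long as the dependency is slow.
            return fallback()
        except Exception:
            # The dependency itself failed; degrade the same way.
            return fallback()


def slow_service_a() -> str:
    time.sleep(0.5)  # stands in for A's degraded latency
    return "fresh"


# B calls A through a bulkhead with a 100 ms timeout.
to_service_a = Bulkhead(max_concurrent=2, timeout_s=0.1)
print(to_service_a.call(slow_service_a, lambda: "cached"))  # cached
```

With the timeout in place, B waits at most about 100 ms per call to A, and with the bulkhead, at most two of B's threads can be tied up by A at any moment.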
Anti-Pattern: All-or-Nothing
Systems that support only two states, "working" and "disabled", miss intermediate degradation opportunities. Search, for example, might have several levels: full search with facets, text-only search, cached results for popular searches, and disabled. Feature flags should likewise support full, limited, minimal, and disabled states.
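The search levels above map naturally onto an ordered enum that a feature flag can set. The sketch below assumes toy in-memory stand-ins for the search backend, the facet computation, and the popular-query cache. `SearchLevel`, `text_search`, `compute_facets`, and `POPULAR_CACHE` are all hypothetical names. Ordering the levels lets the code ask "at least LIMITED?" instead of matching every state.

```python
from enum import IntEnum
from typing import Dict, List


class SearchLevel(IntEnum):
    """Degradation levels, ordered from least to most capable (hypothetical names)."""
    DISABLED = 0
    MINIMAL = 1  # cached results for popular searches only
    LIMITED = 2  # text search without facets
    FULL = 3     # text search with facets


# Hypothetical stand-ins for real search infrastructure.
DOCUMENTS = ["red shoes", "blue shoes", "red hat"]
POPULAR_CACHE: Dict[str, List[str]] = {"shoes": ["red shoes", "blue shoes"]}


def text_search(query: str) -> List[str]:
    return [doc for doc in DOCUMENTS if query in doc]


def compute_facets(results: List[str]) -> Dict[str, int]:
    # Toy facet: count results by their first word (the colour here).
    facets: Dict[str, int] = {}
    for result in results:
        key = result.split()[0]
        facets[key] = facets.get(key, 0) + 1
    return facets


def search(query: str, level: SearchLevel) -> dict:
    if level >= SearchLevel.LIMITED:
        results = text_search(query)
        facets = compute_facets(results) if level == SearchLevel.FULL else None
        return {"mode": level.name, "results": results, "facets": facets}
    if level == SearchLevel.MINIMAL:
        return {"mode": level.name, "results": POPULAR_CACHE.get(query, []), "facets": None}
    return {"mode": level.name, "results": [], "facets": None}


print(search("shoes", SearchLevel.FULL)["facets"])  # {'red': 1, 'blue': 1}
```

An operator (or an automated load shedder) can then step the flag down one level at a time, for example `search(query, SearchLevel(flag_value))`, rather than switching search off entirely.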
Anti-Pattern: Ignoring Fallback Capacity
If the primary handles 10,000 RPS and fails, that traffic shifts to the fallback. Is the fallback sized for 10,000 RPS? Often it is not. Capacity-plan fallback infrastructure for the traffic it will receive during degradation.
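The sizing question reduces to simple arithmetic once a per-instance throughput is known. The sketch below assumes linear scaling and uses hypothetical figures: each fallback instance sustains 800 RPS, the plan targets 70% utilization, and 4 fallback instances are currently deployed. The function names are illustrative. Integer ceiling division avoids floating-point rounding at the boundary.

```python
def instances_needed(peak_rps: int, per_instance_rps: int, target_utilization_pct: int) -> int:
    """Instances required to serve `peak_rps` at or below the target utilization."""
    # Scale both sides by 100 so the percentage stays an integer, then ceiling-divide.
    usable_capacity_x100 = per_instance_rps * target_utilization_pct
    return -(-(peak_rps * 100) // usable_capacity_x100)


def fallback_shortfall(primary_peak_rps: int, fallback_instances: int,
                       per_instance_rps: int, target_utilization_pct: int) -> int:
    """How many instances the fallback is missing if it must absorb all primary traffic."""
    needed = instances_needed(primary_peak_rps, per_instance_rps, target_utilization_pct)
    return max(0, needed - fallback_instances)


# Hypothetical numbers: 800 RPS per instance, 70% target utilization, 4 instances deployed.
print(instances_needed(10_000, 800, 70))       # 18
print(fallback_shortfall(10_000, 4, 800, 70))  # 14
```

Here 10,000 / (800 × 0.7) ≈ 17.9, so 18 instances are needed; a fallback provisioned with 4 would be short by 14 on the day it is actually used. If the fallback also serves its own regular traffic, that load must be added to `primary_peak_rps`.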
Anti-Pattern: Silent Degradation
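Degradation without alerting lets issues go unnoticed, because a working fallback hides the primary failure from users. Circuit opens? Alert. Fallback active for more than 5 minutes? Page. Track the primary path's success rate separately from the overall success rate; a healthy fallback can keep the overall rate high while the primary is failing.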
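The alerting rules above can be expressed as a small monitor that counts primary successes separately and tracks how long the fallback has been active. This is a minimal sketch that assumes the caller reports each request and passes in the current time; `DegradationMonitor` and the action strings are hypothetical. A real deployment would emit these as metrics and let the alerting system apply the thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_AFTER_SECONDS = 5 * 60  # "Fallback active more than 5 minutes? Page."


@dataclass
class DegradationMonitor:
    """Hypothetical monitor that keeps primary and overall success rates apart."""
    total: int = 0
    succeeded: int = 0
    served_by_primary: int = 0
    fallback_since: Optional[float] = None

    def record(self, used_fallback: bool, ok: bool) -> None:
        self.total += 1
        if ok:
            self.succeeded += 1
            if not used_fallback:
                self.served_by_primary += 1

    def overall_success_rate(self) -> float:
        return self.succeeded / self.total if self.total else 1.0

    def primary_success_rate(self) -> float:
        # Fallbacks can keep the overall rate near 100% while this one collapses.
        return self.served_by_primary / self.total if self.total else 1.0

    def check(self, now: float, circuit_open: bool, fallback_active: bool) -> List[str]:
        actions: List[str] = []
        if circuit_open:
            actions.append("alert: circuit open")
        if not fallback_active:
            self.fallback_since = None  # degradation over; reset the timer
        elif self.fallback_since is None:
            self.fallback_since = now  # degradation just started
        elif now - self.fallback_since > PAGE_AFTER_SECONDS:
            actions.append("page: fallback active for more than 5 minutes")
        return actions


monitor = DegradationMonitor()
for _ in range(90):
    monitor.record(used_fallback=True, ok=True)
for _ in range(10):
    monitor.record(used_fallback=False, ok=True)
print(monitor.overall_success_rate(), monitor.primary_success_rate())  # 1.0 0.1
```

In this example users see a 100% success rate, while only 10% of requests reach the primary; without the separate metric, that failure would be invisible.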