Production Circuit Breaker Integration: Timeouts, Fallbacks, and Observability
Timeout Coordination
Circuit breaker timeouts must be shorter than client timeouts. If your HTTP client timeout is 30 seconds but the circuit breaker timeout is 60 seconds, the client gives up before the breaker can help. Set the breaker timeout to 50-80% of the client timeout. For a 10s client timeout, use a 5-8s breaker timeout, so the breaker fails fast while the client still has time to run the fallback.
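The 50-80% guideline can be checked mechanically at startup or in a config lint step. The following is a minimal sketch in Python; the function names `breaker_timeout_for` and `check_timeouts` and the default fraction of 0.65 are illustrative assumptions, not part of any particular circuit breaker library.

```python
from typing import List


def breaker_timeout_for(client_timeout_s: float, fraction: float = 0.65) -> float:
    """Derive a breaker timeout as a fraction of the client timeout.

    The default fraction (0.65) is an arbitrary midpoint of the 50-80% range.
    """
    if client_timeout_s <= 0:
        raise ValueError("client timeout must be positive")
    if not 0.5 <= fraction <= 0.8:
        raise ValueError("fraction should stay within the 50-80% guideline")
    return client_timeout_s * fraction


def check_timeouts(client_timeout_s: float, breaker_timeout_s: float) -> List[str]:
    """Return a list of human-readable problems; an empty list means the pair is fine."""
    problems = []
    if breaker_timeout_s >= client_timeout_s:
        problems.append(
            "breaker timeout is not shorter than the client timeout: "
            "the client gives up before the breaker can help"
        )
    elif not 0.5 * client_timeout_s <= breaker_timeout_s <= 0.8 * client_timeout_s:
        problems.append("breaker timeout is outside 50-80% of the client timeout")
    return problems
```

For example, `check_timeouts(30, 60)` reports the misconfiguration described above, while `check_timeouts(10, 6)` returns an empty list.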
Fallback Strategies
When the breaker is open, what does the caller do? There are four common options. Cached data: return the last known good response, which works well for read-heavy services. Default values: return safe defaults such as empty lists or zero counts. Degraded mode: disable non-essential features while keeping core functionality. Queue for later: store requests and retry them when the service recovers. Test every fallback, because the fallback path rarely executes in normal operation and therefore often hides bugs.
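The first two options (cached data, then default values) combine naturally into a single fallback chain. The sketch below is one possible shape in Python; `CachedFallback`, its `call` method, and the `breaker_is_open` flag are hypothetical names for illustration, and a real breaker library would supply its own state check and failure accounting.

```python
from typing import Any, Callable, Dict, Tuple


class CachedFallback:
    """Serve live data when possible, else last known good, else a safe default."""

    def __init__(self, default: Any) -> None:
        self._last_good: Dict[str, Any] = {}
        self._default = default

    def call(
        self, key: str, fetch: Callable[[], Any], breaker_is_open: bool
    ) -> Tuple[Any, str]:
        """Return (value, source), where source is 'live', 'cached', or 'default'."""
        if not breaker_is_open:
            try:
                value = fetch()
                self._last_good[key] = value  # remember the last good response
                return value, "live"
            except Exception:
                # Assumption: the surrounding breaker records this failure;
                # here we only fall through to the fallback path.
                pass
        if key in self._last_good:
            return self._last_good[key], "cached"
        return self._default, "default"
```

Returning the source alongside the value makes it straightforward to count fallback invocations and to exercise the cached and default paths directly in tests.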
Observability Requirements
You must know when breakers trip. Essential metrics: breaker state changes (with timestamps), failure rates per downstream, request latency histograms, and fallback invocation counts. Alert on: a breaker open for longer than 5 minutes, repeated open/close cycles (flapping), and rising fallback error rates. A dashboard should show every breaker with its current state, time in state, and recent history.
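Two of the alert conditions above (open for longer than 5 minutes, and flapping) can be derived from timestamped state changes alone. The following Python sketch assumes a hypothetical `BreakerHistory` record; the 10-minute flap window and the threshold of three opens are assumed values chosen for illustration, and only the 5-minute limit comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OPEN_TOO_LONG_S = 5 * 60   # from the text: alert when open longer than 5 minutes
FLAP_WINDOW_S = 10 * 60    # assumption: window in which repeated opens count as flapping
FLAP_THRESHOLD = 3         # assumption: number of opens within the window


@dataclass
class BreakerHistory:
    """Timestamped state changes for one breaker (timestamps in seconds)."""

    name: str
    transitions: List[Tuple[float, str]] = field(default_factory=list)

    def record(self, timestamp: float, state: str) -> None:
        self.transitions.append((timestamp, state))

    def current_state(self) -> str:
        # A breaker with no recorded transitions is treated as closed.
        return self.transitions[-1][1] if self.transitions else "closed"

    def time_in_state(self, now: float) -> float:
        return now - self.transitions[-1][0] if self.transitions else 0.0

    def alerts(self, now: float) -> List[str]:
        found = []
        if self.current_state() == "open" and self.time_in_state(now) > OPEN_TOO_LONG_S:
            found.append("open_too_long")
        recent_opens = sum(
            1 for ts, state in self.transitions
            if state == "open" and now - ts <= FLAP_WINDOW_S
        )
        if recent_opens >= FLAP_THRESHOLD:
            found.append("flapping")
        return found
```

The same record also supplies the dashboard fields: `current_state()`, `time_in_state(now)`, and the `transitions` list as recent history.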
Testing Circuit Breakers
Unit tests verify state transitions work correctly. Integration tests inject failures to verify breakers trip. Chaos testing randomly opens breakers in production to verify fallbacks work. Load testing verifies breakers do not cause performance regressions under normal conditions. The most common bug: fallback code that was never actually executed in production breaks when finally needed.
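As an illustration of the unit-test level, the sketch below pairs a deliberately small breaker with tests for its state transitions. `MinimalBreaker` is a hypothetical stand-in, not the API of any real library; passing the clock in as `now` is a design choice that keeps the tests deterministic and free of sleeps.

```python
class MinimalBreaker:
    """Toy breaker: closed -> open after N failures, open -> half_open after a timeout."""

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a request may proceed at time `now`."""
        if self.state == "open" and now - self.opened_at >= self.reset_timeout_s:
            self.state = "half_open"  # let a trial request through
        return self.state != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self, now: float) -> None:
        self.failures += 1
        # A failed trial request in half_open reopens the breaker immediately.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now


def test_trips_after_threshold() -> None:
    b = MinimalBreaker(failure_threshold=3)
    for t in (1.0, 2.0):
        b.record_failure(t)
    assert b.state == "closed"
    b.record_failure(3.0)
    assert b.state == "open"
    assert not b.allow(4.0)


def test_half_open_success_closes() -> None:
    b = MinimalBreaker(failure_threshold=1, reset_timeout_s=30.0)
    b.record_failure(0.0)
    assert b.allow(30.0) and b.state == "half_open"
    b.record_success()
    assert b.state == "closed"


def test_half_open_failure_reopens() -> None:
    b = MinimalBreaker(failure_threshold=1, reset_timeout_s=30.0)
    b.record_failure(0.0)
    assert b.allow(31.0)
    b.record_failure(31.0)
    assert b.state == "open" and not b.allow(32.0)
```

These functions follow the pytest naming convention, so a test runner would collect them automatically; they can also be called directly.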