Circuit Breaker Tuning Knobs: Window Size, Thresholds, and Cooldowns
Effective circuit breakers require careful tuning of multiple interdependent parameters. Getting these wrong causes either excessive false positives that unnecessarily degrade service or breakers that never trip, failing to protect you when needed.
The sliding window configuration is foundational. Time-based windows (typically 10 seconds split into ten 1-second buckets) smooth out burstiness better than count-based windows under variable traffic. The minimum number of calls (often 20) prevents tripping on statistical noise during low-traffic periods. Set it too low and natural variance at, say, 5 requests per second will cause flapping; set it too high and the breaker never gathers enough samples to protect low-QPS (queries per second) endpoints during genuine outages.
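As an illustration, these window knobs map roughly onto Resilience4j's CircuitBreakerConfig builder (one possible library, not prescribed by the text; the values mirror the paragraph above):

```java
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;

class WindowTuning {
    // Window knobs only; thresholds and recovery appear in the later sketches.
    static final CircuitBreakerConfig WINDOW_CONFIG = CircuitBreakerConfig.custom()
            // Time-based window: outcomes aggregated over the last 10 seconds
            // (bucketed per second internally), smoothing bursty traffic.
            .slidingWindowType(SlidingWindowType.TIME_BASED)
            .slidingWindowSize(10)      // window length in seconds for TIME_BASED
            // Require 20 recorded calls before any threshold is evaluated,
            // so low-traffic noise cannot trip the breaker.
            .minimumNumberOfCalls(20)
            .build();
}
```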
Error rate and slow-call thresholds (commonly both set to 50%) define what "unhealthy" means. Crucially, you must classify which errors count: 5xx server errors and timeouts do, 4xx client errors usually don't (unless they indicate overload, like 429 rate limits). The slow-call threshold is powerful but underused: if your Service Level Objective (SLO) is p95 under 200ms, set the slow-call threshold at 200ms so the breaker trips before tail latency destroys the user experience. This prevents tail amplification, where one slow dependency cascades latency upstream.
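The classification and threshold knobs might look like this in the same assumed Resilience4j sketch; which exceptions represent 5xx versus 4xx depends on your HTTP client adapter, so the recorded exception types here are only illustrative:

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.TimeoutException;

import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

class ThresholdTuning {
    static final CircuitBreakerConfig THRESHOLD_CONFIG = CircuitBreakerConfig.custom()
            // Trip when at least 50% of recorded calls in the window failed...
            .failureRateThreshold(50)
            // ...or when at least 50% of calls were slower than the latency SLO.
            .slowCallRateThreshold(50)
            .slowCallDurationThreshold(Duration.ofMillis(200))  // align with p95 SLO
            // Count infrastructure failures; surface 5xx/429 responses as exceptions
            // in your client adapter and let plain 4xx responses return normally so
            // they never count against the breaker.
            .recordExceptions(IOException.class, TimeoutException.class)
            .build();
}
```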
Open-state duration and probe policy control recovery speed versus stability. Short cooldowns (5 seconds) recover quickly but risk flapping if the issue persists. Longer waits (30+ seconds) with exponential backoff on repeated failures provide stability but reduce availability. The half-open probe count (1 to 5 concurrent requests) is critical: too many probes create a thundering herd that knocks a recovering service back down. AWS and Envoy best practices limit probes strictly and stagger them with jitter across instances, so synchronized probe spikes don't hit the struggling dependency like a fresh wave of load.
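And the recovery-side knobs, again as a hedged Resilience4j sketch (exponential backoff across repeated opens and cross-instance jitter would be layered on top and are only noted in comments):

```java
import java.time.Duration;

import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

class RecoveryTuning {
    static final CircuitBreakerConfig RECOVERY_CONFIG = CircuitBreakerConfig.custom()
            // Stay open for 10 seconds before probing; raise toward 30+ seconds (or
            // back off exponentially on repeated opens) when outages tend to persist.
            .waitDurationInOpenState(Duration.ofSeconds(10))
            // Transition to half-open on a timer rather than waiting for the next call.
            .automaticTransitionFromOpenToHalfOpenEnabled(true)
            // Allow only 3 probe calls in half-open so a recovering dependency is not
            // hit by a thundering herd; add per-instance jitter upstream so probes
            // from a whole fleet do not arrive as one synchronized spike.
            .permittedNumberOfCallsInHalfOpenState(3)
            .build();
}
```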
💡 Key Takeaways
•Time-based windows of 10 seconds with bucketization smooth burstiness better than count-based windows; use 10 buckets of 1 second each to avoid lock contention on a single hot stats counter
•A minimum call threshold around 20 prevents false positives but must scale with traffic: low-QPS endpoints may need lower minimums or longer windows to gather enough samples
•Classify failures explicitly: count 5xx and timeouts, exclude 4xx client errors unless they signal overload (429s), and treat calls exceeding the latency SLO as slow failures
•Open duration of 5 to 30 seconds balances recovery speed against flapping; apply exponential backoff (double on each repeated open) to stabilize under persistent outages
•Limit half-open probes to 1 to 5 concurrent requests and stagger them with jitter across instances to prevent probe stampedes that overwhelm recovering dependencies
•Scope breakers per endpoint and optionally per tenant so one noisy caller or endpoint can't open the breaker globally and affect all traffic (see the registry sketch after this list)
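A minimal sketch of the per-endpoint scoping from the last takeaway, still assuming Resilience4j; the endpoint names are hypothetical:

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

class PerEndpointBreakers {
    // One shared default config, but a distinct breaker instance per endpoint, so a
    // failing /reports endpoint cannot open the breaker for /orders traffic too.
    static final CircuitBreakerRegistry REGISTRY =
            CircuitBreakerRegistry.of(CircuitBreakerConfig.ofDefaults());

    static CircuitBreaker forEndpoint(String endpointName) {
        // The registry caches breakers by name, so failure stats stay per endpoint
        // (a per-tenant scope would simply add the tenant id to the name).
        return REGISTRY.circuitBreaker(endpointName);  // e.g. "GET /orders" (hypothetical)
    }
}
```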
📌 Examples
Production tuning for an interactive API: 10-second window, 20 minimum calls, 50% error threshold, 200ms slow-call threshold (matches the p95 SLO), 10-second cooldown, 3 concurrent probes in half-open (assembled into a single config in the sketch after these examples)
Envoy outlier detection config: 5-second interval, eject after 5 consecutive 5xx errors, 30-second base ejection time, maximum 50% of the cluster ejected so detection can never black out the whole cluster
Low-QPS background job breaker: 60-second window, 10 minimum calls (traffic may be only 1 QPS), 70% error threshold, 60-second cooldown to avoid tripping on occasional failures
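For reference, the first example's numbers assembled into a single configuration, again as an assumed Resilience4j sketch (the breaker name is hypothetical):

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.TimeoutException;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;

class InteractiveApiBreaker {
    // Values taken directly from the interactive-API example above.
    static final CircuitBreakerConfig CONFIG = CircuitBreakerConfig.custom()
            .slidingWindowType(SlidingWindowType.TIME_BASED)
            .slidingWindowSize(10)                              // 10-second window
            .minimumNumberOfCalls(20)
            .failureRateThreshold(50)
            .slowCallDurationThreshold(Duration.ofMillis(200))  // matches p95 SLO
            .slowCallRateThreshold(50)
            .waitDurationInOpenState(Duration.ofSeconds(10))    // cooldown
            .permittedNumberOfCallsInHalfOpenState(3)           // half-open probes
            .recordExceptions(IOException.class, TimeoutException.class)
            .build();

    static final CircuitBreaker BREAKER = CircuitBreaker.of("interactive-api", CONFIG);
}
```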