
Circuit Breaker Trade-Offs: When Not to Use Them

Circuit breakers are powerful but not universally applicable, and knowing when they help and when they harm is crucial to system design. They excel for synchronous, latency-critical dependencies where failing fast with a fallback beats waiting. A web API calling a recommendation service benefits immensely: when recommendations are slow, the breaker opens and the API shows products without personalization in 50ms instead of blocking for 5 seconds.

However, circuit breakers can reduce availability by intentionally rejecting calls that might have succeeded. If your threshold is 50% errors and the dependency is at 51%, the open breaker rejects 100% of calls even though the dependency is still partially healthy. This is the core trade-off: you are choosing predictable latency and resource preservation over maximum availability.

For asynchronous operations that only need to complete eventually, such as background jobs or event processing, circuit breakers often cause more problems than they solve. If you are enqueuing events to Kafka or writing to a message queue, the operation is naturally non-blocking and queue depth can absorb backpressure. Opening a breaker here just drops messages or fails jobs that could have succeeded with a simple retry minutes later. A better approach is to use bounded queues with overflow policies and asynchronous retry with exponential backoff.

When failures are rare and transient (less than 1% error rate, resolved in seconds), simple timeouts plus a retry or two provide sufficient protection without the complexity of a state machine. Circuit breakers shine when failure rates are high (10% to 50%) or sustained (minutes to hours), which justifies the tuning and operational overhead. Similarly, if your traffic is too low (under 10 requests per second), breaker statistics are too noisy to be reliable; health checks or simple retry logic work better.

Finally, circuit breakers do not protect against overload on healthy systems. If your database is running at 100% CPU simply because you are sending too much legitimate traffic, opening a breaker either fails requests without reducing load (if retries continue) or shifts load to fallbacks that may also be overloaded. For this scenario you need rate limiting, load shedding, and backpressure mechanisms instead. Use circuit breakers for fault isolation, not capacity management.
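To make the core trade-off concrete, here is a minimal sketch of a count-based breaker in Python. It is an illustration under stated assumptions, not a production implementation: the names `CircuitBreaker`, `failure_threshold`, `window_size`, `min_calls`, and `open_seconds` are hypothetical, the half-open state is simplified to "reset and try again after the cool-down", and there is no thread safety.

```python
import time
from collections import deque


class CircuitBreaker:
    """Illustrative breaker: opens when the failure rate in a sliding
    window of recent calls reaches the threshold (hypothetical API)."""

    def __init__(self, failure_threshold=0.5, window_size=100, min_calls=20,
                 open_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.min_calls = min_calls          # do not judge on tiny samples
        self.open_seconds = open_seconds    # cool-down before trying again
        self.clock = clock                  # injectable for testing
        self.results = deque(maxlen=window_size)  # True means the call failed
        self.opened_at = None               # None means the breaker is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.open_seconds:
                # Open: fail fast. Every call is rejected here, including
                # the ones the dependency would have served successfully.
                return fallback()
            # Cool-down elapsed: simplified half-open, start with a clean window.
            self.opened_at = None
            self.results.clear()
        try:
            result = func()
        except Exception:
            self._record(failed=True)
            return fallback()
        self._record(failed=False)
        return result

    def _record(self, failed):
        self.results.append(failed)
        if len(self.results) >= self.min_calls:
            failure_rate = sum(self.results) / len(self.results)
            if failure_rate >= self.failure_threshold:
                self.opened_at = self.clock()
```

With a window of 100 calls, 51 failures are enough to open the breaker, after which `func` is not invoked at all until `open_seconds` has passed. That is the availability cost described above: the roughly 49% of calls that would have succeeded are served the fallback as well.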
💡 Key Takeaways
Breakers reduce availability for predictability: at 51% error rate, you reject 100% of calls even though 49% would have succeeded, trading maximum availability for consistent latency
Async operations like message queues and background jobs don't benefit: use bounded queues with overflow policies and async retry with exponential backoff, so work that fails now can succeed minutes later
Low error rates under 1% don't justify the complexity: a simple timeout (200ms) plus one retry with jitter provides sufficient protection without state machine overhead
Low traffic under 10 RPS produces noisy statistics: breakers either never trip or flap constantly, so use health checks or longer windows (60+ seconds) with lower minimum calls instead
Breakers don't solve overload on healthy systems: a database at 100% CPU won't recover if the breaker opens but retries continue; it needs rate limiting and backpressure, not fault isolation
Better alternatives exist: use retries for transient failures, rate limiting for overload, queues for critical writes, and request hedging to backup replicas for tail latency
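For the rare, transient failure case, the retry alternative named in the takeaways can be sketched as follows. This is a minimal illustration, assuming the wrapped call enforces its own timeout and raises an exception on failure; the function name `retry_with_backoff` and its parameters are hypothetical, and "full jitter" (a uniformly random delay up to the exponential cap) is one common jitter strategy among several.

```python
import random
import time


def retry_with_backoff(func, max_attempts=3, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call func, retrying on any exception with exponential backoff
    and full jitter. Assumes func applies its own timeout."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random fraction of the cap so that many
            # clients do not retry in lockstep.
            sleep(rng() * cap)
```

Setting `max_attempts=2` gives the "timeout plus one retry" shape from the takeaways. There is no shared state to tune, which is the point when failures are rare. The `sleep` and `rng` parameters exist only so the behavior can be exercised deterministically.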
📌 Examples
Anti-pattern: A circuit breaker on a Kafka producer fails fast during a broker restart and drops events. Better: a bounded queue with a 10 second timeout and async retry recovers automatically when the broker returns
Good fit: A user-facing product API calls a recommendation service with a 200ms SLO. The breaker opens at 50% errors and serves cached recommendations in 50ms instead of waiting on 5 second timeouts, which improves the user experience
Wrong tool: A database is at 100% CPU due to a traffic spike. The breaker opens but retries continue, so load doesn't drop. What is needed instead is request-level rate limiting at 1000 RPS with load shedding
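For the "wrong tool" case, the request-level rate limiting mentioned above is often built on a token bucket. The sketch below is one possible shape, not a reference implementation: `TokenBucket`, `rate_per_sec`, and `burst` are hypothetical names, the limiter is single-process, and it is not thread-safe.

```python
import time


class TokenBucket:
    """Illustrative token bucket: admits up to rate_per_sec requests per
    second on average, with short bursts up to `burst` requests."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)   # start full so initial bursts pass
        self.clock = clock           # injectable for testing
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: the caller should shed this request


# Hypothetical usage at the edge of the service:
limiter = TokenBucket(rate_per_sec=1000, burst=100)
if not limiter.allow():
    pass  # e.g. reject immediately with an "overloaded, try later" response
```

Unlike an open breaker, the limiter keeps admitting a bounded amount of traffic, so load on the database actually falls to a level it can sustain while the excess is shed early.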