Circuit Breaker Pattern: Fail Fast to Preserve System Health
The Cascade Failure Problem
When Service A calls Service B and B becomes slow or unresponsive, A accumulates waiting threads. Each blocked thread holds memory, a connection pool slot, and often a database connection. At 1,000 requests/second with 30-second timeouts, up to 30,000 requests can be waiting at once, so A quickly exhausts its thread pool. Services calling A then start timing out as well. Within minutes, one failing service can make the entire system unresponsive.
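The arithmetic behind that exhaustion can be sketched in a few lines. This is a back-of-the-envelope estimate that assumes every call to B hangs for the full timeout; the function name and the pool size of 200 threads are hypothetical values chosen for illustration, not figures from any particular system.

```python
def in_flight_requests(arrival_rate_per_s: float, timeout_s: float) -> float:
    """Requests waiting on B at steady state if every call blocks until timeout.

    This is Little's law: items in system = arrival rate * time in system.
    """
    return arrival_rate_per_s * timeout_s


# Figures from the text: 1,000 requests/second, 30-second timeouts.
waiting = in_flight_requests(1_000, 30)

# Hypothetical thread pool size, assumed only for this example.
pool_size = 200

# Time until every thread in the pool is blocked on B.
seconds_to_exhaust = pool_size / 1_000

print(f"requests that would be waiting: {waiting}")
print(f"pool of {pool_size} threads exhausted after {seconds_to_exhaust} s")
```

Under these assumptions the pool is gone in a fraction of a second, long before the first timeout fires.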
Why Retries Make It Worse
The instinct during failures is to retry. If 10% of requests fail and each failed request is retried up to 3 times, traffic to the failing service can increase by as much as 30%. When the service is already struggling under load, this additional traffic pushes it further into failure. The service cannot recover because it never gets breathing room.
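A small calculation makes the amplification concrete. The sketch below is illustrative only: the function name and the `persistent` flag are assumptions introduced here. It contrasts the worst case, where a failed request keeps failing on every retry, with the milder case where each attempt fails independently.

```python
def retry_amplification(failure_rate: float, max_retries: int,
                        persistent: bool = True) -> float:
    """Extra traffic caused by retries, as a fraction of the original traffic."""
    if persistent:
        # Worst case: a failing request fails on every retry,
        # so it uses up all of its retries.
        return failure_rate * max_retries
    # Independent failures: retry k is only sent if the previous
    # k attempts all failed, which happens with probability failure_rate ** k.
    return sum(failure_rate ** k for k in range(1, max_retries + 1))


worst = retry_amplification(0.10, 3, persistent=True)
mild = retry_amplification(0.10, 3, persistent=False)
print(f"persistent failures:  +{worst:.1%} traffic")
print(f"independent failures: +{mild:.1%} traffic")
```

The worst case is the one that matters during an outage: when the downstream service is overloaded, failures tend to be persistent, and rising failure rates push the extra load up further.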
The Circuit Breaker Solution
A circuit breaker sits between caller and callee, tracking success and failure rates. When the failure rate crosses a threshold, such as 50% over a 10-second window, the breaker opens and immediately rejects all requests without calling the downstream service. This fail-fast behavior returns errors in about 1 ms instead of waiting for timeouts, freeing the caller's threads and connections right away.
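One way to track the failure rate is a rolling time window over recent call outcomes. The sketch below is a minimal, single-threaded illustration, not a reference implementation: the class name `FailureWindow`, the `min_calls` guard (which avoids tripping on a handful of samples), and the injectable `clock` are all assumptions made for this example.

```python
import time
from collections import deque


class FailureWindow:
    """Tracks call outcomes over a rolling time window (hypothetical helper)."""

    def __init__(self, window_s=10.0, threshold=0.5, min_calls=5,
                 clock=time.monotonic):
        self.window_s = window_s      # e.g. 10 seconds, as in the text
        self.threshold = threshold    # e.g. 0.5 for a 50% failure rate
        self.min_calls = min_calls    # assumed guard against tiny samples
        self.clock = clock            # injectable so tests can control time
        self.events = deque()         # (timestamp, succeeded) pairs

    def _evict(self, now):
        # Drop outcomes that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()

    def record(self, succeeded: bool) -> None:
        now = self.clock()
        self.events.append((now, succeeded))
        self._evict(now)

    def should_open(self) -> bool:
        self._evict(self.clock())
        total = len(self.events)
        if total < self.min_calls:
            return False
        failures = sum(1 for _, ok in self.events if not ok)
        # Trip when the failure rate reaches or exceeds the threshold.
        return failures / total >= self.threshold
```

A caller would invoke `record(...)` after each downstream call and consult `should_open()` before deciding whether to trip the breaker.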
Three States
Closed: Normal operation. All requests pass through. The breaker counts failures.
Open: Failure threshold exceeded. All requests fail immediately without attempting a downstream call.
Half Open: After a cooldown period, a limited number of test requests pass through. If they succeed, the breaker closes. If any of them fails, it reopens.
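The three states can be sketched as a small state machine. This is a simplified, single-threaded illustration under stated assumptions: it trips on a count of consecutive failures rather than a failure rate, and the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, `cooldown_s`, and `trial_calls` are hypothetical. A production breaker would also need locking and a cap on concurrent trial requests.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without touching the downstream service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown_s=30.0, trial_calls=3,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures to trip
        self.cooldown_s = cooldown_s                # time spent open before probing
        self.trial_calls = trial_calls              # successes needed to close again
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.trial_successes = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Cooldown elapsed: let trial requests through.
            self.state = State.HALF_OPEN
            self.trial_successes = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _open(self):
        self.state = State.OPEN
        self.opened_at = self.clock()
        self.failures = 0

    def _on_failure(self):
        if self.state is State.HALF_OPEN:
            self._open()  # a failed trial request reopens the breaker
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _on_success(self):
        if self.state is State.HALF_OPEN:
            self.trial_successes += 1
            if self.trial_successes >= self.trial_calls:
                self.state = State.CLOSED
                self.failures = 0
        else:
            self.failures = 0  # a success resets the consecutive-failure count
```

Wrapping a downstream call then looks like `breaker.call(fetch_profile, user_id)`, where `fetch_profile` stands in for any function that raises on failure. Callers catch `CircuitOpenError` to serve a fallback instead of waiting on a timeout.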