Resilience & Service Patterns • Circuit Breaker Pattern
Circuit Breaker Placement: Client Side, Service Mesh, or Gateway
Where you place circuit breakers fundamentally changes their behavior, latency overhead, and effectiveness. The three main options each solve different problems and are often used together as a layered defense.
Client-side breakers live in application code or client libraries, making decisions per service instance. This gives you the lowest latency (no extra network hop) and the finest granularity: you can customize thresholds per endpoint, tenant, or priority class, and fail fast before even serializing the request. Netflix Hystrix exemplified this approach, with per-dependency thread pools and breakers in each service instance. The downside is coordination: every instance decides independently, so 10% of your fleet might still be calling a failing dependency while the rest have already opened their breakers. You also need to implement the logic in every language you use and keep it updated.
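At its core, a client-side breaker is a small state machine (CLOSED → OPEN → HALF_OPEN). A minimal sketch in Python — class name, thresholds, and the `fallback` parameter are illustrative, not the Hystrix API:

```python
import time


class CircuitBreaker:
    """Minimal client-side circuit breaker sketch (illustrative, not a real library)."""

    def __init__(self, failure_threshold=5, recovery_timeout=30.0):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.recovery_timeout = recovery_timeout    # seconds to wait before a probe
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, fn, fallback=None):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "HALF_OPEN"            # allow a single probe request
            elif fallback is not None:
                return fallback()                   # fail fast: no network call at all
            else:
                raise RuntimeError("circuit open")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed probe, or too many consecutive failures, opens the circuit.
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = time.monotonic()
            if fallback is not None:
                return fallback()
            raise
        else:
            self.failures = 0
            self.state = "CLOSED"
            return result
```

Because this runs in-process, the fail-fast path in the OPEN state returns before any serialization or I/O — which is exactly why each instance's breaker state is independent.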
Service mesh or sidecar breakers (Envoy, Istio, Linkerd) enforce uniform policies at the proxy layer, independent of application code. Used at scale by Lyft, Shopify, and DoorDash, this approach provides consistent protection even for non-cooperative clients and centralizes configuration. Envoy's outlier detection ejects unhealthy instances from load-balancing pools and enforces concurrency limits (max connections, pending requests, and retries per host) that prevent overload even on healthy but saturated clusters serving millions of requests per second (RPS). The tradeoff is slightly higher latency (microseconds for a local proxy) and less context: without extra metadata, the proxy doesn't know whether a 500 error is retriable.
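In Envoy these limits are declared per upstream cluster. A minimal sketch using Envoy's v3 cluster fields — the cluster name and specific values are illustrative, not a recommendation:

```yaml
clusters:
- name: upstream_service          # hypothetical upstream
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_connections: 1000       # cap concurrent connections per host
      max_pending_requests: 100   # queue depth before shedding load
      max_retries: 3              # bound retry amplification
  outlier_detection:
    consecutive_5xx: 5            # eject after 5 consecutive 5xx responses
    interval: 10s                 # sweep interval for ejection analysis
    base_ejection_time: 30s       # how long an ejected host stays out
    max_ejection_percent: 50      # never eject more than half the pool
```

Note that `max_ejection_percent` is itself a safety valve: it stops outlier detection from ejecting the whole pool during a correlated failure.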
API gateway or edge breakers provide coarse-grained, cross-cutting protection for entire backend services, useful for protecting shared infrastructure and enforcing organizational policies. However, they sit too far from the caller to provide fast local feedback and can create availability issues if the gateway itself becomes a bottleneck or single point of failure. Best practice: use mesh-level breakers for infrastructure protection plus client-side breakers for fine-grained, context-aware decisions.
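The "application context" that makes client-side breakers context-aware can be as simple as deciding which responses should count as breaker failures — something a proxy can't know without extra metadata. A hedged sketch (the status-code sets are illustrative, not a standard):

```python
# Server-side errors that suggest the dependency is unhealthy: count them
# toward tripping the breaker.
TRIP_WORTHY = {500, 502, 503, 504}

# Caller errors: the dependency is fine, so these should NOT trip the breaker.
CALLER_ERRORS = {400, 401, 403, 404, 409}


def counts_as_failure(status: int) -> bool:
    """Return True if this HTTP status should count toward opening the breaker."""
    return status in TRIP_WORTHY
```

Feeding only `counts_as_failure` statuses into the breaker prevents a burst of 404s from a buggy caller from opening the circuit for everyone.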
💡 Key Takeaways
• Client-side breakers provide sub-millisecond fail-fast decisions and fine-grained control per endpoint/tenant, but require coordination across independent service instances
• Service mesh breakers like Envoy enforce uniform concurrency limits (max connections, pending requests per host) that protect against overload at millions of RPS regardless of app behavior
• Mesh placement protects against non-cooperative clients and centralizes config, but adds microseconds of latency and lacks application context for retriability decisions
• Gateway breakers are too coarse for most use cases but useful for protecting shared infrastructure and enforcing organizational rate limits or quotas
• Best practice is layered defense: mesh for infrastructure protection and concurrency limits, client-side for context-aware decisions and fast failover with fallbacks
• Independent breaker state per instance is a feature, not a bug: it provides natural sampling and avoids coordination overhead, but means 10% to 20% of the fleet may still call failing dependencies
📌 Examples
Netflix approach: Client library (Hystrix) with per-dependency breakers and thread pools in every service instance, typical pool size 10 to 20 threads, failing fast before RPC serialization
Lyft Envoy mesh: Outlier detection at the proxy ejects instances showing 5 consecutive errors over 5 to 10 seconds; enforces max 1,000 connections and max 100 pending requests per host
Shopify service mesh: Uses Envoy circuit breaking to cap in-flight requests and retries per upstream cluster, preventing retry storms during partial outages at millions of requests per day