
Concurrency Control and Bulkhead Isolation

Unbounded concurrency leads to resource exhaustion and cascading failures. When too many requests pile up, systems experience thread pool saturation, memory pressure, garbage collection spikes, and ultimately tail latency collapse. The solution is to cap in-flight operations per dependency using semaphores or tokens, a pattern called bulkheading, borrowed from ship compartmentalization.

Little's Law provides the mathematical foundation for setting concurrency limits: work in progress equals throughput times latency. If a downstream service sustains 5,000 Requests Per Second (RPS) with p95 latency under 20 milliseconds, target roughly 100 in-flight requests per instance (5,000 RPS × 0.020 s = 100 in flight). Going beyond this saturates the downstream and increases latency, creating a feedback loop where slower responses cause even more in-flight accumulation.

Netflix demonstrates this principle in its edge gateways. Separate concurrency pools isolate each downstream dependency so one slow or failing service cannot exhaust all worker threads and take down unrelated traffic. When a circuit breaker trips after failures exceed thresholds, the system stops sending requests entirely, allowing the downstream to recover rather than being hammered by retries. This has prevented production incidents where a single struggling microservice would otherwise have cascaded across the entire service mesh.
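To make the sizing concrete, here is a minimal Go sketch of a semaphore-based bulkhead whose limit comes from Little's Law. The Bulkhead type, the fail-fast behavior, and the 5,000 RPS / 20 ms figures follow the example above and are illustrative assumptions, not Netflix's actual implementation.

```go
// Minimal bulkhead sketch: a buffered channel acts as a counting
// semaphore that caps in-flight calls to one downstream dependency.
// Figures and names here are illustrative assumptions.
package main

import (
	"context"
	"errors"
	"fmt"
	"math"
	"time"
)

// Bulkhead caps in-flight calls to a single downstream dependency.
type Bulkhead struct {
	slots chan struct{}
}

func NewBulkhead(limit int) *Bulkhead {
	return &Bulkhead{slots: make(chan struct{}, limit)}
}

// Do runs fn only if a slot is free; otherwise it sheds load instead of queueing.
func (b *Bulkhead) Do(ctx context.Context, fn func(context.Context) error) error {
	select {
	case b.slots <- struct{}{}: // acquire a slot
		defer func() { <-b.slots }() // release it when the call finishes
		return fn(ctx)
	default:
		return errors.New("bulkhead full: shedding load")
	}
}

func main() {
	// Little's Law: in-flight ≈ throughput × latency = 5,000 req/s × 0.020 s ≈ 100.
	rps, p95 := 5000.0, 0.020
	bh := NewBulkhead(int(math.Round(rps * p95)))

	err := bh.Do(context.Background(), func(ctx context.Context) error {
		time.Sleep(20 * time.Millisecond) // placeholder for the real downstream call
		return nil
	})
	fmt.Println("err:", err)
}
```

Failing fast when no slot is free, rather than queueing, is what prevents the feedback loop described above: waiting callers would only add to the in-flight backlog the limit is meant to cap.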
💡 Key Takeaways
Little's Law guides concurrency limits. For a service sustaining 5,000 RPS at 20 millisecond p95 latency, cap in-flight requests near 100. Exceeding this creates queueing delays and positive feedback loops where slower responses cause more accumulation.
Netflix uses separate concurrency pools per downstream dependency in their edge gateways. A struggling microservice exhausts only its allocated pool, preventing cascading failures across unrelated traffic paths.
Circuit breakers transition to the open state after failures exceed thresholds (for example, a 50% failure rate over 10 seconds). This stops request flow entirely, allowing the downstream to recover rather than being overwhelmed by retries; a minimal breaker sketch follows these takeaways.
Too many OS threads cause context-switching overhead and memory pressure. With 100,000 threads, a 1 microsecond context-switch cost, and 100,000 switches per second, 100,000 × 1 µs = 0.1 s of CPU time per second, so 10% of a core is lost purely to scheduler overhead before any useful work happens.
Event-driven architectures using a small number of I/O threads per core avoid thread pool saturation. Netflix reported 3× throughput gains per instance when Zuul 2 moved from blocking threads to non-blocking I/O with bounded worker pools; a generic worker-pool sketch also follows below.
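To make the breaker's transitions concrete, the following Go sketch trips open when the failure rate over a rolling window crosses 50% within 10 seconds, mirroring the example thresholds above. The Breaker type, its field names, and the 5-second cool-down are assumptions for illustration, not the API of any particular resilience library.

```go
// Rolling-window circuit breaker sketch: after enough calls fail within
// the window, the breaker opens and fails fast; after a cool-down it
// closes again and starts a fresh window (a simplification of the usual
// single-probe half-open state). Thresholds and names are illustrative.
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var ErrOpen = errors.New("circuit open: failing fast")

type Breaker struct {
	mu          sync.Mutex
	windowStart time.Time
	total       int
	failures    int
	open        bool
	openedAt    time.Time

	window    time.Duration // observation window, e.g. 10 s
	threshold float64       // failure rate that trips the breaker, e.g. 0.5
	minCalls  int           // don't trip on a tiny sample
	coolDown  time.Duration // how long to stay open before trying again
}

// Do runs fn unless the breaker is open, then records the outcome.
func (b *Breaker) Do(fn func() error) error {
	b.mu.Lock()
	if b.open {
		if time.Since(b.openedAt) < b.coolDown {
			b.mu.Unlock()
			return ErrOpen
		}
		b.open = false // simplified half-open: close and observe a fresh window
		b.resetWindow()
	}
	b.mu.Unlock()

	err := fn() // the actual downstream call

	b.mu.Lock()
	defer b.mu.Unlock()
	if time.Since(b.windowStart) > b.window {
		b.resetWindow()
	}
	b.total++
	if err != nil {
		b.failures++
	}
	if b.total >= b.minCalls && float64(b.failures)/float64(b.total) >= b.threshold {
		b.open = true
		b.openedAt = time.Now()
	}
	return err
}

func (b *Breaker) resetWindow() {
	b.windowStart = time.Now()
	b.total, b.failures = 0, 0
}

func main() {
	b := &Breaker{
		window:    10 * time.Second,
		threshold: 0.5, // 50% failures over the window trips the breaker
		minCalls:  20,
		coolDown:  5 * time.Second,
	}
	err := b.Do(func() error { return nil }) // placeholder downstream call
	fmt.Println("first call:", err)
}
```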
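For the bounded worker pool half of the last takeaway, here is a generic Go sketch assuming a fixed set of workers draining a bounded queue. It is not Zuul 2's actual Netty-based event loop, just an illustration of keeping worker count and queue depth fixed so load spikes produce back-pressure instead of unbounded threads.

```go
// Bounded worker pool sketch: a fixed number of workers drain a bounded
// job queue, so a burst of work queues up (back-pressure) rather than
// spawning one thread per request. Sizes here are illustrative.
package main

import (
	"fmt"
	"sync"
)

func main() {
	const workers = 8          // bounded: roughly a few per core, not one per request
	jobs := make(chan int, 64) // bounded queue; a full queue applies back-pressure

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				_ = j * j // placeholder for the actual request handling
			}
		}()
	}

	for j := 0; j < 1000; j++ {
		jobs <- j // blocks when the queue is full instead of growing without bound
	}
	close(jobs)
	wg.Wait()
	fmt.Println("processed with", workers, "workers")
}
```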
📌 Examples
A production incident where retry storms overwhelmed a degraded authentication service. Unbounded concurrency caused 10,000 simultaneous connections, exhausting file descriptors and memory, forcing a restart that triggered another wave. Circuit breakers would have stopped the cascade.
Uber enforces concurrency limits in its Remote Procedure Call (RPC) libraries. Each microservice handler caps parallelism per downstream to protect tail latency, preventing one slow dependency from consuming all worker threads.
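As a sketch of what such per-downstream caps might look like at an RPC call site, the following Go snippet keeps one semaphore per dependency name. The ClientPool type, the dependency names, and the limits are hypothetical, not Uber's actual RPC library.

```go
// Hypothetical per-downstream concurrency caps at an RPC call site.
// One semaphore per dependency keeps a slow "payments" service from
// consuming slots reserved for "auth".
package main

import (
	"errors"
	"fmt"
)

type ClientPool struct {
	limits map[string]chan struct{} // dependency name -> counting semaphore
}

func NewClientPool(caps map[string]int) *ClientPool {
	p := &ClientPool{limits: make(map[string]chan struct{})}
	for dep, n := range caps {
		p.limits[dep] = make(chan struct{}, n)
	}
	return p
}

// Call fails fast if the named dependency's slots are exhausted,
// leaving every other dependency's slots untouched.
func (p *ClientPool) Call(dep string, fn func() error) error {
	sem, ok := p.limits[dep]
	if !ok {
		return fmt.Errorf("unknown dependency %q", dep)
	}
	select {
	case sem <- struct{}{}: // acquire a slot for this dependency
		defer func() { <-sem }()
		return fn()
	default:
		return errors.New(dep + ": concurrency limit reached")
	}
}

func main() {
	pool := NewClientPool(map[string]int{"auth": 100, "payments": 50})
	err := pool.Call("auth", func() error { return nil }) // placeholder RPC
	fmt.Println(err)
}
```

Keying the semaphores by dependency is the isolation property described above: exhausting the payments slots leaves the auth slots untouched, which is the same guarantee the Netflix edge-gateway pools provide.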