
Bulkhead Pattern: Isolating Failures Through Resource Partitioning

Definition
The Bulkhead Pattern isolates components into separate resource pools so that a failure in one component cannot exhaust the resources needed by others, preventing cascading failures across the system.

The Ship Analogy

Ships have watertight compartments (bulkheads) so that a hull breach floods only one section, not the entire vessel. Software bulkheads work the same way: isolate resources so one failing dependency cannot sink the whole application. Without bulkheads, a slow database query can exhaust all threads, blocking unrelated requests that do not even use that database.

Why Shared Resources Are Dangerous

A service with a single thread pool of 200 threads handles requests to multiple downstream services. If Service A becomes slow, requests to A accumulate, consuming threads. When all 200 threads wait on A, requests to healthy Services B and C cannot be processed. One slow dependency has effectively taken down the entire application.
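A minimal Python sketch of this failure mode, assuming a `ThreadPoolExecutor` stands in for the service's shared thread pool and an unset `threading.Event` stands in for a hung call to Service A. The pool is scaled down to 4 threads, and `demo_shared_pool`, `call_service_a`, and `call_service_b` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeoutError

SHARED_POOL_SIZE = 4  # hypothetical: scaled down from the 200 threads in the text


def demo_shared_pool() -> tuple[bool, list[str]]:
    """Show a healthy call starving behind a slow one in a single shared pool."""
    service_a_recovered = threading.Event()  # stays unset while Service A is slow

    def call_service_a() -> str:
        service_a_recovered.wait()  # the worker thread blocks on the slow dependency
        return "A ok"

    def call_service_b() -> str:
        return "B ok"  # healthy dependency: returns immediately

    with ThreadPoolExecutor(max_workers=SHARED_POOL_SIZE) as pool:
        # Enough slow A calls to occupy every shared worker thread.
        a_calls = [pool.submit(call_service_a) for _ in range(SHARED_POOL_SIZE)]
        b_call = pool.submit(call_service_b)  # queued behind A: no worker is free
        try:
            b_call.result(timeout=0.5)
            b_starved = False
        except FutureTimeoutError:
            b_starved = True
        service_a_recovered.set()  # A recovers and frees the workers
        results = [f.result() for f in a_calls] + [b_call.result()]
    return b_starved, results
```

Once every worker is parked on A, the call to B cannot even start, although B itself is healthy.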

How Bulkheads Help

Allocate a separate resource pool per dependency: 50 threads for Service A, 50 for B, 50 for C, and 50 held in reserve. When A becomes slow and exhausts its 50 threads, B and C keep operating normally on their dedicated pools. The blast radius of A's failure is contained to A-related requests.
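A minimal sketch of per-dependency isolation, assuming a semaphore-style bulkhead: each dependency gets its own cap on concurrent calls rather than a literal separate thread pool (one `ThreadPoolExecutor` per dependency is the other common variant). `Bulkhead`, `BulkheadFullError`, and `max_wait_s` are hypothetical names, and the 50-thread pools are scaled down to 2 slots each.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(Exception):
    """Raised when a dependency's dedicated pool has no free slot (hypothetical)."""


class Bulkhead:
    """Caps concurrent calls to one dependency (semaphore-style bulkhead sketch)."""

    def __init__(self, name: str, max_concurrent: int, max_wait_s: float = 0.0):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._max_wait_s = max_wait_s

    def call(self, fn: Callable[[], T]) -> T:
        # Reject after at most max_wait_s instead of piling up behind a slow dependency.
        if not self._slots.acquire(timeout=self._max_wait_s):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            return fn()
        finally:
            self._slots.release()


def demo_isolation() -> list[str]:
    bulkhead_a = Bulkhead("A", max_concurrent=2)
    bulkhead_b = Bulkhead("B", max_concurrent=2)
    a_recovered = threading.Event()
    both_a_calls_running = threading.Barrier(3)  # 2 slow A callers + this thread

    def slow_a() -> str:
        both_a_calls_running.wait()  # signal that this call now holds an A slot
        a_recovered.wait()           # block like a slow dependency
        return "A ok"

    workers = [threading.Thread(target=bulkhead_a.call, args=(slow_a,)) for _ in range(2)]
    for w in workers:
        w.start()
    both_a_calls_running.wait()      # A's pool is now exhausted

    outcomes = []
    try:
        bulkhead_a.call(lambda: "A ok")
    except BulkheadFullError as err:
        outcomes.append(str(err))    # further A calls fail fast
    outcomes.append(bulkhead_b.call(lambda: "B ok"))  # B still has its own slots

    a_recovered.set()
    for w in workers:
        w.join()
    return outcomes
```

Rejecting immediately when A's slots are taken (`max_wait_s=0.0`) is a design choice: callers get a fast error they can answer with a fallback, instead of queueing behind the slow dependency.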

💡 Key Insight: Circuit breakers stop calling failing services. Bulkheads ensure that even while waiting for circuit breakers to trip, the failure cannot consume resources needed by healthy paths.

Bulkheads vs Circuit Breakers

These patterns complement each other. Circuit breakers detect failures and stop traffic; bulkheads contain the damage while that detection happens. Depending on its failure threshold and measurement window, a circuit breaker might take 10-30 seconds to trip. Without bulkheads, that window is long enough to exhaust every shared thread. With bulkheads, only the isolated pool is affected during that window.
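A back-of-the-envelope sketch of this timing gap, using Little's law (threads in use = arrival rate × latency). It assumes a simplified model: the pool starts idle, every call to the failing dependency suddenly hangs for `call_latency_s`, and traffic arrives at a constant rate. `seconds_until_exhausted` and the load figures are hypothetical.

```python
from typing import Optional


def seconds_until_exhausted(pool_threads: int, arrival_rate_per_s: float,
                            call_latency_s: float) -> Optional[float]:
    """Time for a pool to fill once a dependency turns slow (simplified model).

    Assumes the pool starts idle, every new call takes call_latency_s,
    and requests keep arriving at a constant rate.
    """
    # Little's law: steady-state threads in use = arrival rate * latency.
    steady_state_threads = arrival_rate_per_s * call_latency_s
    if steady_state_threads <= pool_threads:
        return None  # the pool absorbs the slowdown without filling up
    # No call returns before call_latency_s, so in-flight calls grow by
    # arrival_rate_per_s each second until the pool is full.
    return pool_threads / arrival_rate_per_s


# Hypothetical load: 100 req/s to Service A, whose calls now hang for 30 s.
shared = seconds_until_exhausted(200, 100, 30)    # 2.0 s: the whole service stalls
isolated = seconds_until_exhausted(50, 100, 30)   # 0.5 s: only A's pool fills
```

Under these assumed numbers, the shared 200-thread pool stalls the whole service after 2 seconds, well inside a 10-30 second trip window. The dedicated 50-thread pool fills even sooner, but the damage stops at A's 25% share of capacity.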

💡 Key Takeaways
Bulkheads isolate resources per dependency so one slow service cannot exhaust threads needed by healthy services
Without bulkheads, 200 shared threads can all block on one slow dependency, stopping all other requests
Bulkheads contain damage while circuit breakers detect failures; they complement each other
📌 Interview Tips
1. Use the ship analogy: watertight compartments prevent one breach from sinking the entire vessel
2. Calculate the blast radius: dedicating 50 of 200 threads to a dependency means its failure affects at most 25% of capacity, versus 100% with a shared pool
3. Explain the timing gap: circuit breakers can take 10-30 seconds to trip; bulkheads protect during that window