Bulkhead Pattern: Isolating Failures Through Resource Partitioning
The Ship Analogy
Ships have watertight compartments (bulkheads) so that a hull breach floods only one section, not the entire vessel. Software bulkheads work the same way: isolate resources so one failing dependency cannot sink the whole application. Without bulkheads, a slow database query can exhaust all threads, blocking unrelated requests that do not even use that database.
Why Shared Resources Are Dangerous
Consider a service that uses a single pool of 200 threads for all of its calls to downstream services. If Service A becomes slow, requests that depend on A pile up, and each one holds a thread while it waits. Once all 200 threads are waiting on A, requests that only need the healthy Services B and C cannot be processed either. One slow dependency has effectively taken down the entire application.
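The starvation described above can be reproduced at small scale. The following is a minimal sketch, not a production setup: the pool is shrunk from 200 threads to 4, a slow Service A is simulated by tasks that block on a threading.Event, and the function name starves_on_shared_pool is a hypothetical name chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def starves_on_shared_pool(pool_size: int = 4) -> bool:
    """Return True if a call to healthy Service B is stuck behind slow calls to A."""
    a_recovers = threading.Event()  # stands in for Service A responding again
    pool = ThreadPoolExecutor(max_workers=pool_size)  # one pool shared by all dependencies
    try:
        # Simulated slow Service A: each call blocks a worker until A recovers.
        for _ in range(pool_size):
            pool.submit(a_recovers.wait)

        # Service B is healthy, but its call has to queue behind the calls to A.
        b_future = pool.submit(lambda: "B ok")
        try:
            b_future.result(timeout=0.2)
            return False  # B got a thread, so no starvation occurred
        except FutureTimeout:
            return True  # B never ran: every worker is waiting on A
    finally:
        a_recovers.set()  # unblock the simulated calls so the pool can shut down
        pool.shutdown(wait=True)
```

Calling starves_on_shared_pool() is expected to return True, because the only threads that could serve B are all blocked on A.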
How Bulkheads Help
Allocate separate resource pools per dependency: 50 threads for Service A, 50 for B, 50 for C, and 50 held in reserve. When A becomes slow and exhausts its 50 threads, B and C continue operating normally with their dedicated pools. The blast radius of A's failure is contained to A-related requests only.
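One way to sketch this partitioning is to give each dependency its own executor. The example below is illustrative only: pool sizes are scaled down from 50 to 2 threads, the slow Service A is simulated with a threading.Event, and call_b_while_a_is_saturated is a hypothetical helper name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def call_b_while_a_is_saturated(per_pool: int = 2) -> str:
    """Saturate Service A's pool, then call Service B through its own pool."""
    a_recovers = threading.Event()  # stands in for Service A responding again
    # One dedicated pool (bulkhead) per downstream dependency.
    pools = {
        name: ThreadPoolExecutor(
            max_workers=per_pool, thread_name_prefix=f"bulkhead-{name}"
        )
        for name in ("A", "B", "C")
    }
    try:
        # Simulated slow Service A: more blocked calls than A's pool has threads.
        for _ in range(per_pool * 3):
            pools["A"].submit(a_recovers.wait)

        # B's threads are untouched by A's backlog, so this completes promptly.
        return pools["B"].submit(lambda: "B ok").result(timeout=1.0)
    finally:
        a_recovers.set()  # unblock the simulated calls so the pools can shut down
        for pool in pools.values():
            pool.shutdown(wait=True)
```

In this sketch the backlog for A only queues inside A's executor. A real deployment would typically also bound that queue or reject excess calls so that waiting requests do not accumulate without limit.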
Bulkheads vs Circuit Breakers
These patterns complement each other. Circuit breakers detect failures and stop traffic to the failing dependency. Bulkheads contain the damage while detection happens. Depending on its failure threshold and sampling window, a circuit breaker might take 10-30 seconds to trip. Without bulkheads, those seconds can be enough to exhaust all resources. With bulkheads, only the isolated pool is affected during that window.
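The interplay can be sketched with a small wrapper that applies both guards to one dependency. This is a simplified illustration under several assumptions: the bulkhead is a semaphore that rejects excess calls rather than a thread pool, the breaker trips on a count of consecutive failures rather than a time window, there is no half-open or reset logic, and the names GuardedDependency, BulkheadFull, and CircuitOpen are hypothetical.

```python
import threading


class BulkheadFull(Exception):
    """Raised when the dependency's concurrency limit is already in use."""


class CircuitOpen(Exception):
    """Raised when the breaker has tripped and calls are being refused."""


class GuardedDependency:
    def __init__(self, max_concurrent: int, failure_threshold: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)  # the bulkhead
        self._failure_threshold = failure_threshold               # the breaker
        self._failures = 0
        self._lock = threading.Lock()

    @property
    def is_open(self) -> bool:
        with self._lock:
            return self._failures >= self._failure_threshold

    def call(self, fn):
        # Breaker: once tripped, stop sending traffic to the dependency.
        if self.is_open:
            raise CircuitOpen("breaker is open")
        # Bulkhead: until the breaker trips, cap how many callers can be stuck.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull("no free slot for this dependency")
        try:
            result = fn()
        except Exception:
            with self._lock:
                self._failures += 1  # count consecutive failures
            raise
        else:
            with self._lock:
                self._failures = 0   # any success resets the count
            return result
        finally:
            self._slots.release()
```

While the failure count is still below the threshold, the semaphore keeps the number of callers tied up in this dependency at max_concurrent. Once the threshold is reached, the breaker refuses calls before they take a slot at all.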