OS & Systems Fundamentals › Processes vs Threads (Hard, ⏱️ ~2 min)

Trade-offs: Isolation vs Performance at Scale

The process-versus-thread decision fundamentally trades isolation against performance, and the right choice depends on your latency budget and failure tolerance.

Processes provide strong fault and security isolation: a crash or memory corruption is contained within one process. Chrome's architecture proves this; when a renderer crashes, only that tab dies. The tradeoff is higher IPC overhead (10 to 100 microseconds per round trip versus sub-microsecond in-process), larger context-switch costs due to TLB and page-table changes, and a significantly higher memory footprint due to duplicated kernel structures and page tables.

Threads enable low-latency cooperation through shared memory and cheaper context switches: uncontended in-process handoffs cost tens to hundreds of nanoseconds. This matters enormously at scale. If your service demands sub-100-microsecond tail latencies, you cannot afford cross-process IPC on the hot path. However, threads couple failures tightly: any fatal bug, memory corruption, or runaway allocation can kill the entire process and every thread within it, and one thread with a memory-safety bug can silently corrupt another thread's data structures.

Latency budgets drive the architecture. Amazon, Meta, and Google all use process boundaries for service isolation and blast-radius control, while keeping hot paths multi-threaded within each service. For example, a Meta cache service runs as one process with per-core sharding across threads to achieve millions of operations per second. Cross-service communication uses RPC (hundreds of microseconds), but hot-path requests never leave the process. Microsoft SQL Server similarly uses cooperative scheduling and worker threads within a single server process for performance, while isolating different databases or tenants at the process boundary for safety and resource accounting.
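The core isolation difference can be seen directly in a short Python sketch (the names `bump` and `demo` are illustrative, not from any real system): a thread mutates the parent's memory in place, while a child process mutates only its own copy of that memory.

```python
import multiprocessing
import threading

counter = {"value": 0}

def bump():
    # Increments whichever copy of `counter` lives in the calling address space.
    counter["value"] += 1

def demo():
    # Thread: shares this process's memory, so the write is visible here.
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    after_thread = counter["value"]

    # Process: bump() runs against a separate address space; our copy is untouched.
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    after_process = counter["value"]
    return after_thread, after_process

if __name__ == "__main__":
    print(demo())  # (1, 1): the thread's mutation is visible, the process's is not
```

The same property that makes the thread's write visible is what lets one buggy thread corrupt another's data; the same property that hides the child process's write is what contains a renderer crash to one Chrome tab.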
💡 Key Takeaways
Latency budget determines architecture: Sub-100-microsecond tail latencies require in-process threading on hot paths. Cross-process IPC adds 10 to 100 microseconds per round trip, consuming most of your budget before any actual work.
Failure coupling differs fundamentally: A process crash affects only that process (a Chrome tab dies, the browser survives). A thread crash or memory corruption kills the entire process and all threads within it.
Memory footprint scales differently: Each process adds 50 to 150 MB for kernel structures, page tables, and duplicated libraries. Each thread adds 0.5 to 8 MB of reserved stack space but shares everything else.
Resource accounting favors processes: Operating systems provide per-process memory limits, CPU quotas, and I/O throttling. Threads within a process share these limits, so one hot thread can starve the others without explicit admission control.
NUMA effects hit multi-threaded programs hard: Allocating memory on one NUMA node and accessing it from another incurs a 1.3 to 2x latency penalty. Large multi-threaded applications require careful thread affinity and memory-locality tuning to avoid doubled tail latencies.
📌 Examples
Meta cache service: One process per server, state sharded across threads by core. In-process: <1µs handoff, millions of ops/sec. Cross-service RPC: 200-500µs when needed.
Google Chrome: Process isolation trades 100+ MB memory overhead for crash containment. Security boundary enforced by OS, not application code.
Kafka broker: Multi-GB/s throughput in one process with internal thread coordination. Process boundary used between brokers for fault isolation and rolling deploys, not hot path performance.
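The per-core sharding pattern from the Meta cache example might look like the following minimal sketch, where each shard owns its keys so hot-path operations never contend across shards; the class name, shard count, and per-shard locks are illustrative assumptions, not Meta's actual design.

```python
import threading

N_SHARDS = 4  # typically one shard per core

class ShardedCache:
    def __init__(self, n_shards=N_SHARDS):
        # One dict and one lock per shard; a key always maps to the same
        # shard, so contention is confined to keys sharing that shard.
        self._shards = [{} for _ in range(n_shards)]
        self._locks = [threading.Lock() for _ in range(n_shards)]

    def _idx(self, key):
        return hash(key) % len(self._shards)

    def put(self, key, value):
        i = self._idx(key)
        with self._locks[i]:
            self._shards[i][key] = value

    def get(self, key, default=None):
        i = self._idx(key)
        with self._locks[i]:
            return self._shards[i].get(key, default)
```

Because all shards live in one address space, a lookup is an in-process call (nanoseconds) rather than an RPC (hundreds of microseconds), which is exactly the hot-path win the example describes.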