Implementation Patterns: Per-Core Sharding and Thread Pool Sizing
Per-core sharding eliminates lock contention by partitioning state so that each thread or core owns its shard exclusively. Instead of one global hash table protected by locks, you create N hash tables, where N equals the core count, and route each request to a shard by hashing its key. When work must move between shards, threads communicate via single-producer, single-consumer (SPSC) queues. This pattern turns a lock-heavy, high-contention workload into a share-nothing architecture where each core operates independently, without locks on the hot path. Memcached applies this idea: it shards its hash table across threads, and each thread handles requests for its own shard without cross-thread locking on the hot path, scaling from roughly 1 million to 10+ million operations per second.
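As a minimal sketch of the pattern (not memcached's actual implementation), the Go snippet below partitions a key-value map into one shard per core, with each shard owned by a single goroutine that drains its own queue. The request type, queue depth, and FNV key hash are illustrative assumptions; Go channels are used here as per-shard queues rather than strict SPSC queues.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"runtime"
)

// request is a hypothetical unit of work keyed by a string.
type request struct {
	key   string
	value string
}

// shard owns its map exclusively: only the owning goroutine touches it,
// so no locks are needed on the hot path.
type shard struct {
	data  map[string]string
	queue chan request // per-shard queue feeding the owning goroutine
}

// shardFor routes a key to its owning shard by hashing the key.
func shardFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % n
}

func main() {
	n := runtime.NumCPU() // one shard per core
	shards := make([]*shard, n)
	for i := range shards {
		s := &shard{data: make(map[string]string), queue: make(chan request, 1024)}
		shards[i] = s
		go func() { // owning goroutine: the only reader/writer for this shard
			for req := range s.queue {
				s.data[req.key] = req.value
			}
		}()
	}

	// Route a request to its owning shard; no cross-shard locking occurs.
	shards[shardFor("user:42", n)].queue <- request{key: "user:42", value: "alice"}
	fmt.Println("routed to shard", shardFor("user:42", n))
}
```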
Thread pool sizing requires understanding your workload. For CPU-bound tasks, size pools to the number of physical cores (not hyperthreads); more threads only add context-switching overhead without adding throughput. For I/O-bound tasks with async I/O, you can use 2-4x the core count because threads spend much of their time waiting, but track queue depth and service time. If queue depth grows without bound, you are overloaded and need backpressure. Set admission-control thresholds: reject new requests when queue depth exceeds a limit or when the queuing delay would violate your Service Level Objective (SLO). Failing fast is better than letting latency spiral.
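A small Go sketch of both ideas, pool sizing and admission control, might look like the following; the queue depth of 128 and the error-based rejection are illustrative choices, not prescriptions.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

var errOverloaded = errors.New("queue full: shed load (e.g. return 503)")

// pool is a minimal fixed-size worker pool with a bounded queue.
// Submissions fail fast instead of queuing without limit.
type pool struct {
	tasks chan func()
}

func newPool(workers, queueDepth int) *pool {
	p := &pool{tasks: make(chan func(), queueDepth)}
	for i := 0; i < workers; i++ {
		go func() {
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// submit applies admission control: reject immediately when the queue is full.
func (p *pool) submit(task func()) error {
	select {
	case p.tasks <- task:
		return nil
	default:
		return errOverloaded
	}
}

func main() {
	// CPU-bound work: size the pool to the core count.
	p := newPool(runtime.NumCPU(), 128)

	done := make(chan struct{})
	if err := p.submit(func() { fmt.Println("work done"); close(done) }); err != nil {
		fmt.Println("rejected:", err) // e.g. translate into an HTTP 503
		return
	}
	<-done
}
```

Rejecting inside submit keeps the decision at the enqueue point, so callers can translate the error into a 503 (or a retry with backoff) before any queuing latency accumulates.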
Process patterns focus on isolation boundaries. Use worker-process pools sized to the core count for CPU-bound services, and reload gracefully by spawning new workers, waiting for old ones to drain in-flight requests, then killing them. For untrusted or crash-prone components such as third-party plugins or machine learning model sandboxes, isolate the code in separate processes with resource limits: per-process CPU quotas, memory limits, and I/O throttling prevent noisy-neighbor effects. For hot data paths, keep computation within one process and push cold or background tasks to separate worker processes via queues, protecting the latency-critical path from interference.
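The graceful-reload part of this pattern could be sketched in Go roughly as below (Unix-oriented). The ./worker binary name, the SIGTERM-then-kill sequence, and the fixed drain duration are assumptions standing in for a real supervisor's drain logic.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// reload sketches a graceful worker swap: start replacement workers first,
// let the old workers drain in-flight requests, then terminate them.
func reload(old []*exec.Cmd, n int, drain time.Duration) ([]*exec.Cmd, error) {
	fresh := make([]*exec.Cmd, 0, n)
	for i := 0; i < n; i++ {
		cmd := exec.Command("./worker") // hypothetical worker binary
		if err := cmd.Start(); err != nil {
			return nil, err
		}
		fresh = append(fresh, cmd)
	}

	if len(old) > 0 {
		// Ask old workers to stop accepting new work, then wait out the drain.
		for _, cmd := range old {
			cmd.Process.Signal(syscall.SIGTERM)
		}
		time.Sleep(drain)
		// Force-kill anything still running after the drain period.
		for _, cmd := range old {
			cmd.Process.Kill()
			cmd.Wait()
		}
	}
	return fresh, nil
}

func main() {
	// First generation: no old workers to drain.
	workers, err := reload(nil, 4, 30*time.Second)
	if err != nil {
		fmt.Println("reload failed:", err) // e.g. ./worker binary missing
		return
	}
	fmt.Println("running", len(workers), "workers")
}
```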
💡 Key Takeaways
• Per-core sharding eliminates lock contention: Partition state into N shards (N equals cores) and route requests by key hash to the shard owner. Each core operates on its shard independently, avoiding cache-line bouncing and lock waits.
• Thread pool sizing for CPU-bound work: Set pool size to the physical core count; more threads add context-switch overhead without throughput gains. For I/O-bound async work: 2-4x cores while monitoring queue depth.
• Admission control prevents latency spirals: Reject new requests when queue depth exceeds a threshold or queuing delay would violate the SLO. Failing fast with 503 errors is better than adding requests to an unbounded queue that will time out anyway.
• Worker process pools enable zero-downtime deploys: Spawn new worker processes, wait for old ones to finish in-flight requests (drain period), then kill the old workers. A supervisor process manages the lifecycle and restarts crashed workers automatically.
• Isolate untrusted or crash-prone code in separate processes with resource limits: Apply per-process memory caps (e.g., 2 GB max), CPU quotas (e.g., 50% of one core), and I/O throttling, as in the sketch after this list. One plugin crash or memory leak doesn't take down the main service.
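To make the resource-limit bullet concrete, here is a hedged Go sketch that starts a hypothetical plugin-host process and caps its address space at 2 GB via prlimit(2) using golang.org/x/sys/unix (Linux-only). CPU quotas and I/O throttling are typically applied through cgroups instead and are omitted here; the binary name and the 2 GB figure are illustrative.

```go
package main

import (
	"fmt"
	"os/exec"

	"golang.org/x/sys/unix"
)

func main() {
	// Hypothetical binary wrapping an untrusted third-party plugin.
	cmd := exec.Command("./plugin-host")
	if err := cmd.Start(); err != nil {
		fmt.Println("start failed:", err)
		return
	}

	// Cap the child's address space at 2 GB so a runaway leak kills the
	// plugin process instead of the main service.
	limit := unix.Rlimit{Cur: 2 << 30, Max: 2 << 30}
	if err := unix.Prlimit(cmd.Process.Pid, unix.RLIMIT_AS, &limit, nil); err != nil {
		fmt.Println("failed to set memory limit:", err)
	}

	cmd.Wait()
}
```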
📌 Examples
Memcached sharding: Hash table split into 16 shards on a 16-core machine, with each thread owning one shard. Result: 10M+ ops/sec with minimal lock contention vs. 2M ops/sec with a global lock.
NGINX worker pool: 8 worker processes (one per core) with a master process supervising. Graceful reload: start 8 new workers, wait through a 30-second drain period, then kill the old workers. Zero-downtime deploys.
Amazon service pattern: Main service process sized to cores for a low-latency hot path. Background jobs (log aggregation, metrics) run in separate worker processes fed by an SQS queue, keeping the hot path protected from background-work interference.