
Concurrency vs Parallelism: Core Distinction

Concurrency is about dealing with many things at once by overlapping waits and interleaving progress. On a single core, only one instruction executes at any moment, but the scheduler switches between tasks so multiple operations appear to make progress simultaneously. This is essential for input/output-bound (I/O-bound) workloads, where tasks spend most of their time waiting on network, disk, or downstream services rather than consuming Central Processing Unit (CPU) cycles.

Parallelism is about doing many things at the same time using multiple independent execution units such as cores, sockets, or machines: multiple instructions truly execute simultaneously. This matters for CPU-bound workloads where computation dominates the critical path. The speedup is constrained by Amdahl's Law, speedup = 1 / (s + (1 − s)/N) for serial fraction s on N processors: with 5% serial work, even infinite processors cap speedup at 20×.

The key difference surfaces in real metrics. WhatsApp handles 2 million concurrent TCP connections on a single server using lightweight Erlang processes, primarily leveraging concurrency with modest parallelism across available cores. Meanwhile, a 64-core media encoding server achieves 50 to 60× speedup through pure parallelism when the serial fraction stays below roughly 0.5% (by Amdahl's Law, even 2% serial work would already cap a 64-core machine near 28×). Production systems typically combine both: a web tier might handle 100,000 concurrent requests (concurrency) while each request fans out parallel Remote Procedure Calls (RPCs) to backend shards (parallelism).
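A minimal Python sketch of the distinction, using illustrative workloads (a simulated network wait and a pure-computation loop) rather than the systems above: asyncio overlaps waits on a single thread, while a process pool spreads computation across cores.

# Sketch contrasting the two models; workloads and timings are
# illustrative assumptions, not measurements from the systems above.
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

async def fetch(i: int) -> int:
    # Simulates an I/O-bound call: the task sleeps (waits) instead of
    # burning CPU, so the event loop can interleave other tasks.
    await asyncio.sleep(0.1)
    return i

async def concurrent_io() -> None:
    # Concurrency: 100 overlapping waits on ONE thread.
    # Wall time is ~0.1 s, not 100 * 0.1 s, because the waits overlap.
    start = time.perf_counter()
    await asyncio.gather(*(fetch(i) for i in range(100)))
    print(f"concurrent I/O: {time.perf_counter() - start:.2f}s")

def crunch(n: int) -> int:
    # Simulates a CPU-bound task: pure computation, no waiting.
    return sum(i * i for i in range(n))

def parallel_cpu() -> None:
    # Parallelism: the same computation spread across multiple cores.
    # Speedup is bounded by core count and Amdahl's Law, not by waits.
    start = time.perf_counter()
    with ProcessPoolExecutor() as pool:
        list(pool.map(crunch, [2_000_000] * 8))
    print(f"parallel CPU: {time.perf_counter() - start:.2f}s")

if __name__ == "__main__":
    asyncio.run(concurrent_io())
    parallel_cpu()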
💡 Key Takeaways
Concurrency increases work in progress (WIP) without requiring multiple cores. Little's Law applies: WIP equals throughput times latency. Hiding waits behind other work reduces observed latency for I/O-bound systems (worked numbers in the sketch after this list).
Parallelism requires multiple execution units and targets CPU-bound work. Amdahl's Law limits speedup: 5% serial work caps maximum speedup at 20× regardless of core count (also checked in the sketch after this list).
WhatsApp demonstrates extreme concurrency: 2 million TCP connections per server using lightweight processes with kilobytes of memory overhead versus megabytes for OS threads.
Production systems combine both patterns. Google search decomposes one query into 100+ parallel shard requests (parallelism), while each shard server handles thousands of concurrent queries (concurrency).
Resource costs differ dramatically. OS threads reserve 0.5 to 2 MB of stack space each, so 100,000 threads consume 50 to 200 GB of address space. Lightweight fibers start at 2 to 8 KB, enabling 1 million concurrent tasks within single-digit GB (the arithmetic is in the sketch after this list).
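The laws and resource figures in these takeaways reduce to short arithmetic. A sketch using the numbers quoted in this section; the inputs are illustrative, not new measurements:

def littles_law_wip(throughput_rps: float, latency_s: float) -> float:
    # Little's Law: work in progress = throughput x latency.
    return throughput_rps * latency_s

def amdahl_speedup(serial_fraction: float, cores: float = float("inf")) -> float:
    # Amdahl's Law: speedup = 1 / (s + (1 - s) / N).
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# 10,000 requests/s at 100 ms each keeps ~1,000 requests in flight.
print(littles_law_wip(10_000, 0.1))      # 1000.0

# 5% serial work caps speedup at 20x even with infinite cores...
print(amdahl_speedup(0.05))              # 20.0
# ...and 2% serial already limits a 64-core box to ~28x.
print(amdahl_speedup(0.02, 64))          # ~28.3

# Address-space cost: 100,000 OS threads at 1 MB of stack each reserve
# ~100 GB; 1,000,000 fibers at 4 KB each need only ~4 GB.
print(100_000 * 1024**2 / 1024**3)       # ~97.7 (GiB)
print(1_000_000 * 4 * 1024 / 1024**3)    # ~3.8 (GiB)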
📌 Examples
Netflix's Zuul 2 gateway shifted from thread-per-connection to event-driven, non-blocking I/O, achieving 3× throughput per instance and lower tail latencies by avoiding thread-pool saturation at high concurrency (a minimal sketch of the pattern follows this list).
A 64-core video encoding server achieves 50 to 60× speedup through parallelism when the serial fraction stays below roughly 0.5%, demonstrating near-linear scaling for embarrassingly parallel CPU-bound workloads.
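Zuul 2 itself is built on Java and Netty; as a language-neutral illustration of the same event-driven pattern, here is a minimal asyncio echo server. Each idle connection costs a suspended coroutine (kilobytes), not a blocked OS thread (megabytes), which is why one event loop can hold tens of thousands of open connections.

# A sketch of event-driven, non-blocking connection handling; an
# analogy for the pattern Zuul 2 adopted, not its implementation.
import asyncio

async def handle(reader: asyncio.StreamReader,
                 writer: asyncio.StreamWriter) -> None:
    # Each connection is a coroutine; awaiting a read yields the event
    # loop to other connections instead of blocking an OS thread.
    while data := await reader.read(1024):
        writer.write(data)  # echo the bytes back
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main() -> None:
    server = await asyncio.start_server(handle, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())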