Parallelism Limits: Amdahl's Law and Coordination Costs
Amdahl's Law Explained
Amdahl's Law gives the maximum speedup attainable from parallelization. If 90% of the work can be parallelized and 10% must remain serial, even infinite cores yield only a 10x speedup: the serial portion becomes the bottleneck. Formula: Speedup = 1 / (S + P/N), where S is the serial fraction, P is the parallel fraction (S + P = 1), and N is the number of cores.
With 4 cores and 75% parallelizable work: 1 / (0.25 + 0.75/4) = 2.3x speedup, not 4x, because 25% of the work remains serial. At 16 cores: 1 / (0.25 + 0.75/16) = 3.4x. Diminishing returns set in quickly, and adding cores past this point wastes money.
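The arithmetic above can be checked with a short sketch (the function name amdahl_speedup is mine, not from the text):

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Amdahl's Law: Speedup = 1 / (S + P/N), with P = 1 - S."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

print(round(amdahl_speedup(0.25, 4), 1))   # 2.3
print(round(amdahl_speedup(0.25, 16), 1))  # 3.4
# As cores grow, speedup approaches 1 / S: 10% serial caps it at 10x.
print(round(amdahl_speedup(0.10, 1_000_000), 1))  # 10.0
```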
Coordination Overhead
Real systems perform worse than Amdahl predicts because parallelism has costs. Threads must synchronize access to shared data. Locks serialize access, reducing actual parallelism. Lock contention increases with thread count. More threads can mean more waiting.
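A minimal sketch of lock serialization, using a shared counter (the names and counts are illustrative): the threads run concurrently, but the locked increment is a serialization point that counts toward Amdahl's serial fraction.

```python
import threading

ITERATIONS = 100_000
THREADS = 4
counter = 0
lock = threading.Lock()

def worker() -> None:
    global counter
    for _ in range(ITERATIONS):
        # Only one thread at a time can hold the lock, so this
        # increment runs serially no matter how many threads exist.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000: correct, but the critical section never parallelized
```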
Cache coherency adds hidden costs. When one core writes data, other cores must invalidate their cached copies. The data must travel between cores. This takes 50 to 100 nanoseconds per cache line. With many cores contending for the same data, this overhead dominates actual computation.
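The figures here are illustrative rather than measured, but the arithmetic shows how coherency traffic can swamp the work itself: assuming a ~1 ns increment and the mid-range 75 ns cache-line transfer from above,

```python
# Illustrative figures only; real costs vary by CPU and topology.
compute_ns = 1.0    # assumed cost of the update itself
transfer_ns = 75.0  # mid-range of the 50-100 ns cache-line transfer above

contended_cost_ns = compute_ns + transfer_ns
overhead = transfer_ns / contended_cost_ns
print(f"{overhead:.0%} of each contended update is coherency traffic")  # 99%
```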
Finding The Sweet Spot
Profile before parallelizing. Identify the actual serial fraction through measurement, not estimation. Common serial bottlenecks: initialization, result aggregation, shared state updates, I/O serialization points.
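One way to measure rather than estimate the serial fraction is to time each phase directly. A sketch under assumed phase names (setup, process_chunks, and aggregate are stand-ins, and the sleeps stand in for real work):

```python
import time

def setup():           # serial: initialization
    time.sleep(0.02)

def process_chunks():  # the parallelizable portion
    time.sleep(0.06)

def aggregate():       # serial: result aggregation
    time.sleep(0.02)

t0 = time.perf_counter()
setup()
t1 = time.perf_counter()
process_chunks()
t2 = time.perf_counter()
aggregate()
t3 = time.perf_counter()

serial_fraction = ((t1 - t0) + (t3 - t2)) / (t3 - t0)
print(f"measured serial fraction: {serial_fraction:.2f}")
```

Feeding the measured fraction into Amdahl's formula then bounds the speedup worth pursuing.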
Test with varying thread counts. Throughput often peaks below the core count: a 16-core machine might peak at 8 threads due to lock contention or memory bandwidth saturation. Graph throughput against thread count to find the optimal operating point; going beyond it wastes resources and can reduce performance.
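With a sweep in hand, picking the peak is mechanical. The measurements below are hypothetical numbers for a 16-core machine, shaped like the contention-limited curve described above:

```python
# Hypothetical throughput sweep (threads -> ops/sec); replace with real data.
throughput = {1: 100, 2: 195, 4: 370, 8: 610, 12: 590, 16: 540}

best = max(throughput, key=throughput.get)
print(f"peak throughput at {best} threads ({throughput[best]} ops/sec)")
```

Here throughput peaks at 8 threads and declines beyond it, which is the signature of contention or bandwidth saturation rather than an Amdahl ceiling alone.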