Structured vs Unstructured Pruning: Core Differences
Unstructured Pruning
Removes individual weights anywhere in the network, creating sparse weight matrices. A 90% pruned layer keeps only 10% of its original weights, and the survivors are scattered unpredictably. Advantage: maximum flexibility means maximum compression; a 95% sparse network can match dense accuracy. Disadvantage: sparse matrices don't run faster on standard hardware. A GPU still performs the full matrix multiplication; the zeros just contribute nothing to the outputs.
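A minimal sketch of magnitude-based unstructured pruning in NumPy. The layer shape, the 90% sparsity target, and the quantile thresholding rule are illustrative assumptions, not tied to any particular framework:

```python
import numpy as np

# Hypothetical 256x256 fully connected layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))

# Zero out the 90% of weights with the smallest magnitude.
sparsity = 0.9
threshold = np.quantile(np.abs(W), sparsity)
mask = np.abs(W) >= threshold
W_pruned = W * mask

# The matrix shape is unchanged: hardware still sees a 256x256 matmul.
print(W_pruned.shape)   # (256, 256)
print(mask.mean())      # ~0.1 of entries survive
```

Note that `W_pruned` has the same dimensions as `W`. This is exactly why unstructured sparsity yields no speedup on dense kernels: the multiply-accumulate count is unchanged.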
Structured Pruning
Removes entire neurons, channels, or layers rather than individual weights. Pruning a filter in a convolutional layer removes that filter entirely plus corresponding connections. The result is a smaller dense network, not a sparse one. Advantage: direct speedup on any hardware since matrix dimensions actually shrink. A network with 50% of channels pruned runs approximately 2x faster. Disadvantage: less flexible, harder to maintain accuracy at high compression ratios.
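A sketch of structured pruning at the neuron level, again with assumed shapes and an assumed L2-norm ranking criterion. Pruning an output neuron deletes a whole row of its layer's weight matrix and the matching input columns of the next layer, so both matrices genuinely shrink:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))       # 256 output neurons, 128 inputs
b = rng.normal(size=256)
W_next = rng.normal(size=(64, 256))   # next layer consumes the 256 outputs

# Rank output neurons by the L2 norm of their weight rows; keep the top 50%.
keep = np.sort(np.argsort(np.linalg.norm(W, axis=1))[W.shape[0] // 2:])

W_small = W[keep]                 # (128, 128): the layer actually shrinks
b_small = b[keep]
W_next_small = W_next[:, keep]    # downstream columns removed to match

print(W_small.shape, W_next_small.shape)
```

The pruned network is a smaller dense network: any BLAS or GPU kernel runs it faster with no sparse-format support required.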
The Practical Gap
Unstructured pruning achieves 10-20x compression on paper but often no speedup without specialized sparse hardware. Structured pruning typically achieves 2-4x compression but delivers real speedups on GPUs and CPUs. Choose based on deployment target, not just compression ratio.
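The gap can be made concrete with back-of-envelope multiply-accumulate (MAC) counts for a hypothetical 1024x1024 fully connected layer:

```python
# Dense layer: every output is a dot product over every input.
dense_macs = 1024 * 1024

# 90% unstructured sparsity: the matrix shape is unchanged, so a dense
# kernel still performs every MAC, zeros included.
unstructured_macs = dense_macs

# 50% of output neurons structurally pruned: the matrix is now 512x1024,
# so the kernel does half the work.
structured_macs = 512 * 1024

print(dense_macs // structured_macs)  # → 2
```

The 90% sparse layer stores 10x fewer nonzero weights, yet on dense hardware it computes exactly as many MACs as the original; the 50% structured layer stores only 2x fewer weights but computes half as many MACs.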