Production Implementation: Iterative Pruning and Fine Tuning
The Iterative Pipeline
Start with a trained model. In each iteration: compute importance scores, remove the lowest-scoring 10-20% of weights, and fine-tune for 2-5 epochs. Repeat until you reach the target sparsity. For 90% sparsity at 10% of the original weights per iteration, expect roughly 10 pruning rounds. Total fine-tuning time: 20-50 epochs spread across iterations. This sounds expensive, but it consistently achieves 1-2% better final accuracy than one-shot approaches at the same sparsity.
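The loop above can be sketched with magnitude-based importance scores (one common choice; the source does not fix a scoring method). This is a minimal numpy sketch: `fine_tune` is a hypothetical callback standing in for your 2-5 epochs of training, and weights are a flat array for simplicity.

```python
import numpy as np

def prune_to_sparsity(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of all weights."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights, np.ones(weights.shape, dtype=bool)
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def iterative_prune(weights, target_sparsity=0.9, step=0.1, fine_tune=None):
    """Raise sparsity by `step` of the original weights each round,
    optionally fine-tuning between rounds, until the target is reached."""
    sparsity, mask = 0.0, np.ones(weights.shape, dtype=bool)
    while sparsity < target_sparsity:
        sparsity = min(sparsity + step, target_sparsity)
        weights, mask = prune_to_sparsity(weights, sparsity)
        if fine_tune is not None:
            # Hypothetical training hook; re-apply the mask so pruned
            # weights stay at zero after the update.
            weights = fine_tune(weights) * mask
    return weights, mask
```

Note that `step` here is measured against the original weight count, matching the "10 rounds to 90%" arithmetic above; pruning 10% of the *surviving* weights each round would instead take about 22 rounds.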
Layer Sensitivity
Not all layers tolerate pruning equally. Early layers (extracting basic features like edges) and final layers (making predictions) are the most sensitive. Middle layers can often be pruned more aggressively. A common pattern: prune 30% from the first layer, 60-70% from middle layers, and 40% from the last layer. Uniform pruning (the same ratio everywhere) typically underperforms by 2-3% accuracy.
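A per-layer schedule following that pattern is trivial to encode. This sketch uses the ratios quoted above as defaults; the function name and exact middle-layer value (0.65, the midpoint of the 60-70% range) are illustrative choices, and in practice you would tune them per model via a sensitivity sweep.

```python
def layer_sparsity_schedule(num_layers, first=0.30, middle=0.65, last=0.40):
    """Per-layer pruning ratios: gentle on the sensitive first and last
    layers, aggressive on the middle layers."""
    if num_layers == 1:
        return [first]
    return [first] + [middle] * (num_layers - 2) + [last]
```

Each ratio can then be fed to a per-layer pruning call instead of applying one uniform ratio to the whole network.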
Lottery Ticket Hypothesis
A pruned network contains a "winning ticket": a subnetwork that, when trained from scratch with its original initialization, matches the full network's accuracy. This suggests optimal sparse structures exist from initialization. The practical implication: you can prune once, reset the surviving weights to their initial values, and retrain the sparse network. This sometimes outperforms prune-then-fine-tune, but it requires storing the original weights.
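The prune-reset-retrain procedure can be sketched as follows. This is a simplified single-round version (the original lottery-ticket recipe often prunes iteratively); `train` is a hypothetical callback for your training loop, weights are a flat numpy array, and magnitude pruning is assumed.

```python
import numpy as np

def find_winning_ticket(init_weights, train, sparsity=0.8):
    """Train, build a magnitude mask from the trained weights, then reset
    the surviving weights to their ORIGINAL initialization and retrain.

    `train` is a hypothetical callback: weights in, trained weights out.
    Requires keeping a copy of `init_weights`, as noted above.
    """
    trained = train(init_weights.copy())
    # Mask from trained magnitudes: keep the largest (1 - sparsity) fraction.
    k = int(trained.size * sparsity)
    threshold = np.partition(np.abs(trained).ravel(), k - 1)[k - 1]
    mask = np.abs(trained) > threshold
    # Reset survivors to their initial values, not their trained values --
    # this reset is the key difference from prune-then-fine-tune.
    ticket = init_weights * mask
    return train(ticket) * mask  # retrain the sparse subnetwork
```

The only extra cost relative to standard pruning is storing `init_weights`; the retraining budget is comparable to the original training run.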