Pruning Tooling and Practical Workflow
Gradual Pruning Schedule
Start training dense. After warmup (10-20% of training), begin zeroing weights at the end of each epoch. Use a cubic schedule: prune aggressively at first, then taper off as the target sparsity approaches. For 90% final sparsity over 100 epochs, an illustrative trajectory: epoch 20 is 10% sparse, epoch 50 is 60% sparse, epoch 80 is 85% sparse. The cubic shape matches how accuracy recovers: early on the network is highly redundant and absorbs large cuts with little damage, but as sparsity rises each remaining weight carries more of the load, so pruning slows down to give the network time to adjust.
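A minimal sketch of how the per-epoch sparsity target might be computed under such a schedule. The function name, epoch boundaries, and initial sparsity are illustrative assumptions, and the exact per-epoch values depend on where you place the start and end of the pruning window.

```python
def cubic_sparsity(epoch, start_epoch=20, end_epoch=100,
                   initial_sparsity=0.0, final_sparsity=0.90):
    """Cumulative sparsity target for the current epoch under a cubic schedule."""
    if epoch < start_epoch:
        return 0.0
    if epoch >= end_epoch:
        return final_sparsity
    progress = (epoch - start_epoch) / (end_epoch - start_epoch)
    # The (1 - progress)**3 term shrinks quickly early on, so the sparsity
    # target rises fast at first and flattens out near the final value.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3
```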
Framework Support
PyTorch: torch.nn.utils.prune provides magnitude pruning utilities; combine them with a custom training loop for gradual schedules. TensorFlow: the tensorflow_model_optimization toolkit includes PolynomialDecay pruning schedules. Both support structured and unstructured pruning. For production, export pruned models to ONNX and apply framework-agnostic optimizations. Key check: verify the model still has a sparse representation after export, since some formats densify weights for compatibility.
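A hedged sketch of a gradual magnitude-pruning loop using torch.nn.utils.prune, with the cubic_sparsity helper sketched above. The model and optimizer are placeholders and the per-epoch training step is elided; only the pruning calls reflect the actual PyTorch API.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Placeholder model and optimizer; substitute your own.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
prunable = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]

current = 0.0  # sparsity applied so far
for epoch in range(100):
    # ... one epoch of normal training here ...

    target = cubic_sparsity(epoch)  # schedule from the sketch above
    if target > current:
        # A float `amount` prunes that fraction of the *remaining* weights,
        # so convert the cumulative target into an incremental step.
        step = (target - current) / (1.0 - current)
        prune.global_unstructured(
            prunable, pruning_method=prune.L1Unstructured, amount=step
        )
        current = target

# During training the zeros live in masks ("weight_orig" + "weight_mask");
# fold them into a plain dense "weight" tensor before exporting.
for module, name in prunable:
    prune.remove(module, name)
```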
Validation Protocol
Compare the pruned model against the baseline on: accuracy (within 1-2% of baseline), latency on the target hardware (actual speedup, not theoretical), and memory footprint (both disk and runtime). Test edge cases: inputs that originally scored near decision boundaries often degrade first. A/B test in production before full rollout, since aggregate accuracy may hide per-segment degradation.
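A rough sketch of such a comparison harness. The names compare_models, eval_fn, and sample_batch are hypothetical; the timing loop only approximates what you would measure by running on the actual target hardware and serving stack.

```python
import time
import torch

def compare_models(baseline, pruned, eval_fn, sample_batch, n_runs=50):
    """Collect the checks above: accuracy delta, measured latency, realized sparsity."""
    report = {
        "accuracy_baseline": eval_fn(baseline),   # your own evaluation harness
        "accuracy_pruned": eval_fn(pruned),
    }

    for name, model in [("baseline", baseline), ("pruned", pruned)]:
        model.eval()
        with torch.no_grad():
            for _ in range(5):                    # warm up caches before timing
                model(sample_batch)
            if sample_batch.is_cuda:
                torch.cuda.synchronize()          # GPU kernels launch asynchronously
            start = time.perf_counter()
            for _ in range(n_runs):
                model(sample_batch)
            if sample_batch.is_cuda:
                torch.cuda.synchronize()
            report[f"latency_ms_{name}"] = (time.perf_counter() - start) / n_runs * 1e3

    # Realized sparsity: zeros only pay off if the deployed format
    # actually stores and executes them sparsely.
    total = sum(p.numel() for p in pruned.parameters())
    zeros = sum((p == 0).sum().item() for p in pruned.parameters())
    report["sparsity"] = zeros / total
    return report
```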