Pruning Tooling and Practical Workflow
Gradual Pruning Schedule
Start training dense. After warmup (10-20% of training), begin zeroing weights at the end of each epoch. Use a cubic schedule: prune aggressively at first, then taper off as the target sparsity approaches. For 90% final sparsity over 100 epochs, an illustrative trajectory: epoch 20 is 10% sparse, epoch 50 is 60% sparse, epoch 80 is 85% sparse. The cubic shape matches how accuracy recovers: early on the network is highly redundant and absorbs large cuts with little damage, but as sparsity rises each remaining weight carries more of the load, so pruning slows down to give the network time to adjust.
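A minimal sketch of how the per-epoch sparsity target might be computed under such a schedule. The function name, epoch boundaries, and initial sparsity are illustrative assumptions, and the exact per-epoch values depend on where you place the start and end of the pruning window.

```python
def cubic_sparsity(epoch, start_epoch=20, end_epoch=100,
                   initial_sparsity=0.0, final_sparsity=0.90):
    """Cumulative sparsity target for the current epoch under a cubic schedule."""
    if epoch < start_epoch:
        return 0.0
    if epoch >= end_epoch:
        return final_sparsity
    progress = (epoch - start_epoch) / (end_epoch - start_epoch)
    # The (1 - progress)**3 term shrinks quickly early on, so the sparsity
    # target rises fast at first and flattens out near the final value.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3
```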
Framework Support
PyTorch: torch.nn.utils.prune provides magnitude pruning utilities; combine them with a custom training loop for gradual schedules. TensorFlow: the tensorflow_model_optimization toolkit includes PolynomialDecay pruning schedules. Both support structured and unstructured pruning. For production, export pruned models to ONNX and apply framework-agnostic optimizations. Key check: verify the model still has a sparse representation after export, since some formats densify weights for compatibility.
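A hedged sketch of a gradual magnitude-pruning loop using torch.nn.utils.prune, with the cubic_sparsity helper sketched above. The model and optimizer are placeholders and the per-epoch training step is elided; only the pruning calls reflect the actual PyTorch API.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Placeholder model and optimizer; substitute your own.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
prunable = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]

current = 0.0  # sparsity applied so far
for epoch in range(100):
    # ... one epoch of normal training here ...

    target = cubic_sparsity(epoch)  # schedule from the sketch above
    if target > current:
        # A float `amount` prunes that fraction of the *remaining* weights,
        # so convert the cumulative target into an incremental step.
        step = (target - current) / (1.0 - current)
        prune.global_unstructured(
            prunable, pruning_method=prune.L1Unstructured, amount=step
        )
        current = target

# During training the zeros live in masks ("weight_orig" + "weight_mask");
# fold them into a plain dense "weight" tensor before exporting.
for module, name in prunable:
    prune.remove(module, name)
```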
Validation Protocol
Compare the pruned model against the baseline on: accuracy (within 1-2% of baseline), latency on the target hardware (actual speedup, not theoretical), and memory footprint (both disk and runtime). Test edge cases: inputs that originally scored near decision boundaries often degrade first. A/B test in production before full rollout, since aggregate accuracy may hide per-segment degradation.
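A rough sketch of such a comparison harness. The names compare_models, eval_fn, and sample_batch are hypothetical; the timing loop only approximates what you would measure by running on the actual target hardware and serving stack.

```python
import time
import torch

def compare_models(baseline, pruned, eval_fn, sample_batch, n_runs=50):
    """Collect the checks above: accuracy delta, measured latency, realized sparsity."""
    report = {
        "accuracy_baseline": eval_fn(baseline),   # your own evaluation harness
        "accuracy_pruned": eval_fn(pruned),
    }

    for name, model in [("baseline", baseline), ("pruned", pruned)]:
        model.eval()
        with torch.no_grad():
            for _ in range(5):                    # warm up caches before timing
                model(sample_batch)
            if sample_batch.is_cuda:
                torch.cuda.synchronize()          # GPU kernels launch asynchronously
            start = time.perf_counter()
            for _ in range(n_runs):
                model(sample_batch)
            if sample_batch.is_cuda:
                torch.cuda.synchronize()
            report[f"latency_ms_{name}"] = (time.perf_counter() - start) / n_runs * 1e3

    # Realized sparsity: zeros only pay off if the deployed format
    # actually stores and executes them sparsely.
    total = sum(p.numel() for p in pruned.parameters())
    zeros = sum((p == 0).sum().item() for p in pruned.parameters())
    report["sparsity"] = zeros / total
    return report
```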