
Pruning Tooling and Practical Workflow

Definition
Pruning during training (also called dynamic or gradual pruning) integrates pruning into the training loop rather than treating it as a post-hoc optimization. Weights are progressively zeroed based on importance scores computed each epoch.
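A minimal sketch of the importance-score step, using weight magnitude as the score (the function name and tensor shapes here are illustrative, not a library API):

```python
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask that zeroes the `sparsity` fraction of weights
    with the smallest absolute value (magnitude = the importance score)."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    # the k-th smallest magnitude is the pruning threshold
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

# each epoch: recompute the mask at the scheduled sparsity, then apply it
w = torch.tensor([[1.0, -2.0], [0.5, 3.0]])
w = w * magnitude_mask(w, 0.5)  # zeroes the two smallest-magnitude weights
```

In a training loop the mask is recomputed from the current weights each epoch, so weights pruned early can stay zero while the schedule raises the sparsity target.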

Gradual Pruning Schedule

Start training dense. After warmup (10-20% of training), begin zeroing weights at each epoch following a cubic (polynomial-decay) schedule: prune aggressively at first, while the network still has plenty of redundancy, then taper off as sparsity approaches the target. For 90% final sparsity over 100 epochs with warmup ending at epoch 20: epoch 40 is ~52% sparse, epoch 60 is ~79%, epoch 80 is ~89%. The cubic shape matches how accuracy recovers: early removals hit redundant weights the network absorbs easily, while later removals cut into weights that matter, so the rate slows to give the network time to adapt.
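The schedule can be sketched as a small function (following the polynomial-decay form used by TF-MOT's PolynomialDecay; parameter names and defaults are illustrative):

```python
def cubic_sparsity(epoch: int, warmup: int = 20, total: int = 100,
                   final_sparsity: float = 0.9) -> float:
    """Target sparsity at a given epoch under a cubic (polynomial-decay)
    schedule: fast pruning right after warmup, tapering toward the target."""
    if epoch <= warmup:
        return 0.0  # train dense during warmup
    progress = min((epoch - warmup) / (total - warmup), 1.0)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)

# cubic_sparsity(40) ≈ 0.52, cubic_sparsity(80) ≈ 0.89, cubic_sparsity(100) == 0.9
```

Each epoch, the training loop reads the target sparsity from this function and prunes the smallest-magnitude weights up to that fraction.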

Framework Support

PyTorch: torch.nn.utils.prune provides magnitude-pruning utilities; combine them with a custom training loop for gradual schedules. TensorFlow: the tensorflow_model_optimization toolkit includes a PolynomialDecay pruning schedule. Both support structured and unstructured pruning. For production, export pruned models to ONNX and apply framework-agnostic optimizations. Key check: verify the sparse representation survives export, since some formats densify weights for compatibility.
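A minimal PyTorch sketch (layer sizes are arbitrary): apply unstructured L1-magnitude pruning, check the resulting sparsity, then make the pruning permanent before export:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# toy model — sizes chosen only for illustration
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# unstructured pruning: zero the 50% smallest-magnitude weights per Linear layer
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)

# the mask lives alongside the reparametrized weight; confirm actual sparsity
sparsity = float((model[0].weight == 0).float().mean())

# fold the mask into the weight before export (removes the reparametrization)
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")
```

After exporting (e.g. to ONNX), repeat the zero-count check on the exported weights, since densification during export is exactly the gotcha the text warns about.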

Validation Protocol

Compare the pruned model against the baseline on: accuracy (within 1-2% of baseline), latency on target hardware (actual measured speedup, not theoretical FLOP reduction), and memory footprint (both on-disk and runtime). Test edge cases: inputs that originally scored near decision boundaries often degrade first. A/B test in production before full rollout, since aggregate accuracy can hide per-segment degradation.
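The protocol above can be sketched as a hypothetical helper (`validate_pruned` and its callable arguments are illustrative, not a real library API; swap in your own inference functions and held-out data):

```python
import time

def validate_pruned(baseline_predict, pruned_predict, inputs, labels,
                    max_accuracy_drop=0.02, timing_runs=20):
    """Gate a pruned model against its baseline on held-out data.

    `baseline_predict` / `pruned_predict` map a batch of inputs to predicted
    labels. Returns accuracy drop, measured speedup, and a pass/fail flag.
    """
    def accuracy(predict):
        preds = predict(inputs)
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    def latency(predict):
        # wall-clock on the target batch: actual speedup, not theoretical
        start = time.perf_counter()
        for _ in range(timing_runs):
            predict(inputs)
        return (time.perf_counter() - start) / timing_runs

    base_acc, pruned_acc = accuracy(baseline_predict), accuracy(pruned_predict)
    return {
        "accuracy_drop": base_acc - pruned_acc,
        "speedup": latency(baseline_predict) / latency(pruned_predict),
        "passes": (base_acc - pruned_acc) <= max_accuracy_drop,
    }
```

Run this once on the full holdout set and again on each production segment and each near-boundary slice, since the aggregate numbers alone can mask exactly the per-segment degradation the A/B test is meant to catch.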

💡 Key Takeaways
Gradual pruning during training outperforms post-hoc pruning by allowing continuous adaptation
Cubic schedule: prune aggressively early while redundancy is high, taper off near the target; matches accuracy recovery dynamics
PyTorch uses torch.nn.utils.prune; TensorFlow uses tensorflow_model_optimization toolkit
Always verify sparse representation after ONNX export - some formats densify for compatibility
Test pruned models on edge cases near decision boundaries; they degrade first
📌 Interview Tips
1. Describe the cubic pruning schedule with concrete milestones (e.g. for 90% final sparsity over 100 epochs: ~52% sparse at epoch 40, ~89% at epoch 80)
2. Mention verifying sparsity after ONNX export - a common production gotcha
3. Recommend A/B testing before full rollout to catch segment-specific degradation