Production Trade-offs: When to Use Each Technique
Decision Framework
Choose based on data characteristics and operational constraints. SMOTE works well for tabular data with continuous features, where interpolating between neighbors produces plausible samples. Class weighting is the simplest approach; try it first. Focal loss excels when easy examples dominate the gradient signal, which is common in deep learning.
Rule of Thumb: Start with class weighting (zero code change, just a hyperparameter). If performance is insufficient, try focal loss for neural networks or SMOTE for tree-based models. You will rarely need to combine all three techniques.
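The "just a hyperparameter" claim can be made concrete. The sketch below computes inverse-frequency weights, the heuristic behind scikit-learn's `class_weight='balanced'` option; the function itself is illustrative, not a library API.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: w_c = n_samples / (n_classes * n_c).

    This is the heuristic behind scikit-learn's class_weight='balanced';
    this function is an illustrative sketch, not the library API.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

# 1:9 imbalance: the minority class gets 9x the majority's weight.
weights = balanced_class_weights([0] * 90 + [1] * 10)
```

In practice you pass `class_weight='balanced'` (or an explicit dict like the one returned here) to the estimator and change nothing else about the pipeline.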
SMOTE Trade-offs
Advantages: generates novel training examples and expands the minority-class feature space. Disadvantages: increases training-set size (slower training), assumes that linear interpolation between minority points yields valid samples, and may generate unrealistic samples near class boundaries. Best for: tabular data, moderate imbalance (1:10 to 1:100), when more data would genuinely help.
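The interpolation assumption is easiest to see in code. This minimal sketch generates each synthetic point on the line segment between a minority sample and one of its k nearest minority neighbors; production code should use imbalanced-learn's `SMOTE` instead.

```python
import numpy as np

def smote_sample(X_minority, n_new, k=5, rng=None):
    """Generate synthetic minority samples by interpolating between a
    point and one of its k nearest minority-class neighbors.
    Minimal sketch of the SMOTE idea, not the library implementation.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X_minority, dtype=float)
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)   # distance to every minority point
        d[i] = np.inf                          # exclude the point itself
        nn = rng.choice(np.argsort(d)[:k])     # pick one of the k nearest
        lam = rng.random()                     # interpolation factor in [0, 1)
        new.append(X[i] + lam * (X[nn] - X[i]))  # point on the segment
    return np.array(new)
```

Because every synthetic point lies between two real minority points, SMOTE never extrapolates; this is why it can misbehave near class boundaries, where segments between minority points may cross majority territory.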
Class Weighting Trade-offs
Advantages: no data modification, simple to implement, works with any model. Disadvantages: extreme weights can cause training instability, does not add information (just reweights existing samples). Best for: initial baseline, when minority examples are representative, with tree-based models.
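The "just reweights existing samples" point is visible in the loss itself. Below is a sketch of per-class weighted binary cross-entropy; most libraries accept such weights directly (e.g. a `class_weight` or `sample_weight` argument), so you would not write this by hand.

```python
import math

def weighted_log_loss(y_true, p_pred, class_weight):
    """Binary cross-entropy where each sample's loss is scaled by the
    weight of its true class. Illustrative sketch only: no new
    information is added, existing errors just count for more or less.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        w = class_weight[y]
        total += -w * (math.log(p) if y == 1 else math.log(1 - p))
    return total / len(y_true)
```

Note that with weights of 1.0 this is ordinary log loss; extreme weights simply multiply a few samples' gradients by a large constant, which is the source of the training instability mentioned above.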
Focal Loss Trade-offs
Advantages: adaptive to example difficulty, no data modification. Disadvantages: adds hyperparameter to tune, only works with differentiable models. Best for: neural networks, object detection, when many easy negatives dominate training.
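The adaptivity comes from a single modulating factor. A sketch of binary focal loss (Lin et al., 2017): the (1 - p_t)^gamma term shrinks the loss on easy, well-classified examples so hard examples dominate the gradient. In practice a deep-learning framework applies this inside the training loop rather than as a standalone function.

```python
import math

def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma=0 recovers (alpha-weighted) cross-entropy; larger gamma
    down-weights easy examples more aggressively. Illustrative sketch.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p_t = p if y == 1 else 1 - p           # probability of the true class
        a_t = alpha if y == 1 else 1 - alpha   # class-balance factor
        total += -a_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)
```

An easy example (p_t = 0.99) contributes a loss scaled by 0.01^gamma, while a hard one (p_t = 0.3) is scaled by 0.7^gamma, so gamma is the knob that trades off the two and is the extra hyperparameter you must tune.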
Production Insight: Undersampling the majority class often works as well as oversampling the minority—and trains faster. Try training on a balanced subset before adding complexity.
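Trying a balanced subset takes only a few lines. A minimal random-undersampling sketch, assuming binary labels; imbalanced-learn's `RandomUnderSampler` is the production equivalent.

```python
import random

def undersample_majority(X, y, majority_label=0, seed=0):
    """Randomly drop majority-class rows until the classes are balanced.
    Minimal sketch; assumes binary labels with a known majority label.
    """
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t != majority_label]
    majority = [(x, t) for x, t in zip(X, y) if t == majority_label]
    kept = rng.sample(majority, len(minority))  # random balanced subset
    rows = minority + kept
    rng.shuffle(rows)
    return [x for x, _ in rows], [t for _, t in rows]
```

On a 90/10 split this discards 80 majority rows, so training sees 20 samples instead of 100, which is where the speedup comes from.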