Class Weighting and Focal Loss: Reweighting the Loss Function
Class Weighting
Instead of modifying data, modify how errors are counted. Assign higher weight to minority class errors during loss computation. If fraud is 0.1% of data, weight fraud errors roughly 1000x more than non-fraud errors. The model then receives comparable gradient signal from both classes despite the imbalance. A common balanced heuristic: weight_c = total_samples / (num_classes × count_c).
Key Advantage: Class weighting requires no data modification. Apply it as a hyperparameter during training. Works with any loss function by multiplying the loss by class weight before backpropagation.
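The balanced-weight formula above can be sketched in a few lines. The fraud/non-fraud counts below are illustrative, chosen to match the 0.1% example:

```python
# Balanced class weights: weight_c = total_samples / (num_classes * count_c).
def class_weights(counts):
    total = sum(counts.values())
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}

# Toy imbalance: 1 fraud example per 999 non-fraud (0.1% fraud).
counts = {"non_fraud": 999, "fraud": 1}
weights = class_weights(counts)

# The weight ratio recovers the class ratio: fraud errors count ~1000x more.
print(weights["fraud"], weights["non_fraud"])        # 500.0 vs ~0.5
print(weights["fraud"] / weights["non_fraud"])       # 999.0
```

In practice most frameworks accept these weights directly (e.g. a per-class weight argument to the loss), so no data is resampled.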
Focal Loss
Focal Loss down-weights easy examples and focuses training on hard examples. Standard cross-entropy treats all errors equally. Focal Loss adds a modulating factor: FL = -(1-p)^γ × log(p), where p is the predicted probability of the true class and γ (gamma) controls focus strength. When the model is confident (p close to 1), the (1-p)^γ term approaches zero, so the example contributes little to the loss. Hard examples (p well below 1) keep a modulating factor near 1 and contribute almost their full cross-entropy.
Why Focal Loss Helps Imbalance
In imbalanced data, most majority class examples are easy—the model quickly learns to predict them correctly. These easy negatives dominate the loss, drowning out the harder minority examples. Focal Loss automatically reduces their contribution, effectively upweighting the minority class without explicit class weights.
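A quick numeric sketch of this drowning-out effect, using hypothetical probabilities (999 easy negatives at p = 0.99, one hard positive at p = 0.2):

```python
import math

def ce(p):  # cross-entropy on the true-class probability
    return -math.log(p)

def fl(p, gamma=2.0):  # focal loss with modulating factor (1 - p)^gamma
    return -((1.0 - p) ** gamma) * math.log(p)

easy_n, p_easy, p_hard = 999, 0.99, 0.2

# Under cross-entropy, the easy negatives collectively outweigh the hard example.
print(easy_n * ce(p_easy) / ce(p_hard))   # > 1: easy negatives dominate the batch loss

# Under focal loss, their contribution collapses and the hard example dominates.
print(easy_n * fl(p_easy) / fl(p_hard))   # << 1
```

The sums, not the per-example values, are what matter: each easy negative is cheap, but there are a thousand of them.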
Tuning Tip: Start with γ=2 (the original paper's default). Higher γ focuses more aggressively on hard examples but can destabilize training. For severely imbalanced data, combine with the α class-weighting parameter.
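Combining the two gives the α-balanced focal loss from the original paper: FL = -α_t × (1-p_t)^γ × log(p_t), where α_t is the weight of the true class. A sketch, with the α values 0.75/0.25 and the probabilities below being illustrative assumptions:

```python
import math

# Alpha-balanced focal loss: class weight times the focal modulating factor.
def alpha_focal_loss(p_true, alpha_true, gamma=2.0):
    return -alpha_true * ((1.0 - p_true) ** gamma) * math.log(p_true)

# Hypothetical weights: 0.75 for the rare positive class, 0.25 for negatives.
pos_loss = alpha_focal_loss(p_true=0.2, alpha_true=0.75)   # hard, rare positive
neg_loss = alpha_focal_loss(p_true=0.99, alpha_true=0.25)  # easy, common negative
print(pos_loss, neg_loss)  # the rare, hard example carries far more loss
```

With γ=0 this reduces to plain weighted cross-entropy, so α and γ can be tuned somewhat independently.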
When to Use Each
Class weighting: simple, works with any model, good first approach. Focal loss: better when easy majority examples dominate, especially in neural networks. Both can be combined—focal loss for hard example focus plus class weights for explicit imbalance correction.