Training Strategies for Extreme Class Imbalance: Resampling vs Weighting
Training supervised models under extreme class imbalance (0.01% to 1% anomaly rate) requires special techniques because standard algorithms optimize for the majority class and ignore rare events. The two main approaches are resampling the training data and applying class weights during optimization. Each trades off different aspects of model quality and training efficiency.
Resampling changes the class distribution in the training data. Undersampling removes 90% to 99% of normal examples to create a more balanced dataset, for example, reducing 1 million normal transactions and 100 fraud cases to 10,000 normal and 100 fraud. This dramatically reduces training time and memory, making it practical to train gradient boosted trees on commodity hardware. The downside is distorted probability calibration: the model learns at a 1% fraud rate but deploys at a 0.01% fraud rate. You must recalibrate probabilities using the true prior, and you lose whatever information the discarded majority examples carried. PayPal uses 10x to 50x undersampling followed by isotonic regression calibration.
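A minimal sketch of this pattern, assuming NumPy arrays and scikit-learn (the helper names `undersample` and `correct_undersampled_prob` are illustrative, not PayPal's pipeline). Instead of isotonic regression, it applies the standard closed-form prior correction for undersampled negatives, p_true = β·p / (β·p + 1 − p), where β is the fraction of normals kept:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

def undersample(X, y, keep_frac=0.01, seed=0):
    """Keep every fraud example (y == 1) and a random keep_frac of normals."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    neg_kept = rng.choice(neg, size=int(len(neg) * keep_frac), replace=False)
    idx = np.concatenate([pos, neg_kept])
    return X[idx], y[idx]

def correct_undersampled_prob(p, keep_frac):
    """Map scores learned at the inflated training fraud rate back to the
    deployment rate: p_true = beta*p / (beta*p + 1 - p), beta = keep_frac."""
    return keep_frac * p / (keep_frac * p + 1.0 - p)

# Usage: train on the balanced sample, then correct scores before thresholding.
# X_small, y_small = undersample(X_train, y_train, keep_frac=0.01)
# model = HistGradientBoostingClassifier().fit(X_small, y_small)
# p_deploy = correct_undersampled_prob(model.predict_proba(X_test)[:, 1], 0.01)
```

The correction matters whenever downstream systems consume the score as a probability; if you only rank and threshold, the monotone distortion from undersampling is harmless.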
Class weighting keeps all data but assigns higher loss to minority errors. Setting the fraud weight to 100x and the normal weight to 1x makes the optimizer care 100 times more about each fraud case. This preserves the full data distribution and produces better-calibrated probabilities. Gradient boosted trees and neural networks support native class weights. The challenge is training stability: extreme weights (over 1000x) can cause overfitting to noise in minority examples. Focal loss is an alternative that automatically emphasizes hard examples without manual weight tuning.
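Both ideas fit in a few lines. The sketch below assumes scikit-learn for the weighted model (most libraries expose an equivalent per-class or per-sample weight) and implements binary focal loss in NumPy for illustration; the synthetic data and the `focal_loss` helper are ours, not from any production system:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

# Synthetic stand-in data: 1,000 transactions, roughly 1% fraud.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.01).astype(int)

# Class weighting via per-sample weights: each fraud example contributes
# 100x the loss of a normal example during training.
weights = np.where(y == 1, 100.0, 1.0)
model = HistGradientBoostingClassifier().fit(X, y, sample_weight=weights)

def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Binary focal loss: the (1 - p_t)**gamma factor shrinks the loss on
    easy, well-classified examples, so hard cases dominate the gradient."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)  # model's probability of the true class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))
```

With gamma=0, focal loss reduces to ordinary cross-entropy; gamma=2 (as in the Amazon example below) is the common default.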
Production systems often combine both. Stripe undersamples normals by 20x to control compute costs, applies 5x class weights to the remaining fraud cases for emphasis, and trains on 3 months of data (roughly 50 million transactions with 5,000 to 50,000 fraud labels). Models train overnight on CPU clusters. Hard negative mining adds a third technique: after initial training, replay high-scoring false positives back into training with extra weight to teach the model its mistakes.
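The mining loop itself is simple. A hedged sketch, again with scikit-learn; the round count, score cutoff, and extra weight below are placeholder parameters, not Stripe's or Uber's actual settings:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

def mine_hard_negatives(X, y, rounds=3, score_cut=0.9, extra_weight=3.0):
    """Retrain several times, upweighting normals the model scores as fraud
    (high-confidence false positives) so it learns from its own mistakes."""
    w = np.ones(len(y))
    model = HistGradientBoostingClassifier()
    for _ in range(rounds):
        model.fit(X, y, sample_weight=w)
        scores = model.predict_proba(X)[:, 1]
        hard_fp = (y == 0) & (scores > score_cut)
        w[hard_fp] *= extra_weight  # replay mistakes with extra weight
    return model
```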
💡 Key Takeaways
• Undersampling reduces the majority class by 10x to 100x to speed training and balance classes, but distorts calibration and discards information
• Class weighting (50x to 200x for fraud) preserves all data and calibration but can cause training instability with extreme ratios
• Focal loss automatically emphasizes hard examples without manual weight tuning, commonly used in neural network fraud models
• Hybrid approach: Stripe undersamples 20x plus 5x class weights on 3 months of data (50M transactions, 5K to 50K fraud labels)
• Hard negative mining replays high-confidence false positives back into training with extra weight to fix model mistakes
• SMOTE synthetic oversampling can create unrealistic interpolations in high dimensions and produce overly optimistic validation results (see the sketch after this list)
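On the SMOTE caveat: the optimistic-validation failure mode usually comes from oversampling before splitting, which leaks synthetic copies of validation minorities into training. A short sketch of the leakage-safe order (split first, then resample only the training split), assuming the imbalanced-learn package and synthetic data throughout:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 5,000 examples, roughly 2% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))
y = (rng.random(5000) < 0.02).astype(int)

# Split FIRST, then oversample only the training split; synthetic points
# derived from validation minorities would inflate validation metrics.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
model = HistGradientBoostingClassifier().fit(X_res, y_res)
print(average_precision_score(y_val, model.predict_proba(X_val)[:, 1]))
```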
📌 Examples
PayPal fraud model: 50x undersample of normal transactions, isotonic regression recalibration using the true 0.1% fraud prior, retrained weekly on a 100M transaction sample
Amazon account abuse detection: Focal loss with gamma=2 on a neural network, no resampling, trains on the full 6 months of signups (500M examples with 50K abuse cases)
Uber safety model: 10x undersample plus 10x class weights; hard negative mining adds 5,000 high-scoring false alarms per training iteration with 3x weight