Baseline Selection Strategies and Trade-offs
TRAINING DATA BASELINE
Compare current predictions to the prediction distribution on the training data. This answers: are predictions different from what the model produced during training?
Advantages: Detects deviation from the model's known-good state. The training distribution represents what the model was designed to output.
Disadvantages: Training data may be old. Some drift from training is expected and healthy. May generate false alarms as legitimate population changes occur.
Best for: detecting major deviations, initial deployment monitoring, regulatory contexts where you need to prove the model is behaving as validated.
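A training-baseline comparison can be sketched as follows. The text does not name a drift statistic, so this example assumes the Population Stability Index (PSI) over equal-width bins and assumes prediction scores lie in [0, 1]. The function name psi and its parameters are illustrative, not part of any particular library.

```python
import math
from typing import List, Sequence


def psi(baseline: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of scores in [0, 1].

    Assumption: scores are probabilities, so equal-width bins on [0, 1] work.
    """
    def fractions(scores: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for s in scores:
            # Clamp so that s == 1.0 (or slightly out-of-range values) land in a valid bin.
            idx = min(max(int(s * bins), 0), bins - 1)
            counts[idx] += 1
        total = len(scores)
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / total, eps) for c in counts]

    base_frac = fractions(baseline)
    curr_frac = fractions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base_frac, curr_frac))


# Static baseline: prediction scores recorded on the training data.
training_scores = [i / 100 for i in range(100)]
# Current production scores, here concentrated in the upper half of the range.
current_scores = [0.5 + i / 200 for i in range(100)]

drift_score = psi(training_scores, current_scores)
```

A larger value means the current distribution sits further from the training distribution. The alert threshold is a judgment call that depends on how much drift from training is considered healthy for the model in question.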
RECENT PRODUCTION BASELINE
Compare current predictions to recent production predictions (e.g., last 7 days). This answers: did predictions change recently?
Advantages: Detects sudden changes. Adapts to gradual evolution. Less sensitive to expected variation, because the baseline already reflects recent conditions.
Disadvantages: May miss gradual drift. If the world changes slowly, the rolling baseline changes with it, hiding drift from training.
Best for: detecting sudden changes, operational alerting, stable production environments.
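The blind spot of a rolling baseline can be illustrated with a small simulation. This is a minimal sketch on synthetic data: the RollingBaseline class, the mean_shift statistic, and the 0.01-per-day drift rate are all assumptions chosen for illustration, and a real monitor would likely use a distribution-level statistic rather than a difference of means.

```python
from collections import deque
from statistics import mean
from typing import Deque, List, Sequence, Tuple


class RollingBaseline:
    """Keeps the most recent `window_days` daily batches of prediction scores."""

    def __init__(self, window_days: int = 7) -> None:
        self._days: Deque[List[float]] = deque(maxlen=window_days)

    def add_day(self, scores: Sequence[float]) -> None:
        # deque(maxlen=...) drops the oldest day automatically.
        self._days.append(list(scores))

    def scores(self) -> List[float]:
        return [s for day in self._days for s in day]


def mean_shift(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Absolute difference in mean score (a deliberately simple drift statistic)."""
    return abs(mean(current) - mean(baseline))


def simulate_slow_drift(days: int = 30, step: float = 0.01) -> Tuple[float, float]:
    """Return (shift vs. training, shift vs. rolling window) on the final day."""
    training = [0.30] * 100
    rolling = RollingBaseline(window_days=7)
    rolling.add_day(training)  # seed the window so day 1 has something to compare to
    vs_training = vs_recent = 0.0
    for day in range(1, days + 1):
        today = [0.30 + step * day] * 50  # mean score creeps upward every day
        vs_training = mean_shift(training, today)
        vs_recent = mean_shift(rolling.scores(), today)
        rolling.add_day(today)
    return vs_training, vs_recent


vs_training, vs_recent = simulate_slow_drift()
```

With these synthetic numbers, the final-day shift against the static training scores is far larger than the shift against the 7-day window, because the window has been following the drift the whole time.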
MULTI-BASELINE APPROACH
Use both baselines together. Alert when predictions drift from training baseline (long-term deviation) OR from recent baseline (sudden change).
Implementation: Maintain two comparison sets. Training baseline is static (updated only on model retrain). Recent baseline updates daily or weekly. Run drift detection against both.
Alert logic: Training drift without recent drift = gradual evolution, may be acceptable. Recent drift without training drift = temporary fluctuation, investigate but may resolve. Both = something significant changed.
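The two-baseline setup and the alert logic above can be sketched together. This is one possible arrangement, not a reference implementation: DualBaselineMonitor, DriftReport, the default mean_shift statistic, and the 0.1 threshold are hypothetical names and values chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Deque, List, Sequence

DriftFn = Callable[[Sequence[float], Sequence[float]], float]


def mean_shift(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Placeholder drift statistic; swap in any function with the same signature."""
    return abs(mean(current) - mean(baseline))


@dataclass
class DriftReport:
    training_drift: bool
    recent_drift: bool

    @property
    def verdict(self) -> str:
        # Mirrors the alert logic described in the text.
        if self.training_drift and self.recent_drift:
            return "significant change"
        if self.training_drift:
            return "gradual evolution"
        if self.recent_drift:
            return "temporary fluctuation"
        return "stable"


class DualBaselineMonitor:
    def __init__(self, training_scores: Sequence[float],
                 drift_fn: DriftFn = mean_shift,
                 threshold: float = 0.1, window: int = 7) -> None:
        self.training: List[float] = list(training_scores)  # static baseline
        self.recent: Deque[List[float]] = deque(maxlen=window)  # rolling baseline
        self.drift_fn = drift_fn
        self.threshold = threshold

    def retrain(self, training_scores: Sequence[float]) -> None:
        """Replace the static baseline; call only when the model is retrained."""
        self.training = list(training_scores)

    def check(self, batch: Sequence[float]) -> DriftReport:
        """Compare one batch against both baselines, then add it to the window."""
        batch = list(batch)
        training_drift = self.drift_fn(self.training, batch) > self.threshold
        recent_scores = [s for b in self.recent for s in b]
        # With an empty window there is nothing recent to compare against.
        recent_drift = bool(recent_scores) and (
            self.drift_fn(recent_scores, batch) > self.threshold)
        self.recent.append(batch)
        return DriftReport(training_drift, recent_drift)


monitor = DualBaselineMonitor([0.30] * 10, window=3)
for i in range(6):
    report = monitor.check([0.30 + 0.03 * i] * 10)  # slow upward creep
```

In this usage example the final batch has moved past the threshold relative to training but not relative to the last three batches, so report.verdict falls into the "gradual evolution" case. Using a single threshold for both comparisons is a simplification; separate thresholds per baseline may be more appropriate in practice.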