Prediction Drift Failure Modes and Mitigation
FALSE ALARMS FROM EXPECTED VARIATION
The most common failure: alerting on normal variation. Prediction distributions fluctuate naturally due to traffic patterns, seasonality, and random sampling. Without accounting for expected variation, you get alert fatigue.
Detection: Track drift metrics over time and establish historical percentiles. A drift value at the 90th percentile of its own historical distribution is not alarming; one at the 99th percentile is.
Mitigation: Set thresholds based on historical variability, not fixed values. Require drift to persist across multiple time windows before alerting. Use seasonally-adjusted baselines.
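The two mitigations above can be sketched in a few lines: derive the threshold from a historical percentile rather than a fixed value, and require the drift metric to exceed it for several consecutive windows. This is a minimal sketch; the function name, the 99th-percentile default, and the 3-window persistence are illustrative choices, not prescriptions.

```python
import numpy as np

def drift_alert(history, recent, pct=99.0, persist=3):
    """Alert only if the last `persist` drift values all exceed the
    pct-th percentile of the drift metric's own history.

    history: past per-window drift values (defines "expected variation")
    recent:  the most recent per-window drift values
    """
    threshold = np.percentile(history, pct)
    # Persistence requirement: a single spiky window does not alert.
    return all(v > threshold for v in recent[-persist:])
```

For seasonal traffic, `history` should be restricted to comparable windows (e.g., same weekday or hour) so the percentile reflects a seasonally adjusted baseline.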
DRIFT WITHOUT PERFORMANCE IMPACT
Prediction distribution can shift without affecting model performance. If ground truth also shifts proportionally, accuracy remains stable despite prediction drift.
Example: Fraud rate increases from 1% to 2% in reality. Model predictions shift to predict more fraud. Prediction drift detected. But accuracy is unchanged because the model correctly reflects the new reality.
Response: Investigate, but do not automatically assume a problem. Cross-check with performance metrics when labels arrive. If performance is stable, the drift may be acceptable.
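The triage logic above can be written as a small decision function. Everything here is a sketch with hypothetical names and thresholds: the caller supplies a drift score, the alert threshold, and (once delayed labels arrive) a performance metric.

```python
def triage(drift_score, drift_threshold, accuracy, accuracy_floor):
    """Classify a drift signal using delayed-label performance.

    accuracy is None while ground-truth labels are still outstanding.
    """
    if drift_score <= drift_threshold:
        return "no_action"
    if accuracy is None:
        return "investigate"       # drift detected, labels not yet available
    if accuracy >= accuracy_floor:
        return "acceptable_drift"  # distribution moved, model tracks new reality
    return "degradation"           # drift coincides with a performance drop
```

In the fraud example, the 1%-to-2% shift would land in the "acceptable_drift" branch once labels confirm that accuracy held.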
MISSED DRIFT DUE TO OFFSETTING CHANGES
Different segments may drift in opposite directions, canceling out in aggregate. Segment A predictions increase while Segment B predictions decrease. Aggregate looks stable but both segments changed significantly.
Detection: Monitor slice-level drift, not just aggregate. Even if aggregate is stable, alert if any high-priority slice shows significant drift.
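Slice-level monitoring can be sketched by computing a drift statistic per segment instead of on the pooled predictions. The sketch below uses the Population Stability Index (PSI) as the drift metric; the function names and the dict-of-arrays layout are illustrative assumptions.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between baseline and current score samples."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e, _ = np.histogram(expected, bins=edges)
    a, _ = np.histogram(actual, bins=edges)
    # Clip to avoid log(0) / division by zero in empty bins.
    e = np.clip(e / e.sum(), 1e-6, None)
    a = np.clip(a / a.sum(), 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

def slice_drift(baseline, current):
    """Per-segment PSI, given {segment_name: prediction_array} dicts.

    Offsetting shifts (segment A up, segment B down) cancel in the
    aggregate but show up clearly here.
    """
    return {seg: psi(baseline[seg], current[seg]) for seg in baseline}
```

Alerting then checks each high-priority slice against its own threshold, independent of whether the aggregate metric moved.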
BASELINE STALENESS
If the baseline becomes too old, drift detection becomes meaningless: everything looks different from a 6-month-old baseline.
Mitigation: Refresh the baseline on a schedule, and whenever the model is retrained, so drift is measured against recently accepted behavior.
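A staleness check can be sketched as a guard that runs before drift is computed. The function name, the 30-day default, and the explicit `now` parameter are illustrative assumptions, not a prescribed interface.

```python
from datetime import datetime, timedelta

def baseline_is_stale(baseline_created, now, max_age_days=30, last_retrain=None):
    """A baseline is stale if it is older than max_age_days, or if the
    model has been retrained since the baseline was captured."""
    if now - baseline_created > timedelta(days=max_age_days):
        return True
    if last_retrain is not None and last_retrain > baseline_created:
        return True
    return False
```

When this returns True, recapture the baseline before interpreting drift scores; comparisons against a stale baseline flag change that was already accepted long ago.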