Detection Strategies: Monitoring Drift with Statistical Signals
DATA DRIFT DETECTION
Monitor input feature distributions over time. Compare current distribution to a reference (training data or recent production window). Statistical tests quantify divergence.
Population Stability Index (PSI): Compares two distributions by binning values and summing (actual% - expected%) * ln(actual% / expected%) across bins. PSI < 0.1 indicates negligible drift. PSI 0.1-0.25 indicates moderate drift. PSI > 0.25 indicates significant drift requiring investigation.
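A minimal PSI sketch (the `psi` function name, the 10-bin default, and quantile binning from the reference sample are illustrative choices; it assumes a continuous feature, since heavily discrete data can produce duplicate quantile edges that would need deduplication):

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a current sample."""
    # Bin edges from the reference distribution's quantiles, widened to
    # catch out-of-range production values
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf

    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Clip to avoid log(0) when a bin is empty in one sample
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
# A 0.4-standard-deviation mean shift typically lands in the moderate range
print(psi(rng.normal(0.0, 1.0, 10_000), rng.normal(0.4, 1.0, 10_000)))
```

Compare the returned value against the 0.1 / 0.25 thresholds above.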
Kolmogorov-Smirnov test: Measures the maximum distance between the two samples' cumulative distribution functions. A p-value < 0.05 suggests statistically significant drift. Works well for continuous features.
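A sketch using scipy's two-sample KS test (the synthetic normal samples stand in for a training-time reference and a production window):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5_000)   # feature values from training data
current = rng.normal(0.3, 1.0, 5_000)     # recent production window, mean shifted

res = stats.ks_2samp(reference, current)
if res.pvalue < 0.05:
    print(f"significant drift: KS statistic={res.statistic:.3f}, p={res.pvalue:.2e}")
```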
Chi-squared test: For categorical features. Compares observed category frequencies against expected baseline. Sensitive to sample size—large samples detect even tiny shifts.
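A sketch with `scipy.stats.chisquare`, scaling the baseline counts to the current sample size so observed and expected totals match (the device categories and counts are made up):

```python
import numpy as np
from scipy import stats

# Hypothetical categorical feature: device type counts in each window
baseline_counts = np.array([6_000, 3_500, 500])   # reference: mobile, desktop, tablet
current_counts = np.array([7_200, 2_500, 300])    # current production window

# Scale baseline proportions to the current sample size so totals match
expected = baseline_counts / baseline_counts.sum() * current_counts.sum()
chi2, p_value = stats.chisquare(f_obs=current_counts, f_exp=expected)
if p_value < 0.05:
    print(f"category distribution shifted: chi2={chi2:.1f}, p={p_value:.2e}")
```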
PREDICTION DRIFT DETECTION
Even without labels, you can monitor model outputs. If the distribution of predictions shifts significantly, something changed: either the inputs drifted or something in the model pipeline did.
Track: prediction mean, standard deviation, percentiles (p10, p50, p90). Sudden shifts in these statistics indicate drift. Gradual shifts over weeks may indicate concept drift.
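A sketch of computing these summary statistics per window and flagging relative shifts (the beta-distributed scores and the 10% alert threshold are illustrative, not a standard):

```python
import numpy as np

rng = np.random.default_rng(0)
reference_scores = rng.beta(2, 5, 10_000)    # score distribution at deployment time
current_scores = rng.beta(2.5, 4, 10_000)    # recent window, slightly shifted

def prediction_summary(scores):
    """Mean, std, and key percentiles for one window of prediction scores."""
    s = np.asarray(scores)
    return {"mean": s.mean(), "std": s.std(),
            "p10": np.percentile(s, 10), "p50": np.percentile(s, 50),
            "p90": np.percentile(s, 90)}

baseline = prediction_summary(reference_scores)
current = prediction_summary(current_scores)
for stat, base_val in baseline.items():
    # 10% relative-change alert threshold is a policy choice, not a standard
    if abs(current[stat] - base_val) > 0.10 * max(abs(base_val), 1e-9):
        print(f"{stat}: {base_val:.3f} -> {current[stat]:.3f}")
```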
PERFORMANCE DRIFT (REQUIRES LABELS)
The most reliable signal but often delayed. Monitor accuracy, precision, recall, AUC on labeled data as labels arrive. Compare to baseline performance on training/validation data.
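A sketch using scikit-learn metrics on whatever labels have arrived, compared against a training-time baseline (the synthetic data, the 0.80 baseline AUC, and the 0.05 tolerance are all illustrative):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 2_000)                               # labels that finally arrived
y_score = np.clip(0.3 * y_true + 0.7 * rng.random(2_000), 0, 1)  # scores logged at serving time
y_pred = (y_score >= 0.5).astype(int)

baseline_auc = 0.80   # measured on the validation set at training time

auc = roc_auc_score(y_true, y_score)
print(f"AUC={auc:.3f}  precision={precision_score(y_true, y_pred):.3f}  "
      f"recall={recall_score(y_true, y_pred):.3f}")
if auc < baseline_auc - 0.05:  # tolerance is a policy choice
    print("performance drift: AUC more than 0.05 below baseline")
```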
Challenge: labels arrive with delay (fraud labels take 30+ days, conversion labels take 7+ days). By the time you detect performance drift, the model has been underperforming for weeks.
Workaround: use early proxy metrics (click-through rate, engagement) that arrive faster than final labels. Drift in a proxy metric often precedes the performance drift you will eventually measure once labels arrive.
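A sketch comparing click-through rate between a baseline window and the current window via a chi-squared test on the 2x2 click/no-click table (the counts are made up; the 0.05 cutoff mirrors the tests above):

```python
import numpy as np
from scipy import stats

# Click / no-click counts for a baseline week vs. the current day
baseline = np.array([4_200, 95_800])
current = np.array([3_600, 96_400])

chi2, p_value, _, _ = stats.chi2_contingency(np.stack([baseline, current]))
baseline_ctr = baseline[0] / baseline.sum()
current_ctr = current[0] / current.sum()
print(f"CTR {baseline_ctr:.3%} -> {current_ctr:.3%} (p={p_value:.2e})")
if p_value < 0.05 and current_ctr < baseline_ctr:
    print("proxy metric degraded; investigate before final labels arrive")
```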