Model Monitoring & Observability • Data Drift Detection (Easy · ⏱️ ~3 min)
What is Data Drift Detection?
Data drift detection monitors whether the statistical distribution of model inputs in production deviates meaningfully from a reference baseline, typically the training data or a recent healthy window. This is a two-sample hypothesis-testing problem: you are asking whether production traffic still looks like the data your model was trained on. Detecting drift early prevents silent model degradation, where predictions become less accurate without anyone noticing.
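At its simplest, this comparison can be run with an off-the-shelf two-sample test. The sketch below uses SciPy's ks_2samp on synthetic data; the reference and production arrays are illustrative stand-ins for one feature's training-time and live values.

```python
# Minimal sketch of drift detection as a two-sample test (synthetic data).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=10_000)   # stand-in for training-time feature values
production = rng.normal(loc=0.3, scale=1.0, size=10_000)  # stand-in for a recent production window

# Two-sample Kolmogorov-Smirnov test: were both samples drawn from the same distribution?
result = ks_2samp(reference, production)
print(f"KS statistic={result.statistic:.3f}, p-value={result.pvalue:.2e}")
# A small p-value suggests the production window has drifted away from the baseline.
```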
There are three main types of drift. Covariate shift occurs when the input distribution changes (P(X) differs) but the underlying relationship stays the same. Prior shift (also called label shift) happens when the target distribution changes (P(y) shifts). Concept drift means the relationship between inputs and outputs changes (P(y|X) differs), which is harder to detect without labels. Most production systems start with covariate and prior shift detection because labels often arrive with significant delay or may be unavailable entirely.
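The sketch below shows what each drift type actually compares, again on synthetic data; the toy sigmoid score function is a hypothetical stand-in for a real model's scoring call.

```python
# Which distributions each drift type compares (synthetic, single feature).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
ref_x = rng.normal(0.0, 1.0, 5_000)    # reference values of one feature
prod_x = rng.normal(0.5, 1.0, 5_000)   # production values of the same feature (shifted mean)

# Covariate shift: P(X) changed -- detectable from inputs alone.
print("covariate shift p-value:", ks_2samp(ref_x, prod_x).pvalue)

def score(x):
    return 1.0 / (1.0 + np.exp(-x))    # toy sigmoid standing in for the model's score

# Prior shift: P(y) changed. When labels are delayed, the distribution of model
# scores is a common proxy for the target distribution.
print("prior shift proxy p-value:", ks_2samp(score(ref_x), score(prod_x)).pvalue)

# Concept drift: P(y|X) changed. This is invisible in X and in scores alone;
# it needs (possibly delayed) labels, e.g. a rolling accuracy or log-loss metric.
```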
In practice, you run statistical tests repeatedly across many features and time windows. For a recommender system serving 40,000 to 80,000 requests per second, sampling just 0.5% of traffic yields 200 to 400 events per second. With 60 important features monitored across 10 business-critical segments, you end up running 60 × 10 = 600 statistical tests per monitoring window. At the low end of that sampling rate, a 30-minute sliding window accumulates around 360,000 events, providing enough statistical power to detect meaningful shifts.
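The arithmetic behind those figures, spelled out; the constants are the paragraph's own assumptions, not universal defaults.

```python
# Back-of-the-envelope sizing for the monitoring pipeline described above.
requests_per_sec = 40_000                              # low end of the 40k-80k req/s range
sample_rate = 0.005                                    # 0.5% sampling
sampled_per_sec = requests_per_sec * sample_rate       # -> 200 events/s (400 at 80k req/s)

features, segments = 60, 10
tests_per_window = features * segments                 # -> 600 statistical tests per window

window_seconds = 30 * 60
events_per_window = sampled_per_sec * window_seconds   # -> 360,000 events per 30-minute window

print(sampled_per_sec, tests_per_window, events_per_window)
```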
The key challenge is distinguishing real drift from noise. Statistical tests become hypersensitive at large sample sizes, flagging tiny, meaningless differences as significant. Production systems therefore combine statistical significance with effect-size thresholds (requiring both a low p-value AND a meaningful magnitude of change) and demand persistence across multiple consecutive windows before raising alerts.
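One way to encode that alerting rule is sketched below; the WindowResult structure, the thresholds, and the requirement of three consecutive windows are illustrative choices, not a fixed standard.

```python
# Alert only when drift is significant, large enough to matter, and persistent.
from dataclasses import dataclass

@dataclass
class WindowResult:
    p_value: float   # from a two-sample test such as KS
    psi: float       # effect size, e.g. Population Stability Index

def window_is_drifted(w: WindowResult, alpha: float = 0.01, psi_threshold: float = 0.25) -> bool:
    # Both conditions must hold: statistically significant AND a meaningful magnitude.
    return w.p_value < alpha and w.psi > psi_threshold

def should_alert(history: list[WindowResult], k: int = 3) -> bool:
    # Require drift in the last k consecutive windows before raising an alert.
    return len(history) >= k and all(window_is_drifted(w) for w in history[-k:])

windows = [WindowResult(2e-5, 0.31), WindowResult(1e-6, 0.28), WindowResult(3e-4, 0.33)]
print(should_alert(windows))  # True: significant, large, and persistent
```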
💡 Key Takeaways
•Data drift is detected by comparing production feature distributions against a reference baseline using statistical hypothesis tests across many features and time windows
•Covariate shift (input changes) and prior shift (target changes) can be detected from data alone, but concept drift (relationship changes) requires labels or proxy signals
•At high request rates like 40,000 to 80,000 requests per second, even 0.5% sampling provides 200 to 400 events per second, sufficient for robust statistical testing on 30-minute windows
•Production systems must control false positives by combining statistical significance with effect size thresholds and requiring drift to persist across multiple consecutive windows
•Typical alert thresholds are Population Stability Index (PSI) greater than 0.25 for significant shift, Kolmogorov–Smirnov (KS) test p-value less than 0.01, or adversarial validation Area Under the Curve (AUC) above 0.7 (see the sketch after this list)
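Below is one plausible implementation of the PSI and adversarial-validation metrics named in the last bullet, using NumPy and scikit-learn on synthetic data; the quantile binning scheme, the epsilon smoothing, and the choice of a logistic-regression classifier are assumptions of this sketch.

```python
# Sketch of PSI and adversarial-validation AUC for a single feature.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def psi(reference: np.ndarray, production: np.ndarray, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index over quantile bins of the reference sample."""
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    ref_frac = np.histogram(reference, edges)[0] / len(reference) + eps
    # Clip production values into the reference range so every event lands in a bin.
    prod_frac = np.histogram(np.clip(production, edges[0], edges[-1]), edges)[0] / len(production) + eps
    return float(np.sum((prod_frac - ref_frac) * np.log(prod_frac / ref_frac)))

def adversarial_auc(reference: np.ndarray, production: np.ndarray) -> float:
    """Train a classifier to separate reference from production; AUC near 0.5 means no drift."""
    X = np.concatenate([reference, production]).reshape(-1, 1)
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(production))])
    return float(cross_val_score(LogisticRegression(), X, y, cv=5, scoring="roc_auc").mean())

rng = np.random.default_rng(2)
ref = rng.normal(0.0, 1.0, 20_000)
prod = rng.normal(0.4, 1.0, 20_000)
print(f"PSI={psi(ref, prod):.3f}  adversarial AUC={adversarial_auc(ref, prod):.3f}")
```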
📌 Examples
Netflix recommendation model monitors 60 key features across 10 segments (geography, device type, subscriber tier), running 600 tests per 30-minute window with false discovery rate control at 5% (see the FDR sketch after these examples)
Uber demand prediction samples 0.2% of 50,000 requests per second (100 requests per second to monitoring pipeline), accumulating 360,000 events per hour for drift analysis
Airbnb pricing model detects covariate shift when average listing price distribution changes by PSI greater than 0.25, triggering retraining only after 6 consecutive hours of drift
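A short sketch of the false discovery rate control mentioned in the first example, implementing the Benjamini–Hochberg step-up procedure over a batch of per-feature, per-segment p-values; the 570/30 split of null versus drifted tests is synthetic and purely illustrative.

```python
# Benjamini-Hochberg FDR control across many simultaneous drift tests.
import numpy as np

def benjamini_hochberg(p_values: np.ndarray, fdr: float = 0.05) -> np.ndarray:
    """Return a boolean mask of tests rejected at the given FDR level."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    thresholds = fdr * np.arange(1, m + 1) / m          # step-up thresholds q * k / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        cutoff = np.max(np.nonzero(below)[0])           # largest k with p_(k) <= q * k / m
        reject[order[: cutoff + 1]] = True              # reject the k smallest p-values
    return reject

rng = np.random.default_rng(3)
p_values = np.concatenate([rng.uniform(0.0, 1.0, 570),    # ~570 undrifted feature/segment tests
                           rng.uniform(0.0, 1e-4, 30)])   # ~30 genuinely drifted ones
print(benjamini_hochberg(p_values, fdr=0.05).sum(), "of 600 tests flagged")
```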