What is Concept Drift vs Data Drift vs Model Decay?
DATA DRIFT: INPUT DISTRIBUTION SHIFTS
Data drift occurs when the statistical properties of input features change over time. The model was trained on one distribution but now receives data from a different distribution. Examples: user demographics shift, new product categories emerge, or seasonal patterns change.
Data drift can occur without concept drift. If feature distributions shift but the relationship between features and outcomes remains stable, a well-generalized model may continue to perform well. However, most models overfit to training distributions and degrade when inputs shift.
CONCEPT DRIFT: THE RULES CHANGE
Concept drift occurs when the underlying relationship between inputs and outputs changes. The mapping P(Y|X) shifts. A fraud model trained when fraudsters used method A becomes less effective when they switch to method B—the input features might look similar, but they now indicate different outcomes.
Concept drift is harder to detect than data drift because you cannot measure it directly from inputs alone. You need labeled outcomes to observe that predictions no longer match reality, and labels often arrive with significant delay.
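Once delayed labels do arrive, one simple way to surface concept drift is to compare the error rate in a recent labeled window against the baseline error rate. The sketch below uses a two-proportion z-test; the window sizes, error counts, and z threshold are illustrative assumptions, not recommendations.

```python
# Sketch: concept-drift alarm computed once delayed labels arrive.
# Compares the recent window's error rate against the baseline error
# rate with a pooled two-proportion z-test. All numbers are synthetic.
from math import sqrt

def concept_drift_alarm(baseline_errors: int, baseline_n: int,
                        window_errors: int, window_n: int,
                        z_threshold: float = 3.0) -> bool:
    """Return True if the recent error rate is significantly above baseline."""
    p0 = baseline_errors / baseline_n
    p1 = window_errors / window_n
    # pooled proportion and standard error for the difference
    p = (baseline_errors + window_errors) / (baseline_n + window_n)
    se = sqrt(p * (1 - p) * (1 / baseline_n + 1 / window_n))
    if se == 0:
        return False
    return (p1 - p0) / se > z_threshold

# baseline: 5% error over 10,000 labeled examples
# recent window: 12% error over 1,000 newly labeled examples
print(concept_drift_alarm(500, 10_000, 120, 1_000))  # → True
```

The key operational point from the paragraph above survives in the code: nothing fires until `window_errors` is known, which is exactly why label latency bounds how quickly concept drift can be caught.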
MODEL DECAY: THE OBSERVABLE SYMPTOM
Model decay is performance degradation over time. It is the symptom, not the cause. Decay might result from data drift, concept drift, or both. Tracking decay metrics (accuracy, AUC, business metrics) tells you that something is wrong, but not why.
Typical decay timeline: as a rough rule of thumb, models degrade on the order of 1-5% per month without intervention, though the rate varies widely by domain. High-velocity domains (fraud, trending content) decay faster; stable domains (document classification) decay slower.