Mitigation: Data Weighting, Retraining Cadence, and Model Portfolios
RETRAINING STRATEGIES
Scheduled retraining: Retrain on a fixed cadence (daily, weekly, monthly). Simple to implement. Downside: retrains even when unnecessary and may miss rapid drift between cycles.
Triggered retraining: Retrain when drift metrics exceed thresholds. More efficient but requires reliable drift detection. Risk: false triggers cause unnecessary retraining; missed triggers allow decay.
Continuous training: The model trains continuously on streaming data and adapts constantly. Requires complex infrastructure. Risk of catastrophic forgetting if the stream contains temporary anomalies.
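The triggered strategy can be sketched as a guard around the retrain call. This is an illustrative sketch only: the drift score here is a simple mean-shift statistic, and maybe_retrain, threshold, and the model's fit method are placeholder names, not a specific library's API.

```python
import numpy as np

def maybe_retrain(model, recent_data, reference_data, threshold=0.1):
    """Retrain only when a drift metric exceeds a threshold.

    Illustrative drift score: mean absolute difference between
    per-feature means of the reference and recent batches.
    Returns (retrained, drift_score).
    """
    ref = np.asarray(reference_data, dtype=float)
    cur = np.asarray(recent_data, dtype=float)
    drift_score = float(np.mean(np.abs(ref.mean(axis=0) - cur.mean(axis=0))))
    if drift_score > threshold:
        model.fit(cur)  # placeholder retraining call
        return True, drift_score
    return False, drift_score
```

A production version would replace the mean-shift score with a proper drift test (e.g. a population-stability or KS statistic) and add cooldowns to suppress repeated false triggers.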
DATA WEIGHTING
Recent data is more relevant than old data for many domains. Weight training examples by recency: exponential decay assigns higher weight to recent samples.

weight = e^(-λ × age_days)
Tune λ to the domain's volatility; the corresponding half-life is ln(2)/λ. Use a high λ (short half-life) for rapidly changing domains (trending content) and a low λ for stable domains (document classification).
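The decay formula above translates directly into code. A minimal sketch, assuming sample ages are measured in days; the default lam value is arbitrary and should be tuned as described:

```python
import numpy as np

def recency_weights(age_days, lam=0.05):
    """Exponential-decay sample weights: weight = exp(-lam * age_days).

    lam = 0.05 per day gives a half-life of ln(2)/0.05 ≈ 13.9 days.
    """
    return np.exp(-lam * np.asarray(age_days, dtype=float))
```

The returned array can typically be passed as the sample-weight argument that many training APIs accept (e.g. scikit-learn's `fit(X, y, sample_weight=...)`).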
Alternatively, use sliding windows: train only on last N days of data. Simpler than exponential weighting but creates hard cutoffs that may lose valuable historical patterns.
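A sliding window reduces to a single filter over timestamps. A minimal sketch, assuming records are dicts with a "ts" datetime field (that field name is an assumption for illustration):

```python
from datetime import datetime, timedelta

def sliding_window(records, now, window_days=30):
    """Keep only records whose 'ts' falls within the last window_days."""
    cutoff = now - timedelta(days=window_days)
    return [r for r in records if r["ts"] >= cutoff]
```

The hard cutoff is visible here: a record one day older than the window is dropped entirely, whereas exponential weighting would merely down-weight it.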
MODEL ENSEMBLES AND PORTFOLIOS
Instead of one model, maintain multiple models trained on different time windows. Ensemble predictions by weighted average. Recent-trained models capture current patterns; older models remember stable patterns.
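The weighted-average ensemble can be sketched as follows. This assumes each model exposes a predict method returning a numeric array; the function name and the choice of normalized weights are illustrative:

```python
import numpy as np

def portfolio_predict(models, X, weights=None):
    """Weighted average of per-model predictions.

    models: objects with .predict(X); weights default to uniform
    and are normalized to sum to 1.
    """
    preds = np.stack([m.predict(X) for m in models])
    w = np.ones(len(models)) if weights is None else np.asarray(weights, dtype=float)
    return np.average(preds, axis=0, weights=w)
```

A common choice is to give higher weights to models trained on more recent windows, mirroring the recency weighting used for individual samples.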
Champion-challenger pattern: Current production model is champion. New models train continuously as challengers. When a challenger outperforms champion on validation, promote it. Provides both stability and adaptation.
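The promotion decision reduces to a validation comparison. A minimal sketch; score_fn and the margin parameter are assumptions, with the margin guarding against promoting a challenger on noise-level improvements:

```python
def promote_if_better(champion, challenger, X_val, y_val, score_fn, margin=0.0):
    """Return the model that should serve production traffic.

    score_fn(model, X, y) -> float, higher is better. The challenger
    must beat the champion by more than margin to be promoted.
    """
    champ_score = score_fn(champion, X_val, y_val)
    chall_score = score_fn(challenger, X_val, y_val)
    return challenger if chall_score > champ_score + margin else champion
```

In practice the validation set should be recent held-out data, so the comparison reflects current conditions rather than the distribution the champion was trained on.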
Portfolio benefit: if drift is temporary (e.g., holiday spike), older models recover quickly when drift reverses. A single retrained model may have forgotten pre-drift patterns.