
Mitigation: Data Weighting, Retraining Cadence, and Model Portfolios

The core mitigation strategy is time-aware data weighting combined with a context-appropriate retraining cadence. Apply exponential decay to emphasize recent examples: weight = 0.5^(age / half-life). Choose the half-life by backtesting against domain velocity: fast-shifting domains like ads and recommendations use 7 to 14 day half-lives, fraud during attack waves uses 1 to 7 days, while static vision models for industrial sensors use months. At Google and Meta, CTR models weight recent impressions more heavily with 3 to 14 day half-lives and retrain multiple times per day to capture trend shifts.

Retraining cadence varies by label latency and drift velocity. Ads and recommendations retrain several times daily because click labels arrive within minutes and user preferences shift constantly. Uber ETA models recalibrate hourly and retrain daily as traffic patterns change. Stripe fraud models retrain hourly during attack spikes but daily otherwise. Vision or natural language processing (NLP) models with static domains retrain weekly to monthly. The key tradeoff is reactivity versus stability: frequent updates reduce decay but increase variance, regressions, and the operational cost of running training infrastructure.

Model portfolios handle recurring drift patterns. Maintain specialized models for known contexts like weekday versus weekend behavior, holidays, or regions; Netflix, for example, switches to context-specific parameters for weekend family viewing patterns. Use a lightweight context classifier to route traffic or interpolate predictions, and store historical models with metadata so the system can resurrect the right one when seasonal patterns recur. Combine a short-horizon expert trained on the latest data (fast adaptation) with a long-horizon expert capturing slow-moving signals (stability); ensemble or gating mechanisms let the system adapt without overreacting to noise.
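A minimal sketch of the decay weighting (the helper name and the sample_weight hook are illustrative assumptions, not a specific library's API):

```python
import numpy as np

def decay_weights(ages_days: np.ndarray, half_life_days: float) -> np.ndarray:
    """Exponential-decay sample weights: weight = 0.5 ** (age / half_life).

    With a 7-day half-life, a 7-day-old example gets weight 0.5,
    a 14-day-old example gets 0.25, and so on.
    """
    return 0.5 ** (ages_days / half_life_days)

# Example: weight training rows for a fast-shifting domain (7-day half-life assumed).
ages = np.array([0.0, 3.5, 7.0, 14.0, 28.0])   # days since each example was logged
print(decay_weights(ages, half_life_days=7.0).round(3))  # [1. 0.707 0.5 0.25 0.062]

# Most trainers accept such weights directly, e.g.
# model.fit(X, y, sample_weight=decay_weights(ages, half_life_days=7.0))
```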
💡 Key Takeaways
Exponential decay formula: weight = 0.5^(age / half-life). Ads use 7 to 14 days, fraud during attacks uses 1 to 7 days, static vision uses months. This smoothly shifts emphasis without abrupt cutoffs.
Retraining cadence by domain: Ads and recommendations retrain multiple times daily (click labels in minutes), Uber ETA hourly recalibration plus daily retraining, fraud hourly during attacks, vision weekly to monthly.
Include 5 to 20% rehearsal examples from older data in online or mini-batch updates to prevent catastrophic forgetting. Maintain balanced buffers of hard negatives and rare positives (see the batch-mixing sketch after this list).
Model portfolios for recurring patterns: Store weekday versus weekend models, holiday models, and regional models. Netflix resurrects context-specific models when seasonal viewing patterns recur (see the routing sketch after this list).
Short-horizon plus long-horizon ensemble: Short-horizon model (3 to 7 days of data) adapts fast, long-horizon model (30 to 90 days) provides stability. Gate or blend based on drift signals (see the blending sketch after this list).
Reactivity versus stability tradeoff: Frequent retraining (hourly) costs more compute, increases variance, and risks regressions but adapts fast. Longer intervals (daily or weekly) are stable but slow to react.
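The rehearsal buffer can be as simple as reserving a slice of each mini-batch for curated older examples. A sketch under assumed names (build_minibatch and the 10% rehearsal fraction are hypothetical, not from the text):

```python
import random

def build_minibatch(recent_pool, rehearsal_buffer, batch_size=256, rehearsal_frac=0.1):
    """Mix fresh examples with older ones to limit catastrophic forgetting.

    rehearsal_buffer is assumed to hold curated older examples
    (hard negatives, rare positives) kept roughly balanced.
    """
    n_rehearsal = int(batch_size * rehearsal_frac)   # the 5-20% rehearsal share
    n_recent = batch_size - n_rehearsal
    batch = random.sample(recent_pool, n_recent) + random.sample(rehearsal_buffer, n_rehearsal)
    random.shuffle(batch)
    return batch

recent = list(range(1000))    # stand-ins for freshly labeled examples
older = list(range(-200, 0))  # stand-ins for the curated rehearsal buffer
batch = build_minibatch(recent, older, batch_size=100, rehearsal_frac=0.1)
print(sum(1 for x in batch if x < 0))  # 10 rehearsal examples in a batch of 100
```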
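For the model-portfolio takeaway, the context classifier that picks a stored model can start as a rule-based router keyed on the request context. A sketch with hypothetical model paths and context tags:

```python
import datetime

def pick_context(ts: datetime.datetime, holidays: set) -> str:
    """Lightweight context classifier: decide which archived model should serve."""
    if ts.date() in holidays:
        return "holiday"
    return "weekend" if ts.weekday() >= 5 else "weekday"  # Sat=5, Sun=6

# Hypothetical model store: context tag -> archived model artifact.
MODEL_PATHS = {
    "weekday": "models/ranker_weekday.pkl",
    "weekend": "models/ranker_weekend.pkl",
    "holiday": "models/ranker_holiday.pkl",
}

now = datetime.datetime(2024, 12, 21, 19, 30)          # a Saturday evening
print(MODEL_PATHS[pick_context(now, holidays=set())])  # models/ranker_weekend.pkl
```

Storing the artifacts with context metadata is what lets the system resurrect last year's holiday model instead of retraining it from scratch.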
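For the short-horizon plus long-horizon ensemble, the gate can interpolate between the two experts as a function of the drift signal so routing does not flip abruptly on noise. A sketch with assumed thresholds (the 0.02/0.10 cutoffs and the drift_score source are illustrative):

```python
def blend_predictions(p_short: float, p_long: float, drift_score: float,
                      low: float = 0.02, high: float = 0.10) -> float:
    """Gate between a fast-adapting short-horizon expert and a stable long-horizon expert.

    drift_score is whatever the monitoring stack emits (e.g. a PSI or KL
    estimate on recent feature or score distributions). Below `low`, trust
    the long-horizon model; above `high`, lean on the short-horizon model;
    in between, interpolate linearly.
    """
    gate = min(max((drift_score - low) / (high - low), 0.0), 1.0)
    return gate * p_short + (1.0 - gate) * p_long

# Mild drift: the blend stays close to the long-horizon prediction.
print(round(blend_predictions(p_short=0.80, p_long=0.40, drift_score=0.03), 3))  # 0.45
```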
📌 Examples
Meta CTR models: Retrain 3 to 6 times per day with 7-day half-life weighting. Delayed-feedback correction jobs run nightly to recalibrate probability estimates when 24-to-72-hour conversion labels arrive. Cap daily calibration shifts to prevent volatility in bid prices.
Netflix personalization: Candidate generation refreshes offline daily, online re-ranking adapts within a session. After content releases, increase exploration rates for 30 to 120 minutes, then update calibration and item priors. Maintain a separate weekend family-viewing model activated on Friday evenings.
Stripe fraud: A short-horizon sentinel model trains on the last 24 hours of data with aggressive weighting and retrains hourly during attacks. A long-horizon baseline model uses 90 days with a 30-day half-life and retrains daily. Ensemble gates blend the two predictions based on recent drift magnitude.