Detecting Model Drift: Data, Concept, and Semantic Shifts
Three Types of Drift
Drift is the degradation of model behavior that occurs when live data distributions, input-output mappings, or external context change over time. It shows up in three layers. Data drift occurs when feature distributions shift. It is detected with the Population Stability Index (PSI), Kullback-Leibler (KL) divergence, or Kolmogorov-Smirnov (KS) tests applied to features and embeddings. A PSI above roughly 0.2 to 0.3 warrants investigation, and a PSI above 0.4 signals high risk. Concept drift means the underlying input-output relationship changes, even when input distributions stay stable. Semantic drift in LLMs covers shifts in style, changes in refusal rate, hallucinations, and compliance failures, and vendor model updates often trigger it.
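A minimal sketch of PSI for one numeric feature follows. It assumes equal-frequency bins cut at baseline quantiles and an epsilon floor for empty bins; the helper names (quantile_edges, bin_fractions, psi, psi_band) are hypothetical. The band cut-offs reuse the thresholds above, with 0.2 taken as the lower end of the "investigate" range.

```python
import math
from bisect import bisect_right


def quantile_edges(baseline: list[float], n_bins: int = 10) -> list[float]:
    """Interior cut points at baseline quantiles (equal-frequency bins)."""
    ordered = sorted(baseline)
    n = len(ordered)
    return [ordered[(n * k) // n_bins] for k in range(1, n_bins)]


def bin_fractions(values: list[float], edges: list[float], eps: float = 1e-6) -> list[float]:
    """Share of values per bin; empty bins are floored at eps so log() stays finite."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(baseline: list[float], live: list[float], n_bins: int = 10) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    edges = quantile_edges(baseline, n_bins)  # edges come from the baseline only
    expected = bin_fractions(baseline, edges)
    actual = bin_fractions(live, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def psi_band(value: float, investigate_at: float = 0.2, high_risk_at: float = 0.4) -> str:
    """Map a PSI value to a coarse risk band; both thresholds are tunable."""
    if value > high_risk_at:
        return "high risk"
    if value > investigate_at:
        return "investigate"
    return "stable"
```

The edges are derived once from the baseline and reused for every live window. Re-deriving them from live data would spread the live values evenly across bins and hide the shift.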
Layered Detection Approach
Production drift detection uses a layered approach that combines statistical signals, model-side signals, semantic alignment checks, and product metrics. Monitor perplexity or loss on a fixed, versioned evaluation set. Values that rise beyond a control band indicate model or data drift. For LLMs, track the embedding similarity between outputs and the retrieved sources as a measure of groundedness. Product metrics such as click-through rate (CTR), customer satisfaction (CSAT), deflection rate, and re-ask rate provide business-level signals.
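The model-side and semantic checks can be sketched in a few lines. This assumes the serving stack exposes per-token natural-log probabilities for the fixed evaluation set, and that the outputs and retrieved sources have already been embedded as plain float vectors. The one-sided mean + k·stdev upper band and the max-cosine groundedness score are illustrative choices, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood over tokens (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def outside_control_band(history: list[float], current: float, k: float = 3.0) -> bool:
    """True if current perplexity exceeds mean + k * stdev of past runs.

    history must hold at least two runs on the SAME eval-set version;
    mixing versions would make the band meaningless.
    """
    upper = mean(history) + k * stdev(history)
    return current > upper


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def groundedness(output_emb: list[float], source_embs: list[list[float]]) -> float:
    """Best cosine match between an output and any retrieved source."""
    return max(cosine(output_emb, s) for s in source_embs)
```

A falling groundedness average over a window is the semantic counterpart of a rising perplexity band: both are compared against the same versioned reference rather than against absolute numbers.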
Sensitivity vs Alert Fatigue
Low PSI thresholds catch issues early but also page teams on benign seasonal shifts. Use multi-window detectors that compare a short-term window (1 day) against a medium-term baseline (7 to 14 days) and against long-term seasonal patterns (the same weekday last month). Composite triggers that require both statistical drift (PSI greater than 0.3) and quality degradation (perplexity up 10 percent) reduce noise.
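One way to wire the multi-window and composite rules together is sketched below. It assumes PSI has already been computed for the 1-day window against both references. The combination rule is an illustrative assumption: drift must appear against the medium-term baseline and the seasonal reference, so a shift that matches last month's weekday pattern does not page, and it must coincide with a perplexity rise. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WindowSignals:
    psi_vs_medium: float        # 1-day window vs 7-14 day baseline
    psi_vs_seasonal: float      # 1-day window vs same weekday last month
    perplexity_now: float
    perplexity_baseline: float


def should_page(s: WindowSignals, psi_threshold: float = 0.3, ppl_rise: float = 0.10) -> bool:
    """Composite trigger: statistical drift on both windows AND quality loss."""
    statistical = s.psi_vs_medium > psi_threshold and s.psi_vs_seasonal > psi_threshold
    quality = s.perplexity_now > s.perplexity_baseline * (1 + ppl_rise)
    return statistical and quality
```

For example, a Monday traffic spike that drifts from the weekly baseline (PSI 0.5) but matches last month's Mondays (PSI 0.05) stays silent even if perplexity moves.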
Silent Drift Failures
Stale retrieval indexes or null inflation in feature pipelines cause a loss of groundedness: the LLM fills the gaps with hallucinations that pass validation but fail fact checking. Vendor model updates change tokenization, refusal policies, or decoding defaults, shifting cost or style in ways that break the downstream user experience. Overly aggressive log sampling removes the tail events needed to debug drift.