Failure Modes and Edge Cases in Production Drift Detection
FALSE POSITIVES: EXPECTED VARIATION
The most common failure: alerting on normal variation. Daily patterns, weekly cycles, seasonal effects, and random sampling noise can trigger drift alerts even when nothing is wrong.
Mitigation: establish baseline variability. Track drift metrics over time. Set thresholds based on historical percentiles (e.g., alert only when drift exceeds 99th percentile of historical values). Account for known patterns (weekends, holidays) in baseline.
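The percentile approach above can be sketched as follows. This is a minimal illustration with hypothetical helper names (`percentile_threshold`, `should_alert`), not a specific library's API:

```python
import numpy as np

def percentile_threshold(historical_drift, pct=99.0):
    """Return the pct-th percentile of a feature's historical drift scores.

    `historical_drift` should span enough history (several weeks at least)
    to capture the daily and weekly cycles in the baseline.
    """
    return float(np.percentile(historical_drift, pct))

def should_alert(current_drift, historical_drift, pct=99.0):
    """Alert only when drift exceeds the feature's own historical extremes."""
    return current_drift > percentile_threshold(historical_drift, pct)
```

Excluding known-pattern days (weekends, holidays) from `historical_drift`, or keeping separate histories for them, keeps the threshold aligned with a comparable baseline.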
FALSE NEGATIVES: MISSED DRIFT
Segment-level drift: Aggregate drift metrics can remain stable while specific segments drift significantly. A user segment comprising 5% of traffic can drift to 10x its normal level without visibly moving the aggregate metrics.
Feature interaction drift: Individual features may each be stable while their joint distribution changes: features A and B hold their marginal distributions, but their correlation shifts. Most univariate drift detection misses this.
Mitigation: monitor segment-level metrics for high-priority segments. For feature interactions, monitor prediction distribution (captures joint effects) alongside individual features.
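As one way to implement segment-level monitoring, the sketch below computes Population Stability Index (PSI), a common drift score, per segment. The function names and the dict-of-arrays input shape are assumptions for illustration:

```python
import numpy as np

def psi(expected, actual, bins=10, eps=1e-4):
    """Population Stability Index of `actual` vs `expected` for one feature.

    Bins are derived from the baseline; `eps` avoids log(0) on empty bins.
    """
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + eps
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

def segment_psi(baseline_by_segment, current_by_segment):
    """PSI per segment: catches a small segment drifting while the aggregate stays flat."""
    return {seg: psi(baseline_by_segment[seg], current_by_segment[seg])
            for seg in baseline_by_segment}
```

A drifted minority segment stands out immediately in the per-segment scores even when pooling all segments would wash it out.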
DATA QUALITY MASQUERADING AS DRIFT
Upstream pipeline failures can look like drift. A feature that suddenly becomes all zeros is not drift—it is a bug. A feature with missing values filled incorrectly changes distribution without real-world change.
Distinguish drift from bugs: check data quality metrics (null rates, cardinality, value ranges) before investigating drift. A sudden spike in nulls is a pipeline issue, not drift.
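A pre-drift quality gate can be as simple as the sketch below. The thresholds (`null_spike_factor`, `min_cardinality_ratio`) are hypothetical defaults to tune per feature:

```python
import math

def _is_null(v):
    return v is None or (isinstance(v, float) and math.isnan(v))

def data_quality_check(values, baseline_null_rate, baseline_cardinality,
                       null_spike_factor=3.0, min_cardinality_ratio=0.5):
    """Flag likely pipeline bugs before treating a distribution shift as drift."""
    nulls = sum(1 for v in values if _is_null(v))
    null_rate = nulls / len(values)
    cardinality = len({v for v in values if not _is_null(v)})

    flags = []
    if null_rate > max(baseline_null_rate * null_spike_factor, 0.01):
        flags.append("null_spike")          # pipeline issue, not drift
    if cardinality < baseline_cardinality * min_cardinality_ratio:
        flags.append("cardinality_collapse")  # e.g. feature suddenly all zeros
    return flags
```

If any flag fires, route the alert to the data pipeline owners rather than opening a drift investigation.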
THRESHOLD SENSITIVITY
Thresholds that are too tight create alert fatigue; thresholds that are too loose miss real drift. There is no universal right threshold: the right setting depends on feature stability, the business impact of drift, and tolerance for false alarms.
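One way to make the tradeoff concrete: replay candidate thresholds against drift-free history and estimate how many false alarms each would produce. A minimal sketch with hypothetical helper names:

```python
import numpy as np

def false_alarm_rate(baseline_drift, threshold):
    """Fraction of known drift-free history that would still trigger an alert."""
    return float(np.mean(np.asarray(baseline_drift) > threshold))

def expected_alerts_per_week(baseline_drift, threshold, checks_per_week=7):
    """Translate a candidate threshold into an on-call burden estimate."""
    return false_alarm_rate(baseline_drift, threshold) * checks_per_week
```

Framing the choice as "alerts per week the team will tolerate" is usually easier to agree on than an abstract statistical cutoff.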