Production Failure Modes and Defensive Strategies
SILENT DEGRADATION
The most dangerous failure mode: model performance degrades but no alerts fire. This happens when drift detection thresholds are too loose, when you are monitoring the wrong metrics, or when drift affects only specific segments that aggregate metrics mask.
Prevention: set tight thresholds and accept some false alerts. Monitor multiple metrics at multiple granularities. Track segment-level metrics, not just aggregates. Review drift dashboards regularly even when no alerts fire.
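The segment-masking point above can be sketched in a few lines: compute accuracy per segment alongside the aggregate, and alert on any segment that falls below threshold even when the aggregate looks healthy. All names and thresholds here are illustrative, not a specific monitoring API.

```python
# Sketch of segment-level monitoring: aggregate metrics can look fine
# while one segment quietly degrades. Names/thresholds are illustrative.
from collections import defaultdict

def segment_accuracies(records):
    """Compute overall and per-segment accuracy from (segment, correct) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for segment, correct in records:
        totals[segment] += 1
        hits[segment] += int(correct)
    overall = sum(hits.values()) / sum(totals.values())
    per_segment = {s: hits[s] / totals[s] for s in totals}
    return overall, per_segment

def segment_alerts(records, threshold=0.90):
    """Flag every segment below threshold, regardless of the aggregate."""
    _, per_segment = segment_accuracies(records)
    return [s for s, acc in per_segment.items() if acc < threshold]
```

For example, nine correct and one wrong prediction in segment "a" plus a 50/50 split in segment "b" gives an aggregate accuracy of 0.7, but only segment "b" is flagged; monitoring the aggregate alone would show a single blended number with no pointer to the degraded segment.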
CATASTROPHIC FORGETTING
When retraining on recent data only, the model may forget patterns from older data that are still relevant. A model retrained during a holiday season may forget normal behavior and perform poorly when holidays end.
Prevention: maintain historical data in training. Use replay buffers that sample from all time periods. Weight recent data higher but do not exclude old data entirely.
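One way to realize the replay-buffer idea is geometric recency weighting: every time period stays eligible for sampling, but later periods are drawn more often. This is a minimal sketch assuming training data is already bucketed by period; the function name and weighting scheme are assumptions, not a standard API.

```python
# Sketch of a time-weighted replay buffer: sample from every period so old
# patterns stay represented, but weight recent periods higher.
import random

def replay_sample(periods, n, recency_weight=2.0, seed=0):
    """periods: list of lists of examples, ordered oldest to newest.
    Draw n examples; each period's sampling weight grows geometrically
    toward the present, so no period is ever excluded outright."""
    rng = random.Random(seed)
    weights = [recency_weight ** i for i in range(len(periods))]
    chosen = rng.choices(range(len(periods)), weights=weights, k=n)
    return [rng.choice(periods[i]) for i in chosen]
```

With `recency_weight=2.0` and three periods, the weights are 1:2:4, so recent data dominates while holiday-season or other older examples still appear in every retraining batch.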
FEEDBACK LOOPS
Model predictions affect user behavior, which generates training data, which trains the model. If the model is biased, it reinforces its own bias. Recommendation models can create filter bubbles. Fraud models can push fraudsters to new patterns that become harder to detect.
Detection: track diversity metrics. Are recommendations becoming more homogeneous? Are fraud patterns concentrating in specific categories? Breaking feedback loops requires exploration: reserve 5-10% of traffic for random recommendations to collect unbiased data.
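The exploration idea above is essentially epsilon-greedy serving: route a small fraction of traffic to a uniformly random item and log those impressions separately as unbiased feedback. A minimal sketch, with illustrative names; a production system would also log propensities for off-policy evaluation.

```python
# Sketch of epsilon-greedy exploration to break a feedback loop: a small
# share of traffic gets a random recommendation instead of the model's pick.
import random

def recommend(user_scores, catalog, epsilon=0.05, rng=random):
    """With probability epsilon, explore uniformly over the catalog;
    otherwise exploit the model's top-scored item. The returned tag lets
    downstream logging separate unbiased exploration data from the rest."""
    if rng.random() < epsilon:
        return rng.choice(catalog), "explore"
    best = max(user_scores, key=user_scores.get)
    return best, "exploit"
```

Training the next model on the "explore" slice (or reweighting by it) counters the bias that the model's own recommendations inject into the logged data.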
FALSE DRIFT ALARMS
Not all detected drift is real. Temporary anomalies, data quality issues, or logging bugs can trigger false alarms. Retraining on bad data makes things worse.
Defense: verify drift before acting. Cross-check multiple drift signals. Investigate the root cause before triggering retraining. Keep a human in the loop for significant retraining decisions.
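The cross-checking step can be sketched as a simple gate: retraining is only queued (still pending human review) when several independent drift signals agree, while a lone alarm is routed to root-cause investigation. Signal names and the agreement threshold are illustrative assumptions.

```python
# Sketch of a drift verification gate: one firing signal could be a logging
# bug or temporary anomaly, so require independent agreement before acting.
def drift_action(signals, min_agreeing=2):
    """signals: dict of signal name -> bool (drift detected).
    Returns ('retrain-review', ...) only when enough independent signals
    agree; a single firing signal goes to root-cause investigation."""
    firing = [name for name, fired in signals.items() if fired]
    if len(firing) >= min_agreeing:
        return "retrain-review", firing   # still gated on human approval
    if firing:
        return "investigate", firing
    return "no-action", firing
```

For instance, an input-distribution shift confirmed by a drop in labeled accuracy clears the gate, while a distribution alarm with stable accuracy is investigated first, which is exactly the case where retraining on bad data would make things worse.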