Production Failure Modes and Mitigation Strategies
Label Drift
User language evolves. "Cancel my account" becomes "I want to churn" becomes "unsubscribe me." Your model was trained on 2023 data, but users in 2025 phrase the same intent differently. Without intervention, accuracy can drop 1-2% per quarter.
Detection: Monitor the distribution of prediction confidence. If the model becomes less confident on average, incoming text is likely drifting away from the training data. Track per-class accuracy weekly by manually reviewing a sample of predictions.
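A minimal sketch of the confidence check, assuming you log the top-class confidence of every prediction. The function name `confidence_drift`, the 0.05 alert threshold, and the sample values are illustrative assumptions, not values from a real system.

```python
from statistics import mean

def confidence_drift(baseline, recent, max_drop=0.05):
    """Compare mean top-class confidence between two windows.

    Returns (drop, alert). `max_drop` is an assumed threshold; tune it
    against the normal week-to-week variation of your own model.
    """
    drop = mean(baseline) - mean(recent)
    return drop, drop > max_drop

# Hypothetical confidences: a window from shortly after training,
# and the most recent week.
baseline_conf = [0.92, 0.88, 0.95, 0.90, 0.85]
recent_conf = [0.80, 0.75, 0.86, 0.78, 0.81]

drop, alert = confidence_drift(baseline_conf, recent_conf)
print(f"mean confidence drop: {drop:.2f}, alert: {alert}")
```

Comparing means is the simplest possible signal. Comparing full histograms or percentiles would catch shifts that leave the mean unchanged.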
Mitigation: Retrain quarterly on recent data. Each week, sample 500 low-confidence predictions for manual labeling and add them to the training set.
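The weekly sampling step could look roughly like the sketch below. The record layout (a dict with `text` and `confidence` keys) and the 0.6 cutoff for "low confidence" are assumptions made for illustration.

```python
import random

def sample_for_labeling(predictions, k=500, threshold=0.6, seed=None):
    """Pick up to k low-confidence predictions for manual labeling.

    `predictions` is assumed to be a list of dicts with a "confidence" key.
    """
    low = [p for p in predictions if p["confidence"] < threshold]
    if len(low) <= k:
        return low  # not enough candidates: label all of them
    return random.Random(seed).sample(low, k)

# Hypothetical week of logged predictions.
week = [{"text": f"message {i}", "confidence": i / 2000} for i in range(2000)]
batch = sample_for_labeling(week, k=500, seed=0)
print(len(batch), "items queued for labeling")
```

Sampling uniformly from the low-confidence pool is one choice among several. Stratifying by predicted class would keep rare classes represented in the labeling batch.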
Adversarial Inputs
Users learn to game classifiers. If "refund" triggers a refund workflow, users add "refund" to unrelated requests hoping for faster routing. Spam gets smarter: misspellings like "fr33 m0ney" bypass keyword filters.
Detection: Look for sudden spikes in specific category predictions. If "refund" requests jump 40% without a product issue, users may be gaming the system.
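One way to sketch the spike check is to compare the current period's count for a category with its recent average. The function name `category_spike`, the four-week history, and the counts are hypothetical. The 40% threshold mirrors the example above.

```python
from statistics import mean

def category_spike(history, current, max_increase=0.40):
    """Return True if `current` exceeds the historical mean by max_increase or more.

    `history` is a list of per-period counts for one predicted category.
    """
    baseline = mean(history)
    if baseline == 0:
        return current > 0  # any traffic in a previously empty category
    return (current - baseline) / baseline >= max_increase

# Hypothetical weekly counts of "refund" predictions.
refund_history = [100, 110, 90, 100]
print(category_spike(refund_history, 145))  # 45% above the mean of 100
print(category_spike(refund_history, 120))  # 20% above the mean of 100
```

A spike alone does not prove gaming. It is a prompt to check whether a product issue explains the jump before suspecting manipulation.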
Mitigation: Use semantic models instead of keyword matching. Add downstream validation: route "refund" requests to a human if the purchase history does not support the claim.
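The downstream validation can be sketched as a small routing function. Everything here is illustrative: the `Request` type, the queue names, and the `recent_purchases` lookup (a dict from user ID to purchase count) stand in for whatever your system actually provides.

```python
from dataclasses import dataclass

@dataclass
class Request:
    user_id: str
    predicted_label: str

def route(request, recent_purchases):
    """Send "refund" predictions to the workflow only if purchases back them up."""
    if request.predicted_label != "refund":
        return request.predicted_label  # other labels route by label as usual
    if recent_purchases.get(request.user_id, 0) > 0:
        return "refund_workflow"
    return "human_review"  # claim not supported by purchase history

# Hypothetical purchase counts per user.
purchases = {"u1": 2}
print(route(Request("u1", "refund"), purchases))
print(route(Request("u2", "refund"), purchases))
```

The classifier's label is treated as a claim to verify rather than a decision, so adding the word "refund" to an unrelated request gains the user nothing.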
Training-Serving Skew
Preprocessing in training differs from preprocessing in production. Training lowercased text; production does not. Training removed emojis; production keeps them. Result: a 10-15% accuracy drop that is invisible in offline evaluation, because offline evaluation runs the training preprocessing.
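A common way to avoid this kind of skew is to define preprocessing once and import the same function in both the training job and the serving path. The sketch below assumes that approach. The emoji filter is a rough approximation that drops characters in the Unicode "Symbol, other" category, and `find_skew` is a hypothetical helper for comparing two pipelines on sample inputs.

```python
import unicodedata

def preprocess(text):
    """Single preprocessing function, meant to be shared by training and serving."""
    text = text.lower()
    # Rough emoji removal: drop "Symbol, other" characters (an approximation).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "So")
    return " ".join(text.split())  # collapse leftover whitespace

def find_skew(samples, training_fn, serving_fn):
    """Return the samples on which the two preprocessing paths disagree."""
    return [s for s in samples if training_fn(s) != serving_fn(s)]

samples = ["Cancel MY account 😀", "refund please"]
# Simulate a serving path that forgot to preprocess at all.
mismatches = find_skew(samples, preprocess, lambda t: t)
print(mismatches)
```

Running a parity check like `find_skew` on a sample of real production inputs surfaces the mismatch that offline evaluation cannot see.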
Out of Distribution Inputs
Production sees text your training data never covered. If you trained on English emails, the model has no useful behavior for Spanish, code snippets, or JSON payloads. It returns an essentially arbitrary label with false confidence.
Mitigation: Add an explicit "unknown" class. Route low-confidence predictions (below 0.6) to human review.
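The routing rule can be sketched in a few lines. The function name and the "human_review" queue are illustrative, and sending the "unknown" class to human review as well is an assumption, not something the rule above requires.

```python
def route_prediction(label, confidence, threshold=0.6):
    """Accept confident predictions; send everything else to human review."""
    if label == "unknown" or confidence < threshold:
        return "human_review"
    return label

print(route_prediction("refund", 0.91))   # confident: keep the label
print(route_prediction("refund", 0.42))   # below 0.6: human review
print(route_prediction("unknown", 0.95))  # explicit unknown class
```

Because out-of-distribution inputs can still score above the threshold, the confidence check works best alongside the explicit "unknown" class rather than as a replacement for it.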