Natural Language Processing Systems • Text Classification at Scale
Production Failure Modes and Mitigation Strategies
Long documents are a common failure mode for transformer encoders, which truncate by default at 128 to 512 tokens. Critical signals beyond the window are lost, causing unexplained false negatives on long contracts, legal documents, or application logs. The telltale symptom is accuracy dropping sharply as a function of document length: a model achieving 0.88 F1 on 200 token documents might drop to 0.65 F1 on 2,000 token documents.
Mitigations include smart chunking, hierarchical pooling, and long context models. Smart chunking splits by semantic boundaries like paragraphs or sections, computes chunk embeddings, then pools with attention or max pooling into a single document embedding. This preserves signal from the entire document. Long context models like Longformer or recent transformer variants handle 4,096 to 16,384 tokens but require 4 to 8 times more compute. Monitor performance as a function of document length and set alerts when accuracy degrades beyond acceptable thresholds.
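A minimal sketch of the chunk-then-pool approach, assuming the sentence-transformers library; the encoder name, chunk size, and downstream classifier are illustrative assumptions, not prescriptions from this text:

```python
# Smart chunking with max pooling: split on paragraph boundaries,
# embed each chunk, and element-wise max-pool into one document vector.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def embed_document(text: str, max_chunk_chars: int = 2000) -> np.ndarray:
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chunk_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    chunks = chunks or [text]
    chunk_embs = encoder.encode(chunks)   # shape: (n_chunks, dim)
    return chunk_embs.max(axis=0)         # element-wise max pool

# doc_emb = embed_document(long_contract_text)
# label = downstream_classifier.predict(doc_emb[None, :])  # hypothetical
```

Max pooling keeps the strongest signal from any chunk, which suits detection-style labels; attention pooling trades simplicity for learned weighting across chunks.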
Label drift and taxonomy churn invalidate training data when new product lines launch or policies change. Accuracy collapses on emerging labels that the model has never seen. A product classifier trained before a new category launch will misclassify all new items into adjacent categories. Zero shot routing can bridge the gap by adding the new label descriptions immediately without retraining. Follow with supervised retraining as labeled data accumulates. Maintain backward compatible label mappings and deprecate old labels gradually over weeks or months, not overnight.
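A hedged sketch of zero shot routing using Hugging Face's zero-shot-classification pipeline; the model choice and label set below are illustrative assumptions:

```python
# Add a new label description immediately, without retraining.
from transformers import pipeline

zero_shot = pipeline("zero-shot-classification",
                     model="facebook/bart-large-mnli")

labels = [
    "electronics",
    "home appliances",
    "smart home device",   # new category, no labeled data yet
]

result = zero_shot("Wi-Fi enabled thermostat with voice control",
                   candidate_labels=labels)
print(result["labels"][0], result["scores"][0])
```

Route items scoring highest on the new label to it immediately, then swap in a supervised model once labeled examples accumulate.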
Adversarial behavior and evasion degrade lexical models. Attackers obfuscate keywords with unicode trickery, deliberate misspellings, or zero width characters to bypass spam filters. Character and subword models are more robust than word level models. Normalize text aggressively, removing unusual unicode and canonicalizing characters. Apply adversarial training by generating synthetic evasion examples and including them in training data. Monitor false negatives from abuse reports and user flags as a signal of emerging evasion tactics.
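One way the aggressive normalization step might look in Python; the homoglyph map is a small illustrative sample, not an exhaustive defense, and mapping digits to letters would need gating in domains with legitimate numbers:

```python
# NFKC canonicalization, zero-width character removal, and a
# (deliberately tiny, assumed) homoglyph substitution table.
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}
HOMOGLYPHS = str.maketrans({"@": "a", "0": "o", "1": "i", "$": "s"})

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)            # canonical forms
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return text.casefold().translate(HOMOGLYPHS)

print(normalize("V1@gra\u200b"))  # -> "viagra"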
Serving tail latency can spike under dynamic batching. Dynamic batching improves throughput but causes P99 latency spikes under low traffic because items wait for the batch window to fill. At 10 requests per second with a 10 millisecond batch window, most batches contain only 1 to 2 items and still pay the full 10 millisecond delay. Apply a timeout based flush that processes partial batches when the window expires, separate latency critical tenants onto dedicated instances with smaller batch windows or no batching, and implement backpressure to shed load gracefully under overload.
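A rough asyncio sketch of the timeout based flush; queue items are assumed to be (input, future) pairs produced by request handlers, and MAX_BATCH, WINDOW_MS, and model_infer are illustrative names:

```python
import asyncio

MAX_BATCH = 32   # assumed batch cap
WINDOW_MS = 10   # batch window from the example above

async def batch_worker(queue: asyncio.Queue, model_infer):
    """Queue items are (input, asyncio.Future) pairs."""
    while True:
        batch = [await queue.get()]              # block for the first item
        loop = asyncio.get_running_loop()
        deadline = loop.time() + WINDOW_MS / 1000
        while len(batch) < MAX_BATCH:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break                            # window expired
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break                            # flush the partial batch
        inputs, futures = zip(*batch)
        for fut, pred in zip(futures, model_infer(list(inputs))):
            fut.set_result(pred)                 # unblock each caller
```

The key design point is that a partial batch is flushed as soon as the window expires, so a lone request never waits longer than WINDOW_MS for company.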
Data leakage and near duplicates inflate evaluation metrics. If training and test sets contain duplicated or near duplicate items, the model memorizes rather than generalizes. Symptoms include validation accuracy of 0.95 but production accuracy of 0.75. De-duplicate with robust hashing like MinHash and near duplicate detection with cosine similarity thresholds, typically 0.95 or higher, before splitting train and test sets. Maintain a holdout set from a different time period to catch temporal overfitting.
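A possible de-duplication pass using the datasketch library's MinHash and MinHashLSH; note that the LSH threshold here is a Jaccard similarity over character shingles, standing in for the cosine threshold mentioned above, and the 5-character shingle size is an assumption:

```python
from datasketch import MinHash, MinHashLSH

NUM_PERM = 128

def minhash(text: str) -> MinHash:
    """Hash 5-character shingles of the document."""
    m = MinHash(num_perm=NUM_PERM)
    for shingle in {text[i:i + 5] for i in range(max(len(text) - 4, 1))}:
        m.update(shingle.encode("utf8"))
    return m

def dedup(corpus: list[str]) -> list[str]:
    """Keep only the first copy of each near-duplicate cluster."""
    lsh = MinHashLSH(threshold=0.95, num_perm=NUM_PERM)
    kept = []
    for i, doc in enumerate(corpus):
        m = minhash(doc)
        if not lsh.query(m):             # no near-duplicate kept so far
            lsh.insert(f"doc-{i}", m)
            kept.append(doc)
    return kept

# Run dedup() over the full corpus first, then split into train and test.
```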
Generative model hazards include prompt injection, inconsistent label formatting, and verbosity. Users can manipulate prompts to bypass safety filters or produce incorrect labels. Constrain outputs with strict post-processing, use classification oriented prompting that specifies output format explicitly, and verify outputs against a label whitelist. Costs and rate limits can trigger cascading backlogs under traffic spikes. Implement caching for repeated queries, aggressive timeouts, and fast fallback to discriminative models when generative models are overloaded or slow.
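A small sketch of whitelist verification with a discriminative fallback; the label set, generative client call, and fallback model are hypothetical names:

```python
# Post-process the generative model's raw text and verify it against
# a label whitelist; anything unexpected falls back to the cheaper model.
ALLOWED_LABELS = {"billing", "shipping", "returns", "other"}  # assumed taxonomy

def parse_label(raw_output: str) -> str | None:
    """Strip whitespace/punctuation and check the whitelist."""
    candidate = raw_output.strip().strip('."\'').lower()
    return candidate if candidate in ALLOWED_LABELS else None

# raw = call_generative_model(prompt)            # hypothetical client call
# label = parse_label(raw)
# if label is None:                              # injection or malformed output
#     label = fallback_classifier.predict(text)  # discriminative fallback
```

The same fallback path doubles as the overload escape hatch: on timeout or rate limiting, skip the generative call entirely and serve the discriminative prediction.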
💡 Key Takeaways
• Transformer truncation at 512 tokens causes F1 to drop from 0.88 on short documents to 0.65 on 2,000 token documents; mitigate with smart chunking and hierarchical pooling
• Label drift from new categories collapses accuracy to near zero; use zero shot routing for immediate coverage, then supervised retraining as labeled data accumulates over weeks
• Adversarial evasion with unicode trickery and misspellings bypasses lexical models; apply character level models, aggressive normalization, and adversarial training with synthetic evasion examples
• Dynamic batching causes P99 latency spikes under low traffic; apply timeout based flush at 5 to 10 milliseconds and separate latency critical tenants onto dedicated no batching instances
• Data leakage from near duplicates inflates validation accuracy to 0.95 while production drops to 0.75; de-duplicate with MinHash and cosine similarity above 0.95 before the train test split
📌 Examples
Legal contract classifier: F1 of 0.89 on 300 token documents drops to 0.62 on 3,000 token contracts. Implement chunking with 512 token chunks and max pooling across chunks; F1 recovers to 0.83
Spam filter evasion: Attacker uses "V1@gra" and zero width spaces to bypass keyword filters. Switch to a character level BERT and add adversarial examples to training; evasion success rate drops from 35% to 8%
Product taxonomy drift: Launch of a smart home category causes 2,000 misclassifications per day into electronics. Add zero shot labels immediately, collect 500 labeled examples per week, and retrain after 4 weeks to reach 0.86 F1