
Production Failure Modes and Mitigation Strategies

Label Drift

User language evolves. "Cancel my account" becomes "I want to churn," then "unsubscribe me." A model trained on 2023 data meets different phrasing from users in 2025. Without intervention, accuracy drops 1-2% per quarter.

Detection: Monitor the distribution of prediction confidence. If the model becomes less confident on average, incoming text is drifting away from the training data. Track per-class accuracy weekly by manually reviewing a sample of predictions.
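A minimal sketch of the confidence check described above, assuming you log the top-class confidence for each prediction. The function name `confidence_drop` and the 0.05 alert threshold are illustrative assumptions, not values from this article.

```python
from statistics import mean

def confidence_drop(baseline, recent, max_drop=0.05):
    """Compare mean top-class confidence between two windows.

    baseline: confidences logged shortly after the model was trained
    recent:   confidences logged in the current monitoring window
    max_drop: assumed alert threshold (hypothetical, tune for your data)
    Returns (drop, alert).
    """
    drop = mean(baseline) - mean(recent)
    return drop, drop > max_drop

# Toy numbers for illustration only.
baseline = [0.92, 0.88, 0.95, 0.90]
recent = [0.81, 0.77, 0.85, 0.79]
drop, alert = confidence_drop(baseline, recent)
print(f"mean confidence dropped by {drop:.3f}, alert={alert}")
```

Comparing means is the simplest option. A fuller monitor would likely compare the whole distribution, for example per-class histograms, since a drop confined to one class can hide inside a stable overall average.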

Mitigation: Retrain quarterly on recent data. Each week, sample 500 low-confidence predictions for manual labeling and add them to the training set.
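The weekly sampling step could look roughly like this. The tuple layout `(text, label, confidence)` and the 0.6 cutoff for "low confidence" are assumptions made for the example; only the sample size of 500 comes from the text above.

```python
import random

def sample_for_labeling(predictions, k=500, threshold=0.6, seed=None):
    """Pick up to k low-confidence predictions for manual labeling.

    predictions: list of (text, predicted_label, confidence) tuples
                 (hypothetical log format)
    threshold:   assumed cutoff below which a prediction counts as
                 low-confidence
    """
    low = [p for p in predictions if p[2] < threshold]
    if len(low) <= k:
        return low  # fewer candidates than requested: label them all
    return random.Random(seed).sample(low, k)
```

Uniform sampling keeps the labeled set representative of the low-confidence pool. If some classes are rare, stratifying the sample by predicted label may be worth the extra code.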

Adversarial Inputs

Users learn to game classifiers. If "refund" triggers a refund workflow, users add "refund" to unrelated requests hoping for faster routing. Spam gets smarter: misspellings like "fr33 m0ney" bypass keyword filters.

Detection: Look for sudden spikes in specific category predictions. If "refund" requests jump 40% without a product issue, users may be gaming the system.
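One way to sketch the spike check, assuming you already have per-category prediction counts for two comparable periods (for example, this week and last week). The function name is hypothetical, and the 40% threshold simply mirrors the example above.

```python
def category_spike(previous_count, current_count, threshold=0.40):
    """Return True if a category grew by at least `threshold`
    (0.40 means 40%) between two periods."""
    if previous_count == 0:
        # No baseline to compare against: treat any volume as a spike.
        return current_count > 0
    growth = (current_count - previous_count) / previous_count
    return growth >= threshold

# Toy numbers: "refund" predictions went from 1000 to 1500 in a week.
if category_spike(1000, 1500):
    print("refund volume spiked; check for a product issue or gaming")
```

Raw counts also rise when overall traffic rises, so in practice comparing each category's share of total predictions is probably more robust.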

Mitigation: Use semantic models instead of keyword matching. Add downstream validation: route "refund" requests to a human reviewer if the purchase history does not support the claim.
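As a rough illustration of downstream validation, the check below only lets a "refund" prediction reach the automated workflow when the user has a recent purchase. The `Purchase` type, the 30-day window, and the route names are all hypothetical; a real rule would depend on your refund policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Purchase:
    order_id: str
    purchased_on: date

def route_refund_request(purchases, today, window_days=30):
    """Decide where a request classified as "refund" should go.

    Returns "refund_workflow" if any purchase falls inside the assumed
    refund window, otherwise "human_review".
    """
    cutoff = today - timedelta(days=window_days)
    if any(p.purchased_on >= cutoff for p in purchases):
        return "refund_workflow"
    return "human_review"
```

The classifier still decides what the user is asking for. The validation step only decides whether that claim is plausible enough to automate.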

Training-Serving Skew

Preprocessing in training differs from preprocessing in production. Training lowercased text; production does not. Training removed emojis; production keeps them. The result is a 10-15% accuracy drop that offline evaluation cannot see, because the evaluation data goes through the training pipeline.

⚠️ Critical: Use identical preprocessing code for training and serving. Package preprocessing as a shared library. Test with production samples before deployment.
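A minimal sketch of a shared preprocessing function, assuming the two steps mentioned above (lowercasing and emoji removal). The Unicode normalization, the emoji ranges, and the name `preprocess` are illustrative choices; the ranges cover common emoji blocks, not every emoji.

```python
import re
import unicodedata

# Approximate emoji ranges (an assumption, not exhaustive).
_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
_WHITESPACE = re.compile(r"\s+")

def preprocess(text: str) -> str:
    """Single source of truth for text cleanup.

    Import this same function in both the training pipeline and the
    serving path so the model always sees identically prepared text.
    """
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    text = _EMOJI.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()

print(preprocess("Cancel My Account 😡!!"))
```

Running a batch of real production samples through `preprocess` in a pre-deployment test, and comparing the result with what the training job produced for the same text, is one way to catch skew before release.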

Out of Distribution Inputs

Production sees text your training data never covered. If you trained on English emails, the model has no useful behavior for Spanish, code snippets, or JSON payloads. It returns an essentially arbitrary label, often with misleadingly high confidence.

Mitigation: Add an explicit "unknown" class. Route low-confidence predictions (below 0.6) to human review.
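The routing rule can be sketched as follows, assuming the model returns a mapping from label to probability. The route name `human_review` and the label names in the example are hypothetical; the 0.6 threshold comes from the text above.

```python
def route(probabilities, threshold=0.6):
    """Pick the top label, but send the request to human review when the
    model predicts "unknown" or its confidence is below the threshold.

    probabilities: dict mapping label -> probability
    Returns (destination, confidence).
    """
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if label == "unknown" or confidence < threshold:
        return "human_review", confidence
    return label, confidence

# A confident prediction is routed automatically ...
print(route({"refund": 0.90, "cancel": 0.05, "unknown": 0.05}))
# ... while an uncertain one goes to a person.
print(route({"refund": 0.45, "cancel": 0.40, "unknown": 0.15}))
```

The threshold alone does not catch inputs the model is confidently wrong about, which is why the explicit "unknown" class is checked as well.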

💡 Key Takeaways
Label drift: user language evolves, accuracy drops 1-2% per quarter without retraining
Monitor prediction confidence distribution to detect drift early
Adversarial inputs: users game classifiers; use semantic models rather than keyword matching
Training-serving skew: different preprocessing causes 10-15% accuracy drop
Out of distribution: add an unknown class, route low-confidence predictions to human review
📌 Interview Tips
1. Explain label drift detection: monitor confidence distribution and weekly accuracy
2. For adversarial inputs, describe downstream validation with purchase history
3. Emphasize identical preprocessing code for training and serving