
Failure Modes: Label Bias, Seasonality, and Slice Degradation

Production ML monitoring fails in predictable ways that standard dashboards miss. Three patterns account for most silent degradation: label delay creating biased evaluation, seasonality triggering false alarms, and aggregate metrics hiding slice-level collapse. Recognizing these failure modes separates reliable systems from fragile ones.

Label delay and censoring create systematic bias. Evaluating a recommendation model only on users who engaged within 24 hours overweights highly engaged users and misses the majority who engage slowly or not at all. Pinterest discovered their homepage model looked 8% better on fast labels than in reality because engaged users drove early clicks while casual users, 60% of the base, took 3 to 7 days to return. They fixed this by computing metrics on matched observation windows and weighting by user segment propensity, bringing reported accuracy in line with true long-term performance.

Seasonality and event spikes break naive drift detection. Every weekend, Twitter traffic shifts toward mobile and leisure content. Every holiday, Netflix viewing patterns change dramatically. Alerts that fire every Saturday due to expected weekend drift create fatigue and get ignored when real issues occur. Production systems use seasonality-aware baselines, comparing Monday to the median of previous Mondays rather than the weekly average. They also maintain exclusion lists for known events like Black Friday or the Super Bowl, temporarily lowering alert sensitivity or requiring longer persistence before paging.

Slice degradation hides in averages. Google Search discovered a ranking model with flat overall metrics but a 20% accuracy drop in the 15% of queries from non-English speakers, caused by a multilingual feature bug. Facebook feed ranking maintained overall engagement while new users saw 30% fewer relevant posts because a cold-start heuristic failed. Robust monitoring splits metrics by curated critical slices: top markets, device types, new versus returning users, and high-value segments. Bonferroni correction controls false discovery when testing hundreds of slices, requiring stronger evidence per slice but catching localized failures that would otherwise compound for weeks.
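The label-delay fix, matched observation windows plus segment reweighting, can be made concrete with a minimal sketch. The dataframe layout, column names, 7-day window, and segment shares below are illustrative assumptions, not Pinterest's actual pipeline.

```python
# Minimal sketch: evaluate only on predictions whose label window has fully
# closed, then reweight per-segment accuracy by the segment's true share of
# the user base so fast-labeling users do not dominate the metric.
from datetime import datetime, timedelta, timezone

import pandas as pd

LABEL_WINDOW = timedelta(days=7)  # same observation window for every user


def evaluate_with_matched_window(events: pd.DataFrame,
                                 segment_shares: dict[str, float],
                                 now: datetime) -> float:
    """events columns (assumed): prediction_ts (tz-aware), segment, correct (0/1)."""
    # 1. Keep only predictions old enough that every user had the full window
    #    to produce a label; anything newer is right-censored.
    mature = events[events["prediction_ts"] <= now - LABEL_WINDOW]

    # 2. Accuracy per segment, reweighted by each segment's share of the
    #    real user base rather than its share of observed labels.
    per_segment = mature.groupby("segment")["correct"].mean()
    total = sum(segment_shares.values())
    return sum(per_segment.get(seg, 0.0) * share / total
               for seg, share in segment_shares.items())


# Usage: casual users are 60% of the base even though they label slowly.
# accuracy = evaluate_with_matched_window(
#     events, {"engaged": 0.4, "casual": 0.6}, datetime.now(timezone.utc))
```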
💡 Key Takeaways
Fast-label bias inflates apparent accuracy. Uber Eats models evaluated only on deliveries completed within 30 minutes showed 15% lower error than full-cohort evaluation including 90-minute deliveries, because fast deliveries are systematically easier to predict due to proximity and traffic.
Seasonality requires day of week matching. LinkedIn feed ranking compares Thursday metrics to previous Thursday medians, not weekly averages, cutting false positive alert rate from 20% per week to under 2% while maintaining detection speed for real issues.
Holiday and event exclusions prevent noise. Amazon product recommendations disable drift alerts during Prime Day and Black Friday, when traffic and behavior shift by design, requiring 3 consecutive post event days of drift before alerting to avoid paging teams during expected volatility.
Critical slice curation limits alert volume. Spotify defines 30 critical slices, including top 10 markets, premium versus free users, and mobile versus desktop, computing metrics per slice with Bonferroni-adjusted p-value thresholds of 0.05 / 30 ≈ 0.0017 to control the family-wise error rate (see the sketch after this list).
Minimum sample sizes per slice prevent spurious findings. Instagram requires 10,000 impressions per country per day before computing country level metrics, avoiding alerts on low traffic countries where random variation dominates, while detecting issues in major markets within 2 hours.
Feedback loops amplify slice problems. A TikTok For You page that overserves popular content to new users creates a feedback loop where new users see only viral content, reducing diversity and causing 40% of new users to churn before the model learns their preferences; this calls for exploration mechanisms and separate new-user models.
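The Bonferroni-corrected slice thresholds and minimum sample sizes above fit into a short check. The slice encoding, the 10,000-sample floor, and the one-sided two-proportion z-test are illustrative assumptions, not a specific company's alerting stack.

```python
# Minimal sketch of per-slice degradation checks: enforce a minimum sample
# size, then compare each slice's current rate to its baseline with a
# Bonferroni-corrected significance threshold to control family-wise error.
import math

MIN_SAMPLES = 10_000   # skip slices where random variation dominates
ALPHA = 0.05           # family-wise error rate target


def degraded_slices(current: dict[str, tuple[int, int]],
                    baseline: dict[str, tuple[int, int]]) -> list[str]:
    """current/baseline map slice name -> (successes, trials)."""
    per_test_alpha = ALPHA / max(len(current), 1)        # Bonferroni correction
    alerts = []
    for name, (succ_c, n_c) in current.items():
        succ_b, n_b = baseline.get(name, (0, 0))
        if n_c < MIN_SAMPLES or n_b < MIN_SAMPLES:
            continue                                     # too little traffic to judge
        p_c, p_b = succ_c / n_c, succ_b / n_b
        pooled = (succ_c + succ_b) / (n_c + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_b))
        if se == 0:
            continue
        z = (p_b - p_c) / se                             # positive z means current is worse
        p_value = 0.5 * math.erfc(z / math.sqrt(2))      # one-sided test for degradation
        if p_value < per_test_alpha:
            alerts.append(name)
    return alerts
```

With 30 curated slices, per_test_alpha works out to roughly 0.0017, matching the Spotify-style threshold described above.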
📌 Examples
DoorDash delivery time model monitoring splits metrics by city, restaurant type, and time of day. They detected a 25% error increase in one city for late night deliveries after 10pm that was hidden in daily aggregates, traced to a traffic API switching to lower update frequency at night.
Meta ads auction monitoring compares weekend CTR to previous-weekend CTR baselines with identical hour-of-day matching. A Saturday 2pm canary showing a 3% CTR drop against the Friday average would have falsely failed, but it passed when compared to the previous Saturday at 2pm, where the drop was only 0.5%, within noise (see the sketch after these examples).
Airbnb's pricing model detected an 18% error increase in beach destinations during an off-season month that was invisible in overall metrics due to an offsetting improvement in urban listings. Per-category monitoring with 5,000-listing minimums caught this within 48 hours, revealing a seasonal adjustment feature bug.
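The day-of-week and hour-of-day matched baselines in these examples follow one pattern: compare the current value to the median of the same weekday and hour over recent weeks, and skip known event dates. The lookback depth, tolerance, and exclusion dates below are illustrative assumptions rather than any company's actual thresholds.

```python
# Minimal sketch of a seasonality-aware alert: baseline a metric against the
# median of the same weekday and hour in prior weeks, ignoring known events.
import statistics
from datetime import date, datetime, timedelta

EXCLUDED_DATES = {date(2024, 11, 29)}   # e.g. Black Friday: volatility is expected
LOOKBACK_WEEKS = 4                      # how many matching weekdays to baseline on
RELATIVE_TOLERANCE = 0.10               # alert on a >10% drop vs. the seasonal median


def should_alert(metric_history: dict[datetime, float], now: datetime) -> bool:
    """metric_history maps hourly timestamps to a metric such as CTR."""
    if now.date() in EXCLUDED_DATES or now not in metric_history:
        return False
    # Subtracting whole weeks preserves both the weekday and the hour,
    # so each baseline point is a like-for-like comparison.
    baseline = []
    for k in range(1, LOOKBACK_WEEKS + 1):
        ts = now - timedelta(weeks=k)
        if ts in metric_history and ts.date() not in EXCLUDED_DATES:
            baseline.append(metric_history[ts])
    if len(baseline) < 2:
        return False                    # not enough history to judge
    seasonal_median = statistics.median(baseline)
    return metric_history[now] < (1 - RELATIVE_TOLERANCE) * seasonal_median
```

The same structure extends to post-event handling, such as requiring several consecutive days of drift after an excluded date before paging, as in the Prime Day example above.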