
Two-Tier Monitoring: Service Health vs. Model Quality

Production ML systems require two separate monitoring planes that operate on different timescales and serve different purposes. Service health protects the end-user experience through infrastructure metrics; model quality guards statistical performance and business outcomes. Conflating the two produces either slow incident response or excessive false alarms.

Service-health Service Level Objectives (SLOs) track latency, throughput, error rates, and resource utilization. These must alert fast, within seconds to minutes, because they directly impact user experience. Google Ads auction scoring must complete within 100 milliseconds end to end, with individual model inference budgets of 10 to 20 milliseconds at p99; a latency spike to 150 milliseconds breaks the auction deadline and loses revenue immediately. Facebook content moderation models have hard error-rate budgets where failure rates above 0.1% for 5 consecutive minutes trigger automatic traffic shifts to backup models.

Model-quality SLOs track prediction accuracy, calibration, fairness, and business Key Performance Indicators (KPIs). These alert more slowly, respecting label delay and statistical significance. Uber ETA predictions receive ground-truth arrival times within minutes to hours, so hourly windows work; ad conversion models with 7-to-28-day attribution windows need patient monitoring that accumulates events before alerting. Firing alerts on noisy short windows creates fatigue, while waiting too long allows degradation to compound.

The solution is layered detection: immediate proxies plus delayed confirmation. At Netflix scale, serving thousands of Queries Per Second (QPS) per cluster, predictions are logged asynchronously to avoid adding latency. Monitoring jobs run every 5 minutes on recent windows, checking score distributions and click-through proxies, while definitive outcome metrics like viewing hours and retention run daily on backfilled data once labels stabilize. This two-speed system catches service regressions in under 10 minutes, while model-quality alerts arrive within 24 hours with statistical confidence.
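A minimal sketch of this two-speed pattern in Python. The latency budget, the drift threshold, the use of a Population Stability Index (PSI) as the proxy, and all function names are illustrative assumptions rather than values or code from the systems above; the point is only that the fast plane compares a scalar against a tight budget while the slow plane compares distributions.

```python
"""Two-speed monitoring sketch: a fast service-health check plus a slower
model-quality proxy check. All thresholds, window sizes, and names here are
illustrative assumptions."""
import numpy as np

LATENCY_P99_BUDGET_MS = 20.0     # fast plane: evaluated every ~30 seconds
SCORE_DRIFT_PSI_LIMIT = 0.2      # slow plane: evaluated every ~5 minutes


def service_health_breached(p99_latency_ms: float) -> bool:
    """Fast plane: page immediately when the latency SLO is blown."""
    return p99_latency_ms > LATENCY_P99_BUDGET_MS


def population_stability_index(baseline: np.ndarray, current: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between the baseline and the most recent window of model scores,
    assuming scores are probabilities in [0, 1]."""
    base, _ = np.histogram(baseline, bins=bins, range=(0.0, 1.0))
    curr, _ = np.histogram(current, bins=bins, range=(0.0, 1.0))
    base = np.clip(base / len(baseline), 1e-6, None)
    curr = np.clip(curr / len(current), 1e-6, None)
    return float(np.sum((curr - base) * np.log(curr / base)))


def model_quality_proxy_breached(baseline_scores: np.ndarray,
                                 recent_scores: np.ndarray) -> bool:
    """Slow plane: flag drift in the serving score distribution long before
    delayed ground-truth labels arrive."""
    return population_stability_index(baseline_scores,
                                      recent_scores) > SCORE_DRIFT_PSI_LIMIT
```

The fast check gates on a single scalar and can run on every scrape; the proxy check needs enough events per window for the distribution comparison to be meaningful, which is why it runs on batched windows rather than per request.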
💡 Key Takeaways
Service SLOs protect user experience with subsecond to minute detection. Meta ads ranking alerts within 2 minutes if p99 scoring latency exceeds 20 milliseconds, automatically routing traffic to a faster fallback model.
Model quality SLOs respect statistical reality. DoorDash delivery time models wait for 3 consecutive hourly windows showing a 5% median error increase before alerting, avoiding false alarms from single outlier windows (a minimal sketch of this consecutive-window rule follows this list).
Proxy metrics bridge the gap. LinkedIn feed ranking uses 15-minute windows of click-through rate calibration and score entropy as early signals, validated against session-depth and return-rate metrics computed daily.
Label delay dictates monitoring speed. Stripe fraud models get dispute labels 30 to 90 days late, so they monitor rule trigger rates and manual review rates as same day proxies, with definitive fraud recall computed quarterly.
Canary deployments test both planes. Uber routes 2% of traffic to new ETA models, requiring both service metrics within 1% of baseline and prediction error within 3% after 2 hours before expanding to 10%, then 50%, then 100% (see the gate sketch after this list).
Sampling reduces cost without losing signal. Google scales monitoring by logging 1% of predictions with full feature detail for distribution analysis, and 10% with scores only for outcome metrics, saving petabytes monthly while maintaining statistical power for weekly cohorts (a tiered-sampling sketch also follows this list).
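The consecutive-window rule from the DoorDash-style takeaway reduces to a small amount of state: remember the verdict of the last N windows and fire only when all of them breached. A hedged sketch, with the 5% threshold and 3-window requirement taken from that bullet and everything else (class and argument names) assumed:

```python
"""Consecutive-window quality alert: fire only after N successive windows
breach the threshold, so a single noisy window cannot page anyone."""
from collections import deque


class ConsecutiveWindowAlert:
    def __init__(self, threshold_pct: float = 5.0, windows_required: int = 3):
        self.threshold_pct = threshold_pct
        self.breaches = deque(maxlen=windows_required)

    def observe(self, median_error_increase_pct: float) -> bool:
        """Call once per hourly window with the median-error increase vs. baseline.
        Returns True only when every one of the last N windows breached."""
        self.breaches.append(median_error_increase_pct > self.threshold_pct)
        return len(self.breaches) == self.breaches.maxlen and all(self.breaches)


# Example: only the final window pages, after three sustained breaches.
alert = ConsecutiveWindowAlert()
for increase in [6.1, 1.2, 5.4, 5.8, 7.0]:
    if alert.observe(increase):
        print("page on-call: sustained model-quality degradation")
```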
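The canary progression in the Uber-style takeaway is essentially a gate evaluated once per stage: the new model advances only while both planes stay inside tolerance. A sketch under assumptions, with the 1%/3% tolerances and the 2% → 10% → 50% → 100% ladder taken from that bullet and the metric names invented for illustration:

```python
"""Canary expansion gate: advance one traffic stage only when BOTH the
service-health and model-quality deltas (canary vs. baseline) are within
tolerance; otherwise roll the canary back entirely."""
from dataclasses import dataclass

TRAFFIC_STAGES = [0.02, 0.10, 0.50, 1.00]   # 2% -> 10% -> 50% -> 100%


@dataclass
class CanaryDeltas:
    latency_pct: float            # p99 latency delta vs. baseline
    error_rate_pct: float         # service error-rate delta vs. baseline
    prediction_error_pct: float   # model prediction-error delta vs. baseline


def next_traffic_fraction(current: float, deltas: CanaryDeltas) -> float:
    """Evaluate after the stage's soak period (e.g. 2 hours) and return the
    traffic fraction for the next stage, or 0.0 to roll back."""
    service_ok = deltas.latency_pct <= 1.0 and deltas.error_rate_pct <= 1.0
    quality_ok = deltas.prediction_error_pct <= 3.0
    if not (service_ok and quality_ok):
        return 0.0                             # kill the canary, keep baseline
    stage = TRAFFIC_STAGES.index(current)
    return TRAFFIC_STAGES[min(stage + 1, len(TRAFFIC_STAGES) - 1)]
```

A real controller would also enforce the minimum soak time and a minimum traffic volume per stage before evaluating the gate.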
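The tiered sampling in the Google-style takeaway can be made deterministic by hashing a request identifier into buckets, so the 1% full-detail tier is a strict subset of the 10% score-only tier and a given request is always logged the same way. The rates follow that bullet; the field names and bucket scheme are assumptions:

```python
"""Tiered prediction logging: ~1% of requests keep full feature detail for
distribution analysis, ~10% keep scores only for outcome metrics, and the
rest are dropped. Hash-based bucketing makes the sample deterministic."""
import hashlib
from typing import Optional


def _bucket(request_id: str, buckets: int = 1000) -> int:
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def build_log_record(request_id: str, score: float,
                     features: dict) -> Optional[dict]:
    bucket = _bucket(request_id)
    if bucket < 10:      # buckets 0-9: ~1%, full feature detail
        return {"request_id": request_id, "score": score, "features": features}
    if bucket < 100:     # buckets 10-99: ~10% logged in total, score only
        return {"request_id": request_id, "score": score}
    return None          # remaining ~90%: not logged
```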
📌 Examples
Netflix recommendation model monitoring splits into online checks every 5 minutes on score distributions over 50,000-event windows (which detected a feature pipeline bug within 20 minutes) and offline daily evaluation of viewing hours per recommended title using backfilled watch data.
Meta ads auction monitors p99 inference latency with 30-second alert windows, paging when latency hits 22 milliseconds for 2 consecutive windows. Model quality uses 6-hour windows with a 5-million-impression minimum, checking click-through rate (CTR) calibration before conversion data arrives days later.
Airbnb pricing model monitoring tracks service health with 1-minute error-rate SLOs, model proxies with hourly booking-rate correlation checks requiring 10,000 searches per market, and ground truth with weekly revenue-per-search metrics computed after stays complete 30 days later.