Model Monitoring & Observability: Model Performance Degradation & Alerting

Two-Tier Monitoring: Service Health vs. Model Quality

SERVICE HEALTH TIER

The first monitoring tier tracks infrastructure health: Is the model serving predictions? At what latency? Are there errors?

Key metrics: Request rate (QPS), latency (p50, p95, p99), error rate (5xx responses, timeout rate), throughput (successful predictions per second).

Thresholds: Error rate > 1% sustained for 5 minutes triggers an alert. P99 latency > 200ms (or whatever your SLO specifies) triggers an investigation. These are standard SRE metrics applied to ML systems.
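A minimal sketch of these threshold checks over a sliding window. The class name, window size, and alert strings are illustrative assumptions; the 1% error rate, 5-minute window, and 200ms p99 thresholds come from the text above.

```python
import time
from collections import deque


class ServiceHealthMonitor:
    """Sliding-window check of error rate and p99 latency (illustrative sketch)."""

    def __init__(self, window_seconds=300, error_rate_threshold=0.01, p99_ms_threshold=200.0):
        self.window_seconds = window_seconds
        self.error_rate_threshold = error_rate_threshold
        self.p99_ms_threshold = p99_ms_threshold
        self.requests = deque()  # (timestamp, latency_ms, is_error)

    def record(self, latency_ms, is_error, now=None):
        now = time.time() if now is None else now
        self.requests.append((now, latency_ms, is_error))
        # Evict requests that fell out of the window.
        cutoff = now - self.window_seconds
        while self.requests and self.requests[0][0] < cutoff:
            self.requests.popleft()

    def check(self):
        if not self.requests:
            return []
        alerts = []
        n = len(self.requests)
        errors = sum(1 for _, _, e in self.requests if e)
        if errors / n > self.error_rate_threshold:
            alerts.append("PAGE: error rate above threshold over window")
        latencies = sorted(l for _, l, _ in self.requests)
        p99 = latencies[min(n - 1, int(0.99 * n))]  # nearest-rank p99
        if p99 > self.p99_ms_threshold:
            alerts.append("INVESTIGATE: p99 latency above SLO")
        return alerts
```

In production this logic typically lives in a metrics system (e.g. Prometheus alert rules) rather than application code, but the windowed error-rate and percentile computations are the same.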

Service health alerts require immediate response. A model returning errors is worse than a model returning slightly wrong predictions; availability comes first.

MODEL QUALITY TIER

The second tier tracks prediction quality: Are the predictions accurate? Is the model degrading?

Key metrics: Accuracy, precision, recall, AUC, NDCG—whatever metrics you optimized during training. Business proxy metrics: CTR, conversion rate, revenue impact.

Challenge: Quality metrics require ground truth labels, which arrive with delay. A fraud label might take 30 days. During that window, you do not know the model's true performance.

Workarounds: Use proxy metrics that arrive faster (clicks vs conversions). Monitor prediction distribution shifts. Compare model predictions to rule-based baselines.
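One common way to monitor prediction distribution shift is the Population Stability Index (PSI), which compares the current prediction distribution to a baseline (e.g. training-time) distribution. A self-contained sketch; the binning scheme and the usual rule-of-thumb thresholds (< 0.1 stable, 0.1–0.25 moderate shift, > 0.25 significant shift) are conventions, not from the text:

```python
import math


def population_stability_index(baseline, current, n_bins=10):
    """PSI between baseline and current score distributions.

    Bins are derived from the combined value range; a small epsilon
    avoids division by zero for empty bins.
    """
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    width = (hi - lo) / n_bins or 1.0
    eps = 1e-6

    def bin_fracs(values):
        counts = [0] * n_bins
        for v in values:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        total = len(values)
        return [max(c / total, eps) for c in counts]

    b, c = bin_fracs(baseline), bin_fracs(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))
```

Because PSI needs only the model's outputs, it catches drift long before delayed labels reveal a drop in accuracy.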

INTEGRATION

Both tiers feed into a unified alerting system. Service health alerts page on-call immediately. Quality alerts may have lower urgency but still require investigation within hours.
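The routing described above can be sketched as a small dispatch function. The tier names come from the text; the channel names and response windows are illustrative assumptions:

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SERVICE_HEALTH = "service_health"
    MODEL_QUALITY = "model_quality"


@dataclass
class Alert:
    tier: Tier
    message: str


def route(alert):
    """Unified routing: service-health alerts page on-call immediately;
    quality alerts open a ticket to be investigated within hours."""
    if alert.tier is Tier.SERVICE_HEALTH:
        return {"channel": "pager", "respond_within": "minutes"}
    return {"channel": "ticket-queue", "respond_within": "hours"}
```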

Dashboard layout: service health on left (green/red status), quality metrics on right (trend lines over time). One glance shows system status.

✅ Best Practice: Service health is non-negotiable—monitor from day one. Model quality monitoring can start simpler and mature over time.
💡 Key Takeaways
Service health: QPS, latency (p50/p95/p99), error rate, throughput; requires immediate response when degraded
Model quality: accuracy, precision, recall, business metrics; requires labels (often delayed) or proxy metrics
Integrate both tiers: service health alerts page immediately; quality alerts require investigation within hours
📌 Interview Tips
1. Interview Tip: Explain why service health is monitored separately—availability before accuracy.
2. Interview Tip: Describe label delay workarounds: proxy metrics, prediction distribution monitoring, rule baselines.