Two-Tier Monitoring: Service Health vs. Model Quality
SERVICE HEALTH TIER
The first monitoring tier tracks infrastructure health: Is the model serving predictions? At what latency? Are there errors?
Key metrics: Request rate (QPS), latency (p50, p95, p99), error rate (5xx responses, timeout rate), throughput (successful predictions per second).
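Latency percentiles are computed from a window of recent request samples. A minimal sketch using the nearest-rank method (the `percentile` helper and sample data are illustrative, not from the source):

```python
def percentile(samples, pct):
    """Nearest-rank percentile over a sorted copy of samples."""
    ordered = sorted(samples)
    # Index of the smallest value covering pct% of the samples.
    k = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[k]

# Illustrative latency samples in milliseconds for one window.
latency_ms = [12, 15, 14, 90, 18, 13, 250, 16, 17, 14]
p50 = percentile(latency_ms, 50)  # typical request
p99 = percentile(latency_ms, 99)  # tail latency, dominated by outliers
```

Production systems usually approximate percentiles with streaming sketches (e.g. t-digest) rather than sorting raw samples, but the interpretation is the same: p99 tracks the tail, not the average.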
Thresholds: Error rate > 1% sustained for 5 minutes triggers an alert. P99 latency > 200 ms (or your SLO) triggers investigation. These are standard SRE metrics applied to ML systems.
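The "sustained for 5 minutes" condition matters: alerting on a single bad minute produces noise. A sketch of that sustained-breach logic, assuming one error-rate sample per minute (the `ErrorRateAlert` class name and defaults are illustrative; the 1%/5-minute values come from the text):

```python
from collections import deque

class ErrorRateAlert:
    """Fires only when the error rate breaches the threshold for a
    full window, matching 'error rate > 1% for 5 minutes'."""

    def __init__(self, threshold=0.01, window_minutes=5):
        self.threshold = threshold
        # Keeps only the most recent window_minutes samples.
        self.window = deque(maxlen=window_minutes)

    def record_minute(self, errors, requests):
        self.window.append(errors / requests if requests else 0.0)

    def should_alert(self):
        # Alert only if the window is full AND every minute breaches.
        return (len(self.window) == self.window.maxlen
                and all(rate > self.threshold for rate in self.window))
```

A single healthy minute resets the condition, because it evicts no breaching samples but breaks the `all(...)` check until five consecutive bad minutes accumulate again.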
Service health alerts require immediate response. A model returning errors is worse than a model returning slightly wrong predictions. Prioritize availability first.
MODEL QUALITY TIER
The second tier tracks prediction quality: Are the predictions accurate? Is the model degrading?
Key metrics: Accuracy, precision, recall, AUC, NDCG—whatever metrics you optimized during training. Business proxy metrics: CTR, conversion rate, revenue impact.
Challenge: Quality metrics require ground-truth labels, which arrive with a delay. A fraud label might take 30 days to arrive; during that window, you do not know the model's true performance.
Workarounds: Use proxy metrics that arrive faster (clicks vs. conversions), monitor shifts in the prediction distribution, and compare model predictions against rule-based baselines.
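Monitoring prediction-distribution shift needs no labels at all: compare today's score distribution against a baseline window. One common statistic is the population stability index (PSI); the sketch below bins both distributions and sums the divergence per bin. The function name, bin count, and the widely used "PSI > 0.2 means investigate" heuristic are assumptions, not from the source:

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline score distribution (expected) and a
    live one (actual). 0 = identical; larger = more shift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        n = len(values)
        # Floor at a tiny fraction so empty bins don't produce log(0).
        return [max(c / n, 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Run this daily against a baseline captured at deployment time; a rising PSI is a label-free early warning that inputs or model behavior have drifted, well before the 30-day labels confirm it.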
INTEGRATION
Both tiers feed into a unified alerting system. Service health alerts page on-call immediately. Quality alerts may have lower urgency but still require investigation within hours.
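The routing rule described above can be made explicit in the alerting layer: tag each alert with its tier and dispatch on that tag. A minimal sketch (the `Tier`/`Alert` names and the "page"/"ticket" channels are illustrative, not from the source):

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    SERVICE_HEALTH = "service_health"
    MODEL_QUALITY = "model_quality"

@dataclass
class Alert:
    tier: Tier
    message: str

def route(alert):
    """Service-health alerts page on-call immediately;
    quality alerts open a lower-urgency ticket."""
    if alert.tier is Tier.SERVICE_HEALTH:
        return ("page", "on-call")   # respond now
    return ("ticket", "ml-team")     # investigate within hours
```

Keeping the tier explicit on every alert also makes the dashboard split trivial: the same tag drives both routing urgency and which pane the signal lands in.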
Dashboard layout: service health on left (green/red status), quality metrics on right (trend lines over time). One glance shows system status.