
Critical Failure Modes and Guardrails

SPURIOUS CORRELATIONS

The most dangerous failure mode is acting on correlations that are not causal. Suppose you observe that model latency correlates with revenue and invest heavily in latency optimization. Revenue does not improve, because the correlation was spurious: both metrics were driven by traffic volume.

Detecting spurious correlations: look for plausible confounders. Run small A/B tests to validate causality before large investments. If a correlation appears suddenly, investigate what else changed.
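One way to screen for a suspected confounder before paying for an A/B test is a partial correlation: correlate the two metrics after removing the linear effect of the confounder. The sketch below is illustrative only. The data is synthetic and built so that traffic drives both latency and revenue, and the names `pearson` and `partial_corr` are hypothetical helpers, not part of any monitoring library.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Synthetic data (assumption): traffic drives both latency and revenue,
# and the two noise series are unrelated to traffic and to each other.
traffic = [1, 2, 3, 4, 5, 6, 7, 8]
latency_noise = [1, -1, -1, 1, 1, -1, -1, 1]
revenue_noise = [1, 1, -1, -1, -1, -1, 1, 1]
latency = [100 + 10 * t + 5 * e for t, e in zip(traffic, latency_noise)]
revenue = [1000 + 50 * t + 25 * e for t, e in zip(traffic, revenue_noise)]

raw_r = pearson(latency, revenue)
partial_r = partial_corr(latency, revenue, traffic)
print(f"raw correlation:     {raw_r:.3f}")
print(f"partial correlation: {partial_r:.3f}")
```

On this constructed data the raw correlation is high while the partial correlation is essentially zero, which is the signature of a confounded relationship. A partial correlation only controls for confounders you thought to measure, so it narrows the candidates for an A/B test rather than replacing it.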

TRANSFER FUNCTION DRIFT

Transfer functions change over time. Early in a product lifecycle, model improvements may have large business impact. As the product matures, each further improvement yields less (diminishing returns). A transfer function calibrated last year may therefore overestimate current impact.

Detection: Track predicted vs actual business impact for each model change. If predictions consistently overestimate impact, your transfer functions are stale. Recalibrate quarterly using recent A/B test results.
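The detection rule above can be sketched as a small check over a launch log. Everything here is an assumption for illustration: the `Launch` record, the `staleness_report` function, the 75% threshold, and the lift numbers are hypothetical, not values from a real system.

```python
from dataclasses import dataclass

@dataclass
class Launch:
    name: str
    predicted_lift: float  # lift predicted by the transfer function, in percent
    actual_lift: float     # lift measured in the A/B test, in percent

def staleness_report(launches, min_overestimate_share=0.75):
    """Flag the transfer function as stale if most predictions overshoot."""
    errors = [l.predicted_lift - l.actual_lift for l in launches]
    share = sum(1 for e in errors if e > 0) / len(errors)
    mean_error = sum(errors) / len(errors)
    return {
        "overestimate_share": share,
        "mean_error": mean_error,
        "stale": share >= min_overestimate_share and mean_error > 0,
    }

# Hypothetical launch log for one quarter.
launches = [
    Launch("ranker-v7", predicted_lift=1.0, actual_lift=0.6),
    Launch("ranker-v8", predicted_lift=0.8, actual_lift=0.5),
    Launch("ranker-v9", predicted_lift=1.2, actual_lift=0.7),
    Launch("ranker-v10", predicted_lift=0.5, actual_lift=0.55),
]

report = staleness_report(launches)
print(report)
```

Requiring both a high share of overestimates and a positive mean error keeps a single large miss from triggering a recalibration on its own. With only a handful of launches per quarter the signal is noisy, so treat the flag as a prompt to recalibrate, not as proof of drift.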

SEGMENT DIVERGENCE

Aggregate correlations can mask segment-level divergence. The overall correlation between AUC and revenue might be stable while it declines for your most valuable segment and increases for low-value users. Acting on the aggregate metric then optimizes for the wrong users.

Guardrail: monitor correlations by segment. Alert when any high-priority segment diverges significantly from aggregate trends.
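A minimal sketch of this guardrail, assuming the AUC-to-revenue correlation has already been computed per segment for a previous and a current window. The function name `diverging_segments`, the segment names, the correlation values, and the 0.15 threshold are all hypothetical.

```python
def diverging_segments(aggregate, segments, priority, threshold=0.15):
    """Return priority segments whose correlation moved differently from the aggregate.

    aggregate: (previous_corr, current_corr) for all users
    segments:  dict of segment name -> (previous_corr, current_corr)
    priority:  segment names that should trigger an alert
    """
    aggregate_delta = aggregate[1] - aggregate[0]
    alerts = {}
    for name in priority:
        previous, current = segments[name]
        # Gap between the segment's change and the aggregate's change.
        gap = (current - previous) - aggregate_delta
        if abs(gap) > threshold:
            alerts[name] = round(gap, 3)
    return alerts

# Hypothetical correlations: the aggregate looks stable, enterprise does not.
aggregate = (0.62, 0.61)
segments = {
    "enterprise": (0.70, 0.45),
    "smb": (0.60, 0.58),
    "free": (0.55, 0.68),
}

alerts = diverging_segments(aggregate, segments, priority=["enterprise", "smb"])
print(alerts)
```

Here the aggregate barely moves because the decline in the enterprise segment is offset by the rise among free users, yet the enterprise segment is flagged. Comparing changes instead of absolute levels avoids alerting on segments that have always sat above or below the aggregate.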

METRIC GAMING

When teams are evaluated on metric correlations, they may optimize for correlation rather than business impact. A team might improve model metrics in ways that artificially inflate correlation without genuine business value.

Mitigation: evaluate teams on A/B test results, not correlation strength. Use holdout tests where the model team does not know which metrics will be measured. Rotate evaluation metrics to prevent gaming.

⚠️ Key Trade-off: Tight correlation monitoring can create perverse incentives. Balance metric tracking with holistic evaluation that includes A/B test impact and qualitative assessment.
💡 Key Takeaways
Spurious correlations lead to wasted investment—validate causality with A/B tests before acting
Transfer functions drift over time as products mature; recalibrate quarterly using recent experiments
Segment divergence hides problems; monitor high-priority segments separately from aggregates
📌 Interview Tips
1. Give an example of a spurious correlation (latency and revenue both driven by a traffic confounder).
2. Explain transfer function drift: model improvements tend to have larger business impact in early-stage products than in mature ones.