Label Delay and Feedback Windows in Production Monitoring
THE LABEL DELAY PROBLEM
Performance monitoring requires ground truth. You need to know what the correct answer was to measure whether predictions were right. But labels arrive with delay: clicks happen quickly, conversions take days, fraud confirmation takes weeks.
Click/engagement: Seconds to minutes. Fast feedback available.
Conversion/purchase: Hours to days. Most users who will convert do so within 7 days.
Fraud confirmation: 30-90 days. Investigations take time.
Churn: 30-180 days. You only know someone churned after they leave.
During label delay, you cannot measure true performance. A model deployed today might be failing, but you will not know for 30 days.
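The practical consequence of these delays is that at any moment only some fraction of recent predictions are old enough to have labels. A minimal sketch, assuming a fixed per-task delay and a hypothetical helper name `measurable_fraction`:

```python
from datetime import date, timedelta

def measurable_fraction(prediction_dates, as_of, label_delay_days):
    """Fraction of predictions old enough for their labels to have arrived,
    assuming every label takes exactly label_delay_days to materialize."""
    cutoff = as_of - timedelta(days=label_delay_days)
    mature = sum(1 for d in prediction_dates if d <= cutoff)
    return mature / len(prediction_dates)

# Predictions made daily over ten days; fraud labels assumed to take 30 days.
preds = [date(2024, 6, 1) + timedelta(days=i) for i in range(10)]
print(measurable_fraction(preds, date(2024, 6, 15), label_delay_days=30))  # 0.0 - nothing measurable yet
print(measurable_fraction(preds, date(2024, 7, 5), label_delay_days=30))   # 0.5 - half are 30+ days old
```

Real delays are a distribution rather than a constant, but even this crude view makes the monitoring blind spot explicit: two weeks after deployment, fraud accuracy is still unmeasurable.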
FEEDBACK WINDOW DESIGN
Choose appropriate windows for different metrics. For fraud, define a 30-day attribution window: if no fraud is reported within 30 days of the prediction, treat the prediction as correct. Fraud reported after the window closes gets miscounted as a correct prediction, so this introduces some error, but it makes monitoring possible at all.
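The window logic has three outcomes, not two: fraud reported in time, window closed with no report, and window still open. A sketch of that resolution rule, with the function name and return values as assumptions:

```python
from datetime import date, timedelta

def resolve_label(prediction_date, fraud_report_date, as_of, window_days=30):
    """Resolve a fraud label under a fixed attribution window.

    Returns "fraud" if fraud was reported within the window,
    "legitimate" if the window closed with no report, and
    None if the window is still open (label not yet resolvable)."""
    window_end = prediction_date + timedelta(days=window_days)
    if fraud_report_date is not None and fraud_report_date <= window_end:
        return "fraud"
    if as_of >= window_end:
        return "legitimate"  # window closed with no report: assume correct
    return None  # window still open: keep waiting

pred = date(2024, 1, 1)
print(resolve_label(pred, None, as_of=date(2024, 1, 15)))                # None - window still open
print(resolve_label(pred, None, as_of=date(2024, 2, 15)))                # legitimate
print(resolve_label(pred, date(2024, 1, 20), as_of=date(2024, 2, 15)))  # fraud
```

Keeping the "still open" state distinct matters: folding it into "legitimate" would inflate measured precision on recent traffic.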
Early feedback windows: Monitor faster-arriving proxy metrics (clicks instead of conversions) for early signal. Proxy-actual correlation may weaken over time—track this drift.
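Tracking proxy-actual drift can be as simple as correlating the proxy metric with the actual metric over matched periods and alerting when the correlation weakens. A minimal sketch using Pearson correlation; the 0.7 alert threshold and the weekly CTR/CVR numbers are illustrative assumptions:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Weekly click-through rates (fast proxy) paired with the conversion
# rates for the same weeks, once those labels eventually arrived.
ctr = [0.031, 0.034, 0.029, 0.036, 0.033, 0.028]
cvr = [0.011, 0.013, 0.010, 0.014, 0.012, 0.009]
r = pearson(ctr, cvr)
if r < 0.7:  # alert threshold is an assumption; tune per product
    print(f"proxy-actual correlation weakened: r={r:.2f}")
```

The comparison is necessarily retrospective (it needs the actuals), but a weakening trend tells you how much to trust the proxy going forward.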
Partial labels: Use available labels to update monitoring even if incomplete. If 70% of labels have arrived after 7 days, compute metrics on that 70%. Update as more labels arrive.
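When computing metrics on partial labels, report the label coverage alongside the metric so readers know how much of the traffic the number actually describes. A sketch, with `None` standing in for a label that has not arrived:

```python
def partial_accuracy(predictions, labels):
    """Accuracy over the subset of predictions whose labels have arrived.
    labels[i] is None when that label is still outstanding."""
    pairs = [(p, y) for p, y in zip(predictions, labels) if y is not None]
    coverage = len(pairs) / len(predictions)
    accuracy = sum(p == y for p, y in pairs) / len(pairs) if pairs else None
    return accuracy, coverage

preds  = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
labels = [1, 0, 1, 0, None, 1, 0, None, None, 0]  # three labels still pending
acc, cov = partial_accuracy(preds, labels)
print(f"accuracy={acc:.2f} on {cov:.0%} of predictions")  # accuracy=0.86 on 70%
```

One caveat worth flagging in dashboards: early-arriving labels are not always a random sample (e.g. obvious fraud is confirmed fastest), so the partial metric can be biased until coverage is high.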
LEADING VS LAGGING INDICATORS
Leading indicators: Data drift, prediction distribution shift, feature quality degradation. These can be measured immediately and often precede performance drops.
Lagging indicators: Accuracy, precision, recall on labeled data. Definitive but delayed.
Use leading indicators for early warning, lagging indicators for confirmation. If leading indicators suggest problems but lagging metrics are stable when they arrive, investigate—you may have caught something early.
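One common leading indicator is a stability check on the prediction distribution itself, which needs no labels at all. A sketch using the Population Stability Index; the binning scheme, smoothing constant, and the synthetic score data are assumptions, while the 0.1/0.25 interpretation bands are a widely used rule of thumb:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two score samples.
    Rough convention: <0.1 stable, 0.1-0.25 moderate shift, >0.25 major shift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        # Smooth empty bins so the log term stays finite.
        return [max(c, 0.5) / len(xs) for c in counts]

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]                   # scores at deployment time
drifted  = [min(1.0, i / 100 + 0.3) for i in range(100)]   # scores shifted upward
print(psi(baseline, baseline) < 0.1)   # True - identical distributions
print(psi(baseline, drifted) > 0.25)   # True - major shift flagged immediately
```

A PSI alert firing today, weeks before fraud labels arrive, is exactly the leading-indicator early warning described above; the lagging metrics then confirm or clear it once labels mature.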