
Label Delay and Two-Stage Learning

Label latency is a fundamental constraint in production machine learning systems. Clicks arrive within minutes and purchases within hours, but chargebacks take weeks and true fraud confirmation can take months. This delay forces teams to use proxy labels for fast learning, then reconcile with delayed ground truth when it arrives. The resulting pattern is two-stage learning: train quickly on proxy labels to adapt to drift, then schedule delayed-correction jobs that recalibrate against true labels.

At Google and Meta ads platforms, click labels are available within minutes, enabling multiple retraining runs per day, but conversion labels (purchases, sign-ups) arrive 24 to 72 hours later. These systems run nightly delayed-feedback correction jobs that recalibrate probability estimates and adjust model parameters. Stripe and PayPal fraud systems face even longer delays: chargebacks arrive weeks after transactions. They use proxy labels such as negative-file hits (known bad accounts), velocity-rule triggers, or early user reports within 1 to 24 hours for fast iteration, and true fraud labels trigger recalibration weeks later.

The critical implementation detail is keeping separate metrics and dashboards for proxy versus truth. Proxy-label models optimize for fast reaction but introduce bias; for example, using early user reports as fraud labels overweights vocal users and misses silent fraud. Delayed correction with true labels improves calibration but can thrash if applied too aggressively. Best practice is to cap daily calibration deltas (for example, limit probability shifts to 5% per day) and to use inverse propensity weighting or calibration layers to correct for selection effects when the serving policy changed the traffic distribution between training and correction.
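As a concrete illustration, here is a minimal sketch of the stage-two correction job, assuming the fast stage retrains on proxy labels as usual and the model exposes a single global calibration offset in logit space. The function name, the intercept-only recalibration, and the 5% cap are illustrative choices, not any particular platform's API.

```python
import numpy as np

def capped_recalibration(p_pred, y_true, prev_offset=0.0, max_daily_shift=0.05):
    """Stage-two delayed-feedback correction (sketch).

    Computes a global logit offset that moves the model's mean predicted
    probability toward the observed true-label rate, but caps how far the
    mean probability may shift in a single run (the illustrative 5% cap).
    """
    eps = 1e-6
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    y_rate = np.clip(np.asarray(y_true, dtype=float).mean(), eps, 1 - eps)
    logits = np.log(p_pred / (1 - p_pred))

    def mean_prob(offset):
        return float((1.0 / (1.0 + np.exp(-(logits + offset)))).mean())

    # Intercept-only target: shift logits so the mean prediction matches
    # the delayed true-label rate.
    target = np.log(y_rate / (1 - y_rate)) - np.log(p_pred.mean() / (1 - p_pred.mean()))

    baseline = mean_prob(prev_offset)
    if abs(mean_prob(target) - baseline) <= max_daily_shift:
        return target

    # Otherwise, binary-search the largest step toward the target that
    # keeps today's mean-probability shift within the cap.
    lo, hi = prev_offset, target
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if abs(mean_prob(mid) - baseline) <= max_daily_shift:
            lo = mid
        else:
            hi = mid
    return lo
```

In a real system the returned offset (or a richer calibration layer) would be applied on top of the fast-stage model at serving time, with prev_offset carried forward between nightly runs so the correction converges over several days instead of jumping at once.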
💡 Key Takeaways
Label arrival times vary dramatically: clicks in minutes, purchases in hours, chargebacks in weeks, fraud confirmation in months. This latency determines how fast you can detect and correct drift.
Two-stage learning: Stage one trains on proxy labels for fast adaptation (hourly to daily updates). Stage two runs delayed-correction jobs with true labels (nightly to weekly) to fix calibration and remove bias.
Proxy-label bias is real: Early user reports overweight vocal users, negative-file hits miss novel fraud, and click-through rate (CTR) as a proxy for conversion misses quality. Always measure and dashboard this bias separately.
Cap daily calibration shifts: Limit probability changes to 5% per day to prevent thrashing. During incidents or sudden drift, aggressive corrections can make things worse by overreacting to noisy delayed labels.
Use inverse propensity weighting: When the serving policy changed traffic between training and correction (for example, the model downranked some items), apply propensity scores to debias delayed feedback; otherwise you underestimate the performance of downranked items (see the weighting sketch after this list).
Separate metrics dashboards: Track proxy-label metrics (react fast) and true-label metrics (measure real impact) distinctly, and alert on divergence between the two as a signal of proxy bias or distribution shift (a minimal divergence check also follows this list).
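To make the propensity-weighting point concrete, here is a minimal sketch assuming the serving policy logs, for each event, whether it could produce a delayed label (the item was shown, the transaction was approved) and with what probability; the function and field names are hypothetical.

```python
import numpy as np

def ipw_weights(observed, propensity, min_propensity=0.01):
    """Inverse propensity weights for delayed-feedback correction.

    observed    -- 1 if the policy allowed the event to produce a label
                   (item shown, transaction approved), else 0
    propensity  -- logged probability that the policy allowed the event
    Clipping the propensity bounds the variance of the weights.
    """
    observed = np.asarray(observed, dtype=float)
    propensity = np.clip(np.asarray(propensity, dtype=float), min_propensity, 1.0)
    return observed / propensity

def ipw_rate(y_true, observed, propensity):
    """Self-normalized IPW estimate of a delayed-label rate (conversion,
    fraud, ...) that corrects for the policy's selection of traffic."""
    w = ipw_weights(observed, propensity)
    return float((w * np.asarray(y_true, dtype=float)).sum() / w.sum())
```

The same weights can be passed as sample weights when refitting or recalibrating on delayed labels, so events the policy downranked or blocked are not silently underrepresented in the correction.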
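And a minimal proxy-versus-truth divergence check for the dashboards point; the 20% relative threshold is purely illustrative and would normally be tuned per metric.

```python
def proxy_truth_divergence(proxy_rate, true_rate, rel_threshold=0.20):
    """Return True if the proxy-label metric and the delayed true-label
    metric (computed over the same traffic window) diverge by more than
    rel_threshold relative to the true rate -- a signal of proxy bias or
    distribution shift worth alerting on."""
    if true_rate == 0:
        return proxy_rate > 0
    return abs(proxy_rate - true_rate) / true_rate > rel_threshold
```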
📌 Examples
Google and Meta ads: Click labels within minutes enable 3 to 6 retraining runs per day. Conversion labels (purchases) arrive 24 to 72 hours later. Nightly delayed feedback correction jobs recalibrate probability estimates. Dashboards show both CTR (fast proxy) and conversion rate (delayed truth).
Stripe fraud detection: Proxy labels from negative-file hits and velocity rules arrive within 1 to 24 hours, enabling hourly retraining during attacks. Chargeback labels (true fraud) arrive weeks later. Weekly correction jobs recalibrate with a cap of 5% on daily probability shifts to prevent model instability.
Netflix recommendation: Immediate watch start is a proxy for satisfaction; completion rate and thumbs up (delayed by hours to days) are better signals. Two-stage training uses watch start for fast candidate-generation updates and the delayed engagement signals for nightly re-ranking model calibration.