Definition
Continuous training (CT) transforms ML from a one-time deployment into a closed-loop control system. It automates retraining and redeployment pipelines that monitor model health, decide when to retrain, validate candidates offline and online, and gradually shift traffic only when metrics improve.
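A minimal sketch of that closed loop is below. The thresholds, canary steps, and the monitor/retrain/metrics functions are illustrative stand-ins, not any specific platform's API.

```python
# Sketch of a closed-loop CT controller: monitor -> decide -> validate
# offline -> canary online -> promote or roll back. All constants and
# stubs here are illustrative assumptions.
import random
from dataclasses import dataclass

DRIFT_THRESHOLD = 0.2                    # e.g. PSI above this triggers retrain
MIN_OFFLINE_GAIN = 0.002                 # candidate must beat prod AUC by this
CANARY_STEPS = [0.01, 0.05, 0.25, 1.0]   # gradual traffic shift

@dataclass
class Model:
    version: str
    auc: float

def monitor_drift() -> float:
    """Stand-in for a real drift metric (e.g. PSI on key features)."""
    return random.uniform(0.0, 0.4)

def retrain(prod: Model) -> Model:
    """Stand-in for launching a training job on fresh data."""
    return Model(prod.version + "+1", prod.auc + random.uniform(-0.01, 0.01))

def online_metrics_healthy(traffic_fraction: float) -> bool:
    """Stand-in for comparing canary business/ML metrics to control."""
    return random.random() > 0.05        # small chance the canary regresses

def control_loop(prod: Model) -> Model:
    if monitor_drift() <= DRIFT_THRESHOLD:
        return prod                      # model healthy; do nothing
    candidate = retrain(prod)            # decide: drift is high, retrain
    if candidate.auc < prod.auc + MIN_OFFLINE_GAIN:
        return prod                      # offline validation failed
    for fraction in CANARY_STEPS:        # online validation via canary
        if not online_metrics_healthy(fraction):
            return prod                  # roll back on any regression
    return candidate                     # full traffic shift to candidate

if __name__ == "__main__":
    prod = control_loop(Model(version="v1", auc=0.91))
    print("serving:", prod.version)
```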
Why Models Decay
The core problem is that production models decay over time: user behavior shifts, new products launch, competitors change tactics, and seasonal patterns evolve. A fraud model trained on pre-holiday traffic will miss new attack vectors during Black Friday. A recommendation model trained three months ago cannot surface content that did not exist then.
Two Freshness Dimensions
Continuous training spans two freshness dimensions. Data freshness measures how quickly new events become features (streaming aggregates updated every 5 minutes versus daily batch features). Model freshness measures how quickly new patterns make it into model weights (hourly incremental updates versus weekly full retrains).
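The two dimensions are independent knobs that get tuned separately, as in the illustrative config below; the field names and values are assumptions for the sketch, not any real platform's schema.

```python
# Hypothetical config separating the two freshness dimensions.
from dataclasses import dataclass

@dataclass(frozen=True)
class FreshnessConfig:
    # Data freshness: how quickly new events become features.
    feature_update_interval_s: int
    # Model freshness: how quickly new patterns reach the weights.
    incremental_retrain_interval_s: int
    full_retrain_interval_s: int

streaming_fraud = FreshnessConfig(
    feature_update_interval_s=300,          # 5-minute streaming aggregates
    incremental_retrain_interval_s=3600,    # hourly incremental updates
    full_retrain_interval_s=7 * 24 * 3600,  # weekly full retrain
)
```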
Scale Examples
Netflix retrains homepage personalization models nightly on hundreds of millions of member interactions. Uber runs thousands of models for ride matching, ETA prediction, and fraud detection with retraining cadences from hours to days. Meta processes tens of thousands of training jobs daily. The key is balancing freshness (reacting to drift quickly) against stability (avoiding metric noise and operational churn).
Key Takeaways
✓Data freshness is how quickly new events become features (streaming updates every 5 minutes versus daily batch), while model freshness is how quickly new patterns update weights (hourly incremental versus weekly full retrain)
✓Netflix retrains homepage personalization nightly on hundreds of millions of interactions with inference latency under 30 milliseconds p95, balancing freshness with serving cost
✓Uber runs thousands of models with cadences from hours (fraud during peak events) to days (pricing models), triggering retrains on drift thresholds like Population Stability Index (PSI) exceeding 0.2 (see the sketch after this list)
✓The core trade-off is freshness versus stability: frequent retraining reacts quickly to drift but risks overfitting to short-term noise and metric flapping, while a slower cadence is more stable but risks stale predictions during regime shifts
✓Typical online inference Service Level Objectives (SLOs) are p95 latency of 10 to 30 milliseconds per model stage, with end-to-end model chains held under 50 to 200 milliseconds depending on product requirements
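Since PSI appears above as a retrain trigger, here is a minimal sketch of computing it against the common 0.2 rule-of-thumb threshold; the quantile binning scheme and the synthetic data are assumptions.

```python
# Minimal Population Stability Index (PSI) sketch for drift detection.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference (training) and a live feature distribution."""
    # Bin edges from the reference distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    # Floor empty bins to avoid log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 100_000)   # feature at training time
live = rng.normal(0.3, 1.2, 100_000)    # shifted live distribution

if psi(train, live) > 0.2:              # common retrain trigger
    print("drift detected: schedule retrain")
```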