What Baselines Represent
Baseline selection fundamentally determines what your monitoring system considers normal versus anomalous. The baseline is the reference distribution against which live data is compared. Choosing incorrectly leads to either missed regressions or constant false alarms as natural variation triggers alerts.
Static Baselines
A static baseline uses the final training dataset distribution as the permanent reference point. Every live window is compared against this frozen snapshot from training time. This approach simplifies reasoning and excels at regression detection because any shift from training is immediately visible. However, static baselines struggle with seasonal patterns: holiday traffic, summer versus winter behavior, or Monday versus weekend patterns all trigger drift alerts even when the model handles them correctly.
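A static comparison can be sketched with the Population Stability Index, a common drift score that compares a live window against a frozen training sample. This is an illustrative sketch, not a specific library's API; the `psi` helper and the simulated data are assumptions.

```python
import numpy as np

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a frozen training sample and a
    live window. Bin edges are derived from the training data and never
    change -- that is what makes the baseline "static"."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + eps
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
training = rng.normal(0.0, 1.0, 10_000)  # frozen snapshot from training time
live = rng.normal(0.5, 1.0, 10_000)      # live window with a mean shift

print(psi(training, live))  # clearly larger than psi(training, training)
```

Because the reference never moves, any persistent shift shows up here, including the benign seasonal shifts described above.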
Dynamic Baselines
Dynamic baselines adapt over time, comparing current windows against recent history (last 7 to 30 days) rather than a fixed training snapshot. This approach filters out seasonal patterns and natural evolution, only alerting on sudden deviations from recent trends. The risk is slow drift: gradual changes that compound over months remain undetected because each comparison window looks similar to the previous one.
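A minimal sketch of a rolling reference window, and of the failure mode described above: with gradual drift, each day resembles the previous one, so the rolling reference drifts along with the data. The `RollingBaseline` class and the simulated drift rate are hypothetical.

```python
from collections import deque
import numpy as np

class RollingBaseline:
    """Keep the last `window_days` daily samples as the reference
    distribution; old days fall out of the deque automatically."""
    def __init__(self, window_days=7):
        self.days = deque(maxlen=window_days)

    def update(self, day_sample):
        self.days.append(np.asarray(day_sample))

    def reference(self):
        return np.concatenate(self.days)

rng = np.random.default_rng(1)
rb = RollingBaseline(window_days=7)
# Slow drift: each day's mean creeps up by 0.05. No single day looks
# anomalous against the previous week, so a rolling check stays quiet
# even though the distribution has moved far from training (mean 0).
for day in range(30):
    rb.update(rng.normal(0.05 * day, 1.0, 1_000))

print(round(rb.reference().mean(), 2))  # tracks recent days, not training
```

The compounded drift is invisible to the rolling comparison but would be obvious against a static training baseline, which motivates the hybrid approach below.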
Hybrid Approach
Production systems often combine both. Static baselines with wide thresholds catch major regressions. Dynamic baselines with tight thresholds catch sudden anomalies. Periodic comparison of current distributions to original training distributions catches slow drift that dynamic monitoring misses.
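The combined decision logic can be written as a small table. The function name and return labels are illustrative; the mapping itself follows the strategy described above (both firing means high confidence, static only means investigate, rolling only means a sudden local anomaly).

```python
def classify_drift(static_breach: bool, rolling_breach: bool) -> str:
    """Dual-baseline triage sketch: combine a wide-threshold static
    check with a tight-threshold rolling check."""
    if static_breach and rolling_breach:
        return "incident"                # high-confidence regression
    if static_breach:
        return "investigate"             # evolution vs. regression
    if rolling_breach:
        return "sudden-anomaly"          # local spike vs. recent history
    return "ok"

print(classify_drift(True, True))   # -> incident
print(classify_drift(True, False))  # -> investigate
```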
Baseline Refresh Strategy
After model retraining, update static baselines to reflect new training data. Version baselines alongside model artifacts to maintain audit trails and enable rollback comparison when investigating production issues.
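One way to version a baseline alongside a model artifact is to write it as a checksummed JSON file keyed by model version. The file layout, field names, and `save_baseline` helper below are assumptions for illustration, not a standard format.

```python
import hashlib
import json
import time
from pathlib import Path

def save_baseline(stats: dict, model_version: str, out_dir="baselines"):
    """Persist a baseline snapshot next to its model version so production
    investigations can compare against the exact reference that was live."""
    path = Path(out_dir)
    path.mkdir(exist_ok=True)
    payload = {
        "model_version": model_version,
        "created_at": int(time.time()),
        "stats": stats,
    }
    # Checksum the canonical form so tampering or corruption is detectable.
    blob = json.dumps(payload, sort_keys=True).encode()
    payload["checksum"] = hashlib.sha256(blob).hexdigest()
    file = path / f"baseline_{model_version}.json"
    file.write_text(json.dumps(payload, indent=2))
    return file

f = save_baseline({"mean": 0.12, "std": 1.03}, model_version="v42")
print(f.name)  # baseline_v42.json
```

Keeping one such file per model version gives the audit trail and rollback comparison described above.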
✓ Static baseline uses the frozen training distribution; excels at regression detection and compliance validation but generates false positives under seasonality or growth (streaming platform: 5 to 8% time-of-day variance causes twice-daily alerts)
✓ Dynamic baseline uses a rolling window (typically 7 or 28 days) as the reference; reduces false positives by over 70% for seasonal patterns but can chase drift, hiding gradual degradation over weeks
✓ Dual baseline strategy compares live traffic to both the static (training) and rolling (7 day) baselines; both firing indicates a high-confidence incident, only static firing requires investigating evolution versus regression
✓ Baseline update cadence: static updated only on retraining (weeks to months), rolling recomputed continuously; marketplace models use a 28 day rolling window for demand seasonality and a 1 day rolling window for fraud-detection urgency
✓ Memory overhead: a static baseline is a fixed cost (roughly 20 KB per feature per segment); a rolling baseline requires maintaining window state (7 days at 1 hour resolution adds 168 snapshots, roughly 3.4 MB per feature if storing full histograms)
✓ Schema validation always uses static baselines: categorical vocabulary changes, unit conversions (Fahrenheit to Celsius), or scale shifts (currency 100x) must breach the static baseline to trigger data-contract incidents
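The memory figures quoted above are easy to sanity-check with back-of-envelope arithmetic (using decimal megabytes):

```python
# Rolling-baseline memory overhead: one full histogram per snapshot.
static_kb = 20                      # ~20 KB per feature per segment
snapshots = 7 * 24                  # 7 days at 1-hour resolution
rolling_kb = snapshots * static_kb  # total window state per feature

print(snapshots)                    # 168 snapshots
print(round(rolling_kb / 1000, 1))  # ~3.4 MB per feature
```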
1. Netflix streaming service: a rolling 7 day baseline by hour of day handles consumer traffic waves (evening peak is 2x morning traffic); a static baseline catches model-version regressions or upstream pipeline changes
2. Uber pricing model: dual baseline with the static training distribution (6 months old) and a 1 day rolling baseline; a sudden city-level demand spike (concert, event) breaches rolling only, while a long-term demographic shift breaches both, triggering retraining
3. Airbnb Zipline: static training baselines stored with feature definitions and used for deployment validation; 28 day rolling baselines for prediction-acceptance-rate monitoring to handle holiday and summer travel seasonality