
Baseline Selection and Windowing Strategy

BASELINE SELECTION

What distribution should current data be compared against? The choice fundamentally affects what you detect.

Training data baseline: Compare production to training distribution. Detects when production diverges from what the model learned. Problem: training data may be old, and some divergence is expected as the world changes.

Recent production baseline: Compare current window to recent past (e.g., last 30 days). Detects sudden changes. Does not detect gradual drift that happens slowly over months.

Rolling baseline: Continuously update the baseline as new data arrives. Adapts to expected change, but slow, sustained drift gets absorbed into the baseline and goes undetected.
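The tradeoff between the training and recent-production baselines can be sketched with a two-sample KS statistic (a common drift measure). The synthetic data, sample sizes, and thresholds below are illustrative assumptions, not values from the text:

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    def ecdf(sorted_xs, v):
        # Fraction of points <= v.
        return bisect.bisect_right(sorted_xs, v) / len(sorted_xs)
    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in sorted(set(a) | set(b)))

random.seed(0)
training = [random.gauss(0.0, 1.0) for _ in range(5000)]
# Production has gradually shifted upward since training.
production = [random.gauss(0.8, 1.0) for _ in range(2000)]
recent = production[:1000]    # "recent production" baseline
current = production[1000:]   # current window

# Training baseline sees the cumulative shift; the recent baseline,
# drawn from the already-shifted distribution, sees no sudden change.
drift_vs_training = ks_statistic(current, training)
drift_vs_recent = ks_statistic(current, recent)
```

Here `drift_vs_training` comes out large while `drift_vs_recent` stays near zero: the recent baseline has already absorbed the shift, which is exactly why it misses gradual drift.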

WINDOWING STRATEGIES

Fixed windows: Compare daily, weekly, or monthly aggregates. Simple to implement. Window size affects sensitivity: smaller windows detect faster but have more noise; larger windows are more stable but slower to detect.

Sliding windows: Continuously compare last N hours/days to baseline. More responsive than fixed windows. Requires more compute as you recalculate continuously.

Exponentially weighted: Recent samples weighted more heavily. Balances responsiveness and stability. Decay parameter controls the tradeoff.
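The decay-parameter tradeoff can be shown with a minimal exponentially weighted moving average over a monitored statistic (the step-change series and alpha values are illustrative assumptions):

```python
def ewma(values, alpha):
    """Exponentially weighted moving average; alpha in (0, 1]
    controls the responsiveness/stability tradeoff."""
    avg = values[0]
    out = [avg]
    for v in values[1:]:
        avg = alpha * v + (1 - alpha) * avg
        out.append(avg)
    return out

# A monitored feature mean that steps from 0.0 to 1.0 at t=50.
series = [0.0] * 50 + [1.0] * 50
fast = ewma(series, alpha=0.3)   # responsive, noisier on real data
slow = ewma(series, alpha=0.05)  # stable, slower to react
```

A few steps after the shift, `fast` has moved most of the way to the new level while `slow` has barely reacted; with noisy inputs the same alpha that reacts fast also produces more false alarms.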

PRACTICAL GUIDELINES

For high-velocity domains (real-time bidding, fraud), use hourly windows with rolling baselines. For stable domains (document classification), use weekly windows with training baselines.

Sample size matters: statistical tests require sufficient samples. If you have only 100 samples per window, you cannot reliably detect small drifts. Minimum 1000 samples per window is a common rule of thumb.
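The sample-size rule of thumb can be checked with a quick power simulation. This uses a simple z-test on a mean shift; the shift size, trial count, and threshold are illustrative assumptions:

```python
import random
import statistics

random.seed(1)

def detection_rate(n, shift, z_crit=1.96, trials=200):
    """Fraction of trials in which a mean shift of `shift` (in units of
    the feature's std dev) is flagged by a z-test on an n-sample window."""
    hits = 0
    for _ in range(trials):
        window = [random.gauss(shift, 1.0) for _ in range(n)]
        z = statistics.fmean(window) / (1.0 / n ** 0.5)
        if abs(z) > z_crit:
            hits += 1
    return hits / trials

small = detection_rate(100, shift=0.1)    # 100-sample windows: low power
large = detection_rate(1000, shift=0.1)   # 1000-sample windows: high power
```

For a small drift of 0.1 standard deviations, 100-sample windows flag it only a minority of the time, while 1000-sample windows catch it reliably.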

Multi-resolution monitoring: track drift at multiple window sizes simultaneously. Hourly windows catch sudden changes; monthly windows catch gradual drift.
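One way to sketch multi-resolution monitoring is a set of bounded windows of different sizes checked against a shared baseline. The class, window sizes, and threshold below are hypothetical, and windows are sized in samples rather than wall-clock time for simplicity:

```python
from collections import deque

class MultiResolutionMonitor:
    """Track drift of a feature's mean at several window sizes at once."""

    def __init__(self, baseline_mean, window_sizes, threshold):
        self.baseline_mean = baseline_mean
        # deque(maxlen=n) keeps only the most recent n samples.
        self.windows = {n: deque(maxlen=n) for n in window_sizes}
        self.threshold = threshold

    def update(self, value):
        alerts = []
        for size, window in self.windows.items():
            window.append(value)
            if len(window) == size:  # only test once the window is full
                drift = abs(sum(window) / size - self.baseline_mean)
                if drift > self.threshold:
                    alerts.append(size)
        return alerts  # window sizes currently over threshold

monitor = MultiResolutionMonitor(0.0, window_sizes=[10, 100], threshold=0.5)
for v in [0.0] * 100 + [1.0] * 10:
    alerts = monitor.update(v)
```

After the short burst of shifted values, only the 10-sample window alerts; the 100-sample window smooths the spike away, which is the intended division of labor between resolutions.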

⚠️ Key Trade-off: Shorter windows detect faster but produce more false alarms. Longer windows are more stable but miss rapid changes. Use multiple window sizes for defense in depth.
💡 Key Takeaways
Baseline options: training data (detects divergence from learned), recent production (detects sudden changes), rolling (adapts to expected change)
Window size tradeoff: smaller = faster detection + more noise; larger = stable + slower; use multiple resolutions
Minimum 1000 samples per window for reliable statistical tests; high-velocity domains need hourly windows
📌 Interview Tips
1. Explain baseline selection tradeoffs: training baseline vs. rolling baseline.
2. Describe multi-resolution monitoring: hourly windows for sudden changes, monthly windows for gradual drift.