
Baseline Selection and Windowing Strategy

Choosing the reference baseline is a first-order design decision that determines what changes you detect. A static training baseline compares production traffic against the original training data distribution, catching long-term model decay and distribution shifts since the model was built. However, this approach over-alerts as products evolve naturally through seasonality, marketing campaigns, and user behavior changes. A rolling baseline compares against recent production data (for example, the last 7 to 28 days), automatically adapting to seasonality and product evolution but risking that slowly accumulating harmful drift gets absorbed into the baseline without notice. Most mature systems run both baselines in parallel, alerting if either breaches its threshold but prioritizing differently: static baseline violations indicate a potential need for retraining, while rolling baseline violations signal sudden, acute incidents requiring immediate investigation. For systems with strong segmentation, such as geography or device type, stratified baselines per segment prevent false alerts when one region has a legitimate holiday while others operate normally.

Windowing determines detection latency and statistical power. Tumbling windows (for example, hourly or daily boundaries) are simpler to implement and cheaper to compute but impose rigid boundaries that can miss incidents occurring mid-window. Sliding windows (for example, a 2-hour window moving every 15 minutes) provide smoother detection and faster incident response at higher compute and memory cost due to overlapping aggregations. For a system processing 40,000 requests per second with 0.5% sampling, a 30-minute sliding window with 5-minute steps accumulates roughly 360,000 events per decision, sufficient for stable statistical tests across dozens of features and segments.

Window size trades off detection latency against statistical stability. Smaller windows (15 to 30 minutes) catch incidents quickly but need higher sampling rates or traffic volume to reach minimum sample sizes of 1,000 to 5,000 events per feature per segment. Larger windows (2 to 24 hours) provide stable statistics but extend the blast radius when incidents occur. Label delay drives window size in practice: if ground truth labels arrive 6 to 12 hours late, there is little value in detecting covariate drift within 10 minutes, since you cannot validate whether model performance actually degraded until the labels arrive.
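A minimal sketch of the dual-baseline pattern described above, using a Population Stability Index (PSI) check with NumPy. The bucket count, the 0.2 alert threshold, and the alert-routing labels are illustrative assumptions, not a specific tool's API.

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index of `current` against `reference`."""
    # Bucket edges come from reference quantiles so buckets are roughly equally populated.
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    # Clip the current sample so out-of-range values fall into the edge buckets.
    cur_frac = np.histogram(np.clip(current, edges[0], edges[-1]), edges)[0] / len(current)
    eps = 1e-6  # avoids log(0) for empty buckets
    return float(np.sum((cur_frac - ref_frac) * np.log((cur_frac + eps) / (ref_frac + eps))))

def check_feature_drift(static_baseline, rolling_baseline, window, threshold=0.2):
    """Compare the current window against both baselines and route alerts differently."""
    alerts = []
    if psi(static_baseline, window) > threshold:
        alerts.append(("RETRAINING_REVIEW", "drift vs. static training baseline"))
    if psi(rolling_baseline, window) > threshold:
        alerts.append(("PAGE_ONCALL", "drift vs. rolling 14-day baseline"))
    return alerts

# Toy usage: the product has drifted slowly since training, then an acute shift
# appears in the latest window. In this synthetic example both checks fire, but
# they route to different alert priorities.
rng = np.random.default_rng(0)
training_sample = rng.normal(0.0, 1.0, 50_000)   # distribution at training time
rolling_sample = rng.normal(0.4, 1.0, 20_000)    # last 14 days of production
current_window = rng.normal(1.2, 1.0, 5_000)     # latest 30-minute window
print(check_feature_drift(training_sample, rolling_sample, current_window))
```

In a stratified setup, the same check simply runs per segment (for example, per country), with each segment compared against its own static and rolling baselines.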
💡 Key Takeaways
Static training baseline catches long-term drift and model decay but over-alerts on seasonality; rolling baseline (last 7 to 28 days) adapts to seasonality but can mask slow harmful drift; run both and prioritize alerts differently
Stratified baselines per segment (geography, device, traffic source) prevent false alerts when legitimate differences exist; for example, comparing against the last 4 Mondays for day-of-week seasonality
Sliding windows (2-hour window, 5-minute step) provide faster incident detection (minutes) than tumbling windows (hourly boundaries) at 3x to 5x higher compute cost due to overlapping aggregations
Minimum sample size of 1,000 to 5,000 events per feature per segment per window is required for stable statistical tests; at 40,000 queries per second with 0.5% sampling, a 30-minute window provides approximately 360,000 events (see the sizing sketch after this list)
Label delay drives practical window sizing: if ground truth labels arrive 6 to 12 hours late, detecting covariate drift within 10 minutes provides limited value since you cannot validate actual model performance degradation until labels arrive
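The sample-size arithmetic in the takeaways above is easy to script. Here is a rough sizing sketch; the segment count, candidate window sizes, and minimum-sample threshold are illustrative assumptions.

```python
def events_per_window(qps: float, sampling_rate: float, window_minutes: float) -> float:
    """Sampled events accumulated over one window."""
    return qps * sampling_rate * window_minutes * 60

def viable_window_sizes(qps, sampling_rate, segments, min_per_segment=1_000,
                        candidates_minutes=(15, 30, 60, 120)):
    """Candidate window sizes (in minutes) whose per-segment sample count
    clears the minimum needed for stable statistical tests."""
    return [
        (w, int(events_per_window(qps, sampling_rate, w) / segments))
        for w in candidates_minutes
        if events_per_window(qps, sampling_rate, w) / segments >= min_per_segment
    ]

# 40,000 QPS at 0.5% sampling over a 30-minute window -> 360,000 events per decision.
print(events_per_window(40_000, 0.005, 30))
# With 200 feature-segment combinations, a 15-minute window falls short of the
# 1,000-event minimum, while 30 minutes and up clear it.
print(viable_window_sizes(40_000, 0.005, segments=200))
```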
📌 Examples
Airbnb pricing model uses both a static baseline from training (6 months ago) and a rolling 14-day baseline; static violations trigger retraining evaluation, rolling violations trigger immediate incident response
Netflix recommendation system runs per-country stratified baselines; Diwali in India does not trigger false alerts in US traffic, with each region compared against its own last 4 weeks
Uber demand prediction uses 15-minute sliding windows (3-minute step) for real-time surge pricing decisions, providing 12-minute average detection latency for traffic pattern shifts during major events