Baseline Selection and Windowing Strategy
BASELINE SELECTION
What distribution should current data be compared against? The choice fundamentally affects what you detect.
Training data baseline: Compare production to training distribution. Detects when production diverges from what the model learned. Problem: training data may be old, and some divergence is expected as the world changes.
Recent production baseline: Compare the current window to the recent past (e.g., the last 30 days). Detects sudden changes, but misses gradual drift, because a baseline drawn from recent data drifts along with production.
Rolling baseline: Continuously update the baseline as new data arrives. Adapts to expected change, but because the baseline absorbs each shift as it happens, sustained gradual drift can go undetected.
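The comparison itself needs a divergence measure. One common choice is the Population Stability Index (PSI); the sketch below is a minimal pure-Python version, and the bin count, epsilon, and the 0.2 alarm level are illustrative assumptions rather than anything prescribed above.

```python
import math

def psi(baseline, production, bins=10):
    """Population Stability Index between two numeric samples.

    Bin edges come from the baseline's min/max; a small epsilon
    avoids log(0) when a bin is empty in one sample.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0
    eps = 1e-6

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)
            idx = max(idx, 0)  # clamp values below the baseline minimum
            counts[idx] += 1
        return [c / len(sample) + eps for c in counts]

    b, p = fractions(baseline), fractions(production)
    return sum((pi - bi) * math.log(pi / bi) for bi, pi in zip(b, p))

baseline = [i / 100 for i in range(1000)]   # roughly uniform on [0, 10)
shifted = [x + 3.0 for x in baseline]       # same shape, mean shifted by 3

print(psi(baseline, baseline))  # 0.0 (identical distributions)
print(psi(baseline, shifted))   # large; >0.2 is a common alarm level
```

The same function works against any of the three baselines above; only the choice of `baseline` sample changes.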
WINDOWING STRATEGIES
Fixed windows: Compare daily, weekly, or monthly aggregates. Simple to implement. Window size affects sensitivity: smaller windows detect faster but have more noise; larger windows are more stable but slower to detect.
Sliding windows: Continuously compare the last N hours/days to the baseline. More responsive than fixed windows, at the cost of recomputing statistics on every update.
Exponentially weighted: Recent samples weighted more heavily. Balances responsiveness and stability. Decay parameter controls the tradeoff.
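The decay tradeoff can be sketched with an exponentially weighted moving average of a per-sample signal. The class, alpha values, and the level-shift scenario below are illustrative assumptions, not a prescribed implementation.

```python
class EWMAMonitor:
    """Exponentially weighted mean; alpha controls the responsiveness/
    stability tradeoff (higher alpha = more responsive, noisier)."""

    def __init__(self, alpha=0.1, initial=0.0):
        self.alpha = alpha
        self.mean = initial

    def update(self, x):
        # New observations get weight alpha; history decays by (1 - alpha).
        self.mean = self.alpha * x + (1 - self.alpha) * self.mean
        return self.mean

fast = EWMAMonitor(alpha=0.3)   # responsive, noisier
slow = EWMAMonitor(alpha=0.01)  # stable, slower to react

for x in [0.0] * 50 + [1.0] * 50:  # level shift halfway through the stream
    fast.update(x)
    slow.update(x)

# After 50 post-shift samples, fast has essentially reached the new
# level of 1.0, while slow is still well below it.
print(fast.mean, slow.mean)
```

With alpha near 1 the EWMA behaves like a tiny sliding window; with alpha near 0 it behaves like a long-lived rolling baseline.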
PRACTICAL GUIDELINES
For high-velocity domains (real-time bidding, fraud), use hourly windows with rolling baselines. For stable domains (document classification), use weekly windows with training baselines.
Sample size matters: statistical tests need enough samples to have power. With only 100 samples per window, small drifts cannot be detected reliably; a minimum of 1,000 samples per window is a common rule of thumb.
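A minimal sketch of that guard, assuming a two-sample Kolmogorov-Smirnov statistic as the drift test; the threshold value and the `check_drift` helper are illustrative assumptions, while the 1,000-sample floor mirrors the rule of thumb above.

```python
MIN_SAMPLES = 1000  # rule-of-thumb floor from the text

def ks_statistic(a, b):
    """Two-sample KS statistic: maximum gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def check_drift(window, baseline, threshold=0.1):
    if len(window) < MIN_SAMPLES:
        return None  # too few samples to trust the test; skip this window
    return ks_statistic(window, baseline) > threshold

baseline = [i / 2000 for i in range(2000)]
print(check_drift(baseline[:100], baseline))               # None (window too small)
print(check_drift([x + 0.5 for x in baseline], baseline))  # True (clear shift)
```

Returning None rather than False for undersized windows keeps "not enough data" distinguishable from "no drift detected" downstream.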
Multi-resolution monitoring: track drift at multiple window sizes simultaneously. Hourly windows catch sudden changes; monthly windows catch gradual drift.
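One way to sketch multi-resolution monitoring: keep one bounded buffer per resolution and evaluate each full buffer against the same baseline. The buffer sizes, the mean-shift check (standing in for a proper two-sample test, for brevity), and the threshold are all illustrative assumptions.

```python
from collections import deque

class MultiResolutionMonitor:
    """One sliding buffer per window size, each compared to a fixed
    baseline mean. A real deployment would run a proper statistical
    test per resolution rather than this mean-shift check."""

    def __init__(self, baseline, window_sizes=(10, 100), threshold=0.5):
        self.baseline_mean = sum(baseline) / len(baseline)
        self.windows = {n: deque(maxlen=n) for n in window_sizes}
        self.threshold = threshold

    def observe(self, x):
        alarms = {}
        for size, buf in self.windows.items():
            buf.append(x)
            if len(buf) == size:  # only evaluate full windows
                shift = abs(sum(buf) / size - self.baseline_mean)
                alarms[size] = shift > self.threshold
        return alarms

monitor = MultiResolutionMonitor(baseline=[0.0, 1.0])  # baseline mean 0.5
for x in [0.5] * 100:   # stable traffic matching the baseline mean
    stable = monitor.observe(x)
for x in [2.0] * 10:    # sudden jump: the small window reacts first
    alarms = monitor.observe(x)

print(alarms)  # → {10: True, 100: False}
```

The small window alarms on the sudden jump while the large window, still dominated by stable history, stays quiet; the roles reverse for slow drift, which only accumulates enough signal at the coarser resolution.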