Failure Modes and Edge Cases in Multi-Horizon Systems
Horizon-Specific Bias
Models may learn biases that differ by horizon, for example systematically underforecasting at 7 days while overforecasting at 30 days. This can happen when the training loss weights all horizons equally even though they differ in business importance. Solution: weight the loss by horizon importance, or train separate heads with horizon-specific losses.
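A minimal sketch of a horizon-weighted loss in plain Python, assuming forecasts are stored as one row per forecast origin with one column per horizon. The function name and the weights are hypothetical. In practice the weights would come from business importance, and the same weighting can be applied inside whatever training framework computes the loss.

```python
def horizon_weighted_mae(y_true, y_pred, weights):
    """Horizon-weighted mean absolute error.

    y_true, y_pred: lists of rows; each row holds one value per horizon.
    weights: one non-negative importance weight per horizon (hypothetical values).
    The MAE is computed separately for each horizon, then combined as a
    weighted average, so high-importance horizons dominate the loss.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    per_horizon = []
    for h in range(len(weights)):
        errors = [abs(t[h] - p[h]) for t, p in zip(y_true, y_pred)]
        per_horizon.append(sum(errors) / len(errors))
    return sum(w * e for w, e in zip(weights, per_horizon)) / sum(weights)


# Two origins, two horizons (e.g. 7 and 30 days).
y_true = [[10, 20], [12, 22]]
y_pred = [[11, 20], [12, 18]]
print(horizon_weighted_mae(y_true, y_pred, [1, 1]))  # 1.25 with equal weights
print(horizon_weighted_mae(y_true, y_pred, [3, 1]))  # 0.875: first horizon weighted 3x
```

With equal weights the larger error at the second horizon dominates; raising the first horizon's weight shifts the optimisation pressure toward it.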
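Warning: Aggregate metrics (averaged across horizons) hide horizon-specific failures. Opposite-signed biases, such as underforecasting at 7 days and overforecasting at 30 days, can cancel to near zero in an averaged bias metric. Always report metrics per horizon to catch problems before they affect downstream decisions.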
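A minimal sketch of a per-horizon bias report, assuming the same row-per-origin, column-per-horizon layout. The function name and the numbers are illustrative; the point is that the per-horizon view exposes a failure that the all-horizon average hides.

```python
def bias_by_horizon(y_true, y_pred, horizons):
    """Mean error (forecast minus actual) per horizon, plus the all-horizon average.

    Negative values mean underforecasting, positive values overforecasting.
    """
    per_horizon = {}
    for i, h in enumerate(horizons):
        errors = [p[i] - t[i] for t, p in zip(y_true, y_pred)]
        per_horizon[h] = sum(errors) / len(errors)
    overall = sum(per_horizon.values()) / len(per_horizon)
    return per_horizon, overall


# Under at 7 days, over at 30 days: the aggregate looks perfect.
y_true = [[100, 100], [100, 100]]
y_pred = [[90, 110], [92, 108]]
per_h, overall = bias_by_horizon(y_true, y_pred, horizons=[7, 30])
print(per_h)    # {7: -9.0, 30: 9.0}
print(overall)  # 0.0
```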
Temporal Misalignment
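Forecast timestamps must align with decision timestamps. If inventory decisions happen Monday morning but forecasts are generated Sunday night, the effective horizon is shifted by a day: with data through Sunday, a target the planner treats as 7 days ahead is 8 days ahead of the model's forecast origin. Schedule forecast generation to match when forecasts are consumed, and map horizons explicitly between the two. Daylight saving time and timezone handling cause subtle misalignments; for example, a "one-day" gap in local time can be 23 or 25 hours of elapsed time.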
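A minimal sketch of checking alignment explicitly, assuming the model's forecast origin is the last day of data and horizons are counted in whole days from it. The helper names are hypothetical, and the fixed UTC offsets for US Eastern time around the 10 March 2024 DST switch are hard-coded only to keep the example self-contained; production code should take offsets from a time zone database rather than constants.

```python
from datetime import date, datetime, timedelta, timezone

# Hard-coded offsets for illustration only (US Eastern before/after 10 March 2024).
EST = timezone(timedelta(hours=-5))
EDT = timezone(timedelta(hours=-4))


def model_horizon(origin: date, target: date) -> int:
    """Horizon as the model counts it: whole days from forecast origin to target."""
    return (target - origin).days


def decision_horizon(decision_day: date, target: date) -> int:
    """Horizon as the planner counts it: whole days from decision day to target."""
    return (target - decision_day).days


def lead_time_hours(issued_at: datetime, needed_at: datetime) -> float:
    """Elapsed hours between forecast issue and use; inputs must be timezone-aware."""
    if issued_at.tzinfo is None or needed_at.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    # Aware datetimes with different tzinfo are compared in UTC, so DST is respected.
    return (needed_at - issued_at).total_seconds() / 3600


# Data through Sunday 3 March 2024; the planner acts on Monday 4 March.
target = date(2024, 3, 11)
print(model_horizon(date(2024, 3, 3), target))     # 8
print(decision_horizon(date(2024, 3, 4), target))  # 7

# Saturday 22:00 to Sunday 22:00 local time spans the DST switch.
print(lead_time_hours(datetime(2024, 3, 9, 22, tzinfo=EST),
                      datetime(2024, 3, 10, 22, tzinfo=EDT)))  # 23.0, not 24.0
```

Rejecting naive datetimes is a deliberate choice: subtracting naive local times across the switch would report 24 hours.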
Covariate Timing Errors
Using future observed values as inputs creates data leakage: offline metrics look inflated, but the model fails in production, where those values are not yet available. A common mistake is including a lag of the target series that is not yet available at forecast time, such as a lag-1 feature in a model forecasting 7 days ahead. Audit feature pipelines to verify that all inputs are available at prediction time for every horizon.
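A minimal sketch of leakage-free feature construction, assuming a direct strategy (one model per horizon) and a plain list of daily values. Every feature is drawn from values at or before the forecast origin t, and the target is y[t + horizon]. The function name is hypothetical.

```python
def direct_training_rows(y: list[float], horizon: int, n_lags: int) -> list[tuple[list[float], float]]:
    """Build leakage-free (features, target) pairs for a direct horizon-h model.

    For each origin t, the features are y[t - n_lags + 1] .. y[t] (the most
    recent n_lags values known at t) and the target is y[t + horizon].
    No value after the origin enters the features.
    """
    if horizon < 1 or n_lags < 1:
        raise ValueError("horizon and n_lags must be positive")
    rows = []
    for t in range(n_lags - 1, len(y) - horizon):
        features = y[t - n_lags + 1 : t + 1]
        rows.append((features, y[t + horizon]))
    return rows


# A leaky alternative would use y[t + horizon - 1] ("lag 1 of the target"),
# which is not yet observed at the origin whenever horizon > 1.
rows = direct_training_rows([float(v) for v in range(10)], horizon=3, n_lags=2)
print(rows[0])   # ([0.0, 1.0], 4.0)
print(len(rows)) # 6
```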
Audit Strategy: For each feature, verify: "At the moment I make the forecast for horizon H, is this feature value known?" Any uncertainty means potential leakage.
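The audit question can be sketched as an automated check, assuming each feature carries metadata describing either its lag relative to the target date or whether its future values are known in advance. The `Feature` class, its fields, and the catalog entries are hypothetical. Unflagged features are treated as unknown, following the rule that any uncertainty means potential leakage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    name: str
    lag: Optional[int] = None       # target-series lag in days, relative to the target date
    known_in_advance: bool = False  # e.g. calendar flags or scheduled promotions


def leaky_features(features: list[Feature], horizon: int) -> list[str]:
    """Names of features that are NOT known at the forecast origin for this horizon.

    A lag-k value for a target at t + horizon is y[t + horizon - k], which is
    observed at origin t only if k >= horizon. Features that are neither a
    sufficiently old lag nor flagged known_in_advance are treated as unknown.
    """
    leaky = []
    for f in features:
        if f.known_in_advance:
            continue
        if f.lag is not None and f.lag >= horizon:
            continue
        leaky.append(f.name)
    return leaky


catalog = [
    Feature("sales_lag_1", lag=1),
    Feature("sales_lag_7", lag=7),
    Feature("day_of_week", known_in_advance=True),
    Feature("observed_temperature"),
]
print(leaky_features(catalog, horizon=1))  # ['observed_temperature']
print(leaky_features(catalog, horizon=7))  # ['sales_lag_1', 'observed_temperature']
```

Running the check for every horizon the model serves, rather than only the shortest, is what catches lag features that are safe at horizon 1 but leak at horizon 7.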
Cold Start Horizons
New series lack sufficient history for long-horizon patterns. A series with 30 days of data cannot learn yearly seasonality. Fall back to simpler models or borrow patterns from similar established series. Gradually enable longer horizons as history accumulates.
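A minimal sketch of per-horizon fallback for cold-start series. The rule (two full seasonal cycles before fitting yearly seasonality, and at least three times the horizon in history before forecasting a horizon from the series alone), the strategy labels, and the function name are all illustrative assumptions, not recommended values.

```python
def horizon_strategy(history_days: int, horizons: list[int], season_length: int = 365,
                     min_cycles: int = 2, history_per_horizon: int = 3) -> dict[int, str]:
    """Pick a modelling strategy per horizon from the length of a series' history.

    All thresholds are illustrative assumptions, not recommended values.
    """
    plan = {}
    for h in horizons:
        if history_days >= min_cycles * season_length:
            plan[h] = "full_model"           # enough cycles to estimate yearly seasonality
        elif history_days >= history_per_horizon * h:
            plan[h] = "simple_model"         # e.g. a non-seasonal baseline
        else:
            plan[h] = "borrow_from_similar"  # pool with similar, established series
    return plan


print(horizon_strategy(30, [7, 14, 30]))
# {7: 'simple_model', 14: 'borrow_from_similar', 30: 'borrow_from_similar'}
print(horizon_strategy(120, [7, 14, 30]))
# {7: 'simple_model', 14: 'simple_model', 30: 'simple_model'}
```

Re-evaluating the same rule as history accumulates promotes longer horizons from pooled forecasts to the series' own model one step at a time.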
Inconsistent Forecasts
When horizons are forecast independently, results may be inconsistent: daily forecasts might not sum to the weekly forecast. Reconciliation methods adjust forecasts to ensure coherence across temporal hierarchies.
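A minimal sketch of one reconciliation method, ordinary least squares (OLS) reconciliation, for a two-level temporal hierarchy of n daily forecasts and one weekly forecast. Minimising the total squared adjustment to all n + 1 forecasts, subject to the daily values summing to the weekly one, has a closed form: every daily forecast moves by the same amount, (weekly - sum(daily)) / (n + 1). The function name is hypothetical.

```python
def reconcile_ols(daily: list[float], weekly: float) -> tuple[list[float], float]:
    """OLS reconciliation of n daily forecasts with one weekly forecast.

    Finds daily values b minimising sum((b_i - d_i)^2) + (sum(b) - weekly)^2,
    i.e. the closest coherent forecasts in squared error. Setting the
    gradient to zero makes every adjustment equal: (weekly - sum(d)) / (n + 1).
    """
    n = len(daily)
    if n == 0:
        raise ValueError("need at least one daily forecast")
    adjustment = (weekly - sum(daily)) / (n + 1)
    reconciled_daily = [d + adjustment for d in daily]
    # The reconciled weekly value is, by construction, the sum of the daily ones.
    return reconciled_daily, sum(reconciled_daily)


daily, weekly = reconcile_ols([10.0] * 7, 78.0)
print(daily)   # [11.0, 11.0, 11.0, 11.0, 11.0, 11.0, 11.0]
print(weekly)  # 77.0
```

Bottom-up reconciliation (setting the weekly value to the sum of the daily forecasts) is the simpler alternative; the OLS version instead lets the independent weekly forecast pull the daily values toward it.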