Failure Modes: Structural Breaks, Promotions, and Data Quality
Production forecasting systems face systematic failure modes that require architectural mitigations. Structural breaks from policy changes, outages, or pandemics introduce regime shifts that violate model assumptions. ETS lags because its smoothed state updates slowly: with typical alpha values of 0.1 to 0.3, a 50 percent demand drop takes weeks to fully absorb. ARIMA with fixed differencing retains the wrong memory structure and produces biased forecasts until it is retrained. Mitigations include automatic break detection with CUSUM or PELT and hard resets of model state when a level shift exceeds k times the residual standard deviation (sigma), typically k = 3.
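The arithmetic behind the ETS lag and the detect-then-reset rule can be sketched in Python. This is a minimal illustration, not a production detector: it assumes residuals come from an already-fitted model with a known residual standard deviation `sigma`, and the function names and the CUSUM parameters (`drift=0.5`, `h=5.0`, both in sigma units) are illustrative choices rather than values from the text. PELT is not shown.

```python
import math


def steps_to_absorb(alpha, fraction=0.95):
    """Updates simple exponential smoothing needs to absorb `fraction` of a step change.

    After n updates the remaining gap is (1 - alpha) ** n of the original shift,
    independent of the shift's size.
    """
    return math.ceil(math.log(1 - fraction) / math.log(1 - alpha))


def cusum_alarm(residuals, sigma, drift=0.5, h=5.0):
    """Two-sided tabular CUSUM on standardized residuals.

    Returns the index of the first alarm, or None if no break is detected.
    """
    s_pos = s_neg = 0.0
    for t, r in enumerate(residuals):
        z = r / sigma
        s_pos = max(0.0, s_pos + z - drift)  # accumulates upward deviations
        s_neg = max(0.0, s_neg - z - drift)  # accumulates downward deviations
        if s_pos > h or s_neg > h:
            return t
    return None


def maybe_reset_level(level, recent_values, sigma, k=3.0):
    """Hard-reset the smoothed level if the recent mean departs from it by more than k * sigma."""
    recent_mean = sum(recent_values) / len(recent_values)
    if abs(recent_mean - level) > k * sigma:
        return recent_mean, True   # discard stale state and restart from the new regime
    return level, False
```

With alpha = 0.1 it takes 29 updates to absorb 95 percent of a shift, versus 9 with alpha = 0.3; on daily data that is roughly four weeks versus a little over one week, which is where the "weeks" of lag comes from.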
Promotions and holidays create short spikes that violate the Gaussian error assumptions both models rely on. ETS with multiplicative errors can explode numerically if a spike is extreme, and ARIMA residuals become non-Gaussian, causing prediction intervals to undercover by 10 to 20 percentage points. Best practice is to use event calendars or intervention variables: model holidays as additive impulses or level shifts rather than letting the base model absorb them. Temporarily widening intervals by 50 to 100 percent during known events helps maintain coverage.
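A minimal sketch of the event-calendar idea follows, using level-only simple exponential smoothing as a stand-in for a full ETS model. The function names, `alpha = 0.2`, and treating every flagged period as a pure additive impulse are assumptions for illustration. Flagged periods are excluded from the level update and their deviation is stored as an event effect, so a promotion spike does not inflate the baseline; a second helper widens intervals during known events.

```python
def ses_level(y, alpha=0.2, event_days=frozenset()):
    """Final simple exponential smoothing level plus estimated event effects.

    Periods in `event_days` are treated as additive impulses: their deviation
    from the current level is recorded, and the base level is NOT updated.
    """
    level = float(y[0])
    event_effects = {}
    for t in range(1, len(y)):
        if t in event_days:
            event_effects[t] = y[t] - level  # attribute the spike to the event
            continue
        level += alpha * (y[t] - level)
    return level, event_effects


def widen_interval(lower, upper, widen_pct):
    """Widen a prediction interval symmetrically by `widen_pct` percent of its width."""
    center = (lower + upper) / 2
    half_width = (upper - lower) / 2 * (1 + widen_pct / 100)
    return center - half_width, center + half_width
```

On a flat series of 100 with a 300 spike at t = 5, the plain update leaves the final level near 116, while the calendar-aware version keeps it at 100 and records a +200 event effect. Widening the interval (90, 110) by 50 percent gives (85, 115).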
Seasonality misspecification produces persistent error. If the true seasonal period is 168 hours (a weekly pattern in hourly data) but the model uses 24 hours (a daily pattern), residuals show strong autocorrelation at lag 168. Minute-level series with both daily and weekly cycles require multiple seasonalities; plain SARIMA or Holt-Winters, which handle a single seasonal period, underfit them, causing 15 to 30 percent MAPE degradation. Use multi-seasonal exponential smoothing models such as BATS or TBATS, or ARIMA with Fourier terms. Monitor the residual autocorrelation function (ACF) at seasonal lags to detect misspecification.
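The two tools mentioned above, Fourier regressors for several periods and a residual ACF check at a seasonal lag, can be sketched with the standard library. This assumes hourly data with daily (24) and weekly (168) periods; the harmonic counts in the usage example are illustrative, and fitting the ARIMA model that would consume these regressors as exogenous inputs is not shown.

```python
import math


def fourier_terms(t, period, n_harmonics):
    """Sine/cosine regressors for one seasonal period at time index t."""
    terms = []
    for k in range(1, n_harmonics + 1):
        angle = 2 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


def seasonal_regressors(t, periods_and_harmonics):
    """Concatenate Fourier terms for several periods, e.g. daily and weekly cycles."""
    row = []
    for period, n_harmonics in periods_and_harmonics:
        row.extend(fourier_terms(t, period, n_harmonics))
    return row


def acf_at_lag(x, lag):
    """Sample autocorrelation of x at a single lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return num / denom
```

For example, `seasonal_regressors(t, [(24, 3), (168, 5)])` yields 16 columns per hour. If residuals from a model with only a 24-hour period still carry a weekly wave, `acf_at_lag(residuals, 168)` comes out strongly positive, which is the misspecification signal described above.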
Intermittent demand with many zeros is common in the retail long tail. Standard ETS and ARIMA oscillate or forecast negative values. Apply Croston's method or the Syntetos-Boylan Approximation (SBA), which model demand occurrence and demand size separately. Alternatively, use nonnegativity constraints or a log transform, though the log needs an offset for zeros, and retransforming forecasts requires bias correction via smearing or a Taylor expansion.
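Croston's method fits in a few lines. It smooths nonzero demand sizes and inter-demand intervals separately, updates them only in periods with demand, and forecasts their ratio as a per-period demand rate; SBA multiplies that ratio by (1 - alpha/2) to reduce Croston's known upward bias. The initialization (first interval measured from the start of the series) and `alpha = 0.1` are assumptions, and implementations differ in both.

```python
def croston(y, alpha=0.1, sba=False):
    """Croston (or SBA) per-period demand forecast for an intermittent series."""
    nonzero = [t for t, v in enumerate(y) if v > 0]
    if not nonzero:
        return 0.0                   # no demand observed yet
    first = nonzero[0]
    size = float(y[first])           # smoothed nonzero demand size
    interval = float(first + 1)      # smoothed interval between demands
    periods_since = 1                # periods since the last demand, counting the current one
    for t in range(first + 1, len(y)):
        if y[t] > 0:
            size += alpha * (y[t] - size)
            interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1       # zeros change neither size nor interval
    forecast = size / interval
    if sba:
        forecast *= 1 - alpha / 2    # Syntetos-Boylan bias correction
    return forecast
```

Because both smoothed components stay positive, the forecast can never go negative, unlike a plain ETS or ARIMA fit on the same zeros. A series with 4 units every third period forecasts about 1.33 units per period (about 1.27 with SBA).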
Time alignment issues silently corrupt models. Daylight saving time boundaries produce 23- or 25-hour local days. Without correction, daily aggregates miscount exposure and weekly seasonality drifts by one hour twice a year. Timezone mismatches between event generation and aggregation create artificial one-hour shifts in the series. Always normalize to UTC or a stable local time and adjust seasonal indices at DST boundaries. Netflix and Uber systems explicitly handle DST in their aggregation pipelines to prevent seasonal model drift.
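A small sketch with Python's standard `zoneinfo` module (Python 3.9+, which needs the system or `tzdata` time zone database) measures local day length, normalizes a daily count to a per-hour rate so 23- and 25-hour days are compared on equal exposure, and maps local timestamps to UTC hour buckets. The function names and the America/New_York zone are illustrative. Both endpoints are converted to UTC before subtracting because Python ignores the UTC offset when subtracting two aware datetimes that share the same tzinfo, so subtracting in local time would always report 24 hours.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def hours_in_local_day(day, tz_name):
    """Length of a local calendar day in hours (23 or 25 on DST transition days)."""
    tz = ZoneInfo(tz_name)
    start = datetime.combine(day, time(0), tzinfo=tz).astimezone(timezone.utc)
    end = datetime.combine(day + timedelta(days=1), time(0), tzinfo=tz).astimezone(timezone.utc)
    return (end - start) / timedelta(hours=1)


def per_hour_rate(daily_count, day, tz_name):
    """Normalize a local daily aggregate by its true exposure in hours."""
    return daily_count / hours_in_local_day(day, tz_name)


def utc_hour_bucket(local_dt, tz_name):
    """Assign a naive local timestamp to its UTC hour bucket.

    Ambiguous fall-back times resolve to the first occurrence (fold=0).
    """
    aware = local_dt.replace(tzinfo=ZoneInfo(tz_name))
    return aware.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
```

In 2024, `hours_in_local_day(date(2024, 3, 10), "America/New_York")` returns 23.0 and `date(2024, 11, 3)` returns 25.0, so a raw daily total on those days understates or overstates demand by about 4 percent unless it is normalized.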
Data quality failures propagate into forecasts. Backfills introduce sudden history revisions, missing buckets look like zeros, and duplicated events inflate counts. ETS absorbs these errors into its level state, and ARIMA can misestimate the differencing order. Apply Median Absolute Deviation (MAD) anomaly filters before each model update, flagging and skipping outliers beyond 5 MAD. Refit from scratch after large backfills that revise more than 10 percent of the training history. Version model state and keep last-known-good snapshots to enable rollback within 5 minutes during incidents.
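The MAD filter and the backfill refit rule can be sketched as below. This uses the raw (unscaled) MAD; some implementations multiply it by 1.4826 so it is comparable to a standard deviation under normality, which changes what "5 MAD" means. The function names, the batch-based window, and the choice to flag nothing when the MAD is zero are assumptions. State snapshotting for rollback is not shown.

```python
import statistics


def mad_outlier_mask(values, threshold=5.0):
    """Flag points whose distance from the median exceeds `threshold` raw MADs."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate window (e.g. mostly identical values or zeros): flag nothing rather than everything.
        return [False] * len(values)
    return [abs(v - med) / mad > threshold for v in values]


def clean_batch(values, threshold=5.0):
    """Drop flagged points before they reach the model update."""
    mask = mad_outlier_mask(values, threshold)
    return [v for v, is_outlier in zip(values, mask) if not is_outlier]


def needs_full_refit(revised_points, history_points, max_fraction=0.10):
    """True when a backfill revises more than `max_fraction` of the training history."""
    return revised_points / history_points > max_fraction
```

In the batch `[10, 11, 9, 10, 12, 10, 100]` the median is 10 and the MAD is 1, so only the 100 (90 MADs away) is skipped; a backfill touching 150 of 1,000 training points would trigger a full refit.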
💡 Key Takeaways
•Structural breaks: pandemic or policy shifts make ETS lag by weeks with alpha 0.1 to 0.3; use CUSUM detection and hard state resets when level shifts exceed 3 sigma
•Promotion spikes: multiplicative-error ETS can explode and ARIMA intervals undercover by 10 to 20 percentage points; mitigate with event calendars and 50 to 100 percent wider intervals
•Seasonality misspecification: wrong period causes 15 to 30 percent MAPE degradation, monitor residual ACF at seasonal lags, use BATS or Fourier terms for multiple cycles
•Daylight saving time: creates 23 or 25 hour days, weekly seasonality drifts 1 hour twice yearly without UTC normalization and seasonal index adjustment
•Intermittent demand: standard models produce negative forecasts or oscillate, apply Croston or SBA methods modeling occurrence and size separately
•Data quality: skip outliers beyond 5 MAD before updates, fully refit after backfills revising over 10 percent of history, and version model state for 5 minute rollback during incidents
📌 Examples
Netflix CDN load: DST boundaries explicitly handled in aggregation pipeline to prevent 1 hour seasonal drift in hourly viewing forecasts
Amazon promotion planning: holiday event calendar with additive impulse intervention prevents base model from absorbing 3x spikes and exploding multiplicative errors
Uber demand surge: CUSUM break detection triggers automatic state reset when policy changes cause 40 percent zone demand shifts, restoring accuracy within 24 hours