Production Pipeline: From Data Assembly to Serving at Scale
Data Assembly Pipeline
Multi-horizon models require inputs aligned across time. Build feature tables from three groups: historical targets (observed values), known future features (calendar, promotions), and static attributes. Ensure temporal alignment: a feature at time t must use only information available at time t. Precompute rolling aggregations (7-day average, 30-day max) and cache them in a feature store.
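To make the leakage rule concrete, here is a minimal sketch (pure Python; the function name and window size are illustrative, not from the source) of a trailing rolling average in which the feature at time t is computed only from observations strictly before t:

```python
from collections import deque

def rolling_features(values, window=7):
    """Trailing average over up to `window` past observations.

    To avoid temporal leakage, the feature at index t uses only
    values[0..t-1] -- information already available at time t --
    never values[t] itself or anything later.
    """
    feats = []
    buf = deque(maxlen=window)  # holds only past observations
    for v in values:
        feats.append(sum(buf) / len(buf) if buf else None)
        buf.append(v)  # v becomes "past" only for subsequent steps
    return feats

# Daily target series; window=3 for a compact example.
series = [10, 12, 11, 13, 15, 14, 16, 18]
feats = rolling_features(series, window=3)
# → [None, 10.0, 11.0, 11.0, 12.0, 13.0, 14.0, 15.0]
```

The `None` at position 0 is deliberate: with no history yet, there is no leakage-safe value to emit, which a feature store would record as missing rather than backfilling from the future.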
Pipeline Structure: Raw data → temporal alignment → feature computation → time-based train/test split → model training → forecast generation → serving layer. Each stage runs on a schedule; failures trigger alerts and invoke fallbacks (for example, continuing to serve the last good forecasts).
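The staged structure with alerting might be sketched as follows (stage names and the `alert` callback are hypothetical; a real orchestrator would add scheduling and retries):

```python
def run_pipeline(stages, alert):
    """Run pipeline stages in order; each stage sees prior stage results.

    On failure, alert and stop so downstream stages never consume bad
    inputs; the serving layer keeps returning the previous batch's
    forecasts as the fallback.
    """
    results = {}
    for name, fn in stages:
        try:
            results[name] = fn(results)
        except Exception as exc:
            alert(name, exc)  # e.g. page the on-call (illustrative)
            break
    return results

# Toy run: feature computation fails, so training never executes.
alerts = []
stages = [
    ("align",    lambda r: "aligned"),
    ("features", lambda r: 1 / 0),   # simulated stage failure
    ("train",    lambda r: "model"),
]
out = run_pipeline(stages, lambda name, exc: alerts.append(name))
# out contains only the "align" result; alerts records "features"
```

Stopping at the first failure keeps the stages' temporal guarantees intact: a half-built feature table never reaches training.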
Training at Scale
For millions of series, global models (one model trained on all series) are more practical than per-series models. Use static covariates to differentiate series within the shared model. Train data-parallel across a GPU cluster; with proper parallelization, training completes in hours even for large datasets. Checkpoint frequently to enable recovery from failures.
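As a sketch of how one global dataset is assembled (illustrative names; the context and horizon sizes are assumptions), each series contributes sliding windows whose inputs concatenate recent history with that series' static covariates, so a single model can tell the series apart:

```python
def make_examples(series_data, static_covs, context=4, horizon=2):
    """Build one global training set from many series.

    Input  = last `context` observations + static covariates
             (the covariates differentiate series in the shared model).
    Target = next `horizon` observations.
    """
    X, y = [], []
    for sid, values in series_data.items():
        for t in range(context, len(values) - horizon + 1):
            X.append(values[t - context:t] + static_covs[sid])
            y.append(values[t:t + horizon])
    return X, y

# Two toy series with a one-dimensional static covariate each
# (e.g. an encoded region -- hypothetical attribute).
series_data = {"store_a": [1, 2, 3, 4, 5, 6],
               "store_b": [9, 8, 7, 6, 5, 4]}
static_covs = {"store_a": [0.0], "store_b": [1.0]}
X, y = make_examples(series_data, static_covs)
# X → [[1, 2, 3, 4, 0.0], [9, 8, 7, 6, 1.0]]
# y → [[5, 6], [5, 4]]
```

In a data-parallel setup, shards of this combined example list are what each worker consumes; checkpointing then only needs to record model weights and the position in the example stream.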
Batch Forecast Generation
Generate forecasts for all series on a schedule (daily, hourly). Parallelize inference across series. Store forecasts in time-indexed tables with rows of (series_id, forecast_date, horizon, value, lower_bound, upper_bound). The serving layer queries by series and horizon, returning pre-computed values in milliseconds.
Serving Pattern: Pre-compute forecasts during the batch run and cache them. Real-time requests become cache lookups. For truly real-time needs, deploy the model for online inference with a latency budget of 50-200 ms.
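A minimal sketch of the precompute-then-lookup pattern, using the row schema above (an in-memory dict stands in for the real serving store; class and field names are illustrative):

```python
class ForecastStore:
    """Batch writes fill a table keyed by (series_id, horizon);
    serving-time reads are plain dictionary lookups."""

    def __init__(self):
        self._table = {}

    def write_batch(self, rows):
        # rows follow (series_id, forecast_date, horizon,
        #              value, lower_bound, upper_bound)
        for series_id, forecast_date, horizon, value, lo, hi in rows:
            self._table[(series_id, horizon)] = {
                "forecast_date": forecast_date,
                "value": value,
                "lower_bound": lo,
                "upper_bound": hi,
            }

    def lookup(self, series_id, horizon):
        # Cache miss returns None; callers fall back to online inference.
        return self._table.get((series_id, horizon))

store = ForecastStore()
store.write_batch([("sku_42", "2024-06-01", 1, 103.2, 95.0, 111.5)])
hit = store.lookup("sku_42", 1)   # hit["value"] → 103.2
miss = store.lookup("sku_42", 9)  # → None
```

Because the batch job owns all writes, the serving path is read-only and trivially parallel, which is what keeps lookups in the millisecond range.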
Refresh Strategy
Forecasts become stale as new observations arrive. Refresh frequency should match forecast granularity: hourly forecasts need hourly refreshes, daily forecasts need daily refreshes. Balance freshness against compute cost.
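The match-granularity rule can be encoded directly (the interval table and function name are assumptions for illustration, not from the source):

```python
REFRESH_SECONDS = {"hourly": 3600, "daily": 86400}  # refresh = granularity

def needs_refresh(granularity, last_run_ts, now_ts):
    """A forecast is stale once a full granularity period has elapsed
    since the last batch run; refreshing more often wastes compute,
    less often serves stale values."""
    return now_ts - last_run_ts >= REFRESH_SECONDS[granularity]

# An hourly series last refreshed an hour ago is due; a daily one is not.
due_hourly = needs_refresh("hourly", last_run_ts=0, now_ts=3600)  # → True
due_daily = needs_refresh("daily", last_run_ts=0, now_ts=3600)    # → False
```

A scheduler can evaluate this per series group, so mixed-granularity portfolios do not force everything onto the most expensive refresh cadence.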