Hierarchical Forecasting: Predicting Across Millions of Related Time Series
Hierarchical forecasting tackles the challenge of predicting across tree or graph structures where time series aggregate naturally. A retailer might track 10 million SKU-by-store combinations (SKU: Stock Keeping Unit) at the leaf level, which roll up to store totals, then state totals, then national totals. Similarly, Uber forecasts ride demand at thousands of individual geohash zones that aggregate to districts and cities. The goal is not just accuracy at each level, but also coherence: parent totals must equal the sum of their children.
The core challenge is reconciliation. When you forecast each level independently, the sum of store forecasts rarely matches the national forecast you computed separately. Reconciliation projects these unconstrained forecasts onto a subspace defined by aggregation constraints, mathematically ensuring that all totals align. This step often reduces error at upper levels because it pools information across the hierarchy.
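The projection can be made concrete on a toy two-store hierarchy. The sketch below uses ordinary least squares reconciliation, the simplest choice of projection (equivalent to MinT with an identity covariance); the structure and numbers are illustrative:

```python
import numpy as np

# Toy hierarchy: total = store_A + store_B.
# S maps the 2 bottom-level series to all 3 levels of the tree.
S = np.array([[1, 1],   # total
              [1, 0],   # store_A
              [0, 1]])  # store_B

# Independently produced base forecasts are incoherent: 60 + 30 != 100.
y_hat = np.array([100.0, 60.0, 30.0])

# OLS reconciliation: orthogonally project y_hat onto the coherent
# subspace spanned by the columns of S.
P = S @ np.linalg.inv(S.T @ S) @ S.T
y_tilde = P @ y_hat

print(y_tilde)  # ≈ [96.67, 63.33, 33.33] -- total now equals the leaf sum
```

After the projection, the parent entry exactly equals the sum of its children, and information from the (often less noisy) aggregate forecast has been pulled down into the leaves.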
At scale, the computational pattern looks like this: Amazon runs forecasting pipelines that generate hundreds of millions of SKU × location × horizon predictions daily. With a 28-day forecast horizon and 10 million leaf series, you produce 280 million prediction rows. Using a gradient-boosted tree model that scores at 10 microseconds per row per core, a single 64 virtual Central Processing Unit (vCPU) node can process roughly 6.4 million rows per second, so pure scoring of 280 million rows completes in about 45 seconds. End to end, once feature generation and I/O are included, the workload typically finishes in under 10 minutes when fanned out across 10 nodes.
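The arithmetic is easy to verify as a back-of-envelope script, using the figures from the paragraph above:

```python
# Back-of-envelope throughput check for the figures above.
leaves = 10_000_000        # leaf series
horizon = 28               # days in the forecast horizon
rows = leaves * horizon    # prediction rows per day

per_row_us = 10            # scoring cost: 10 microseconds per row per core
vcpus = 64
rows_per_sec = vcpus * 1_000_000 // per_row_us  # rows/second per node

single_node_sec = rows / rows_per_sec
print(f"{rows:,} rows, {rows_per_sec:,} rows/s, {single_node_sec} s")
# → 280,000,000 rows, 6,400,000 rows/s, 43.75 s of pure scoring per node
```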
The trade-off between accuracy and compute dominates design decisions. Bottom-up forecasting, where you predict only leaves and sum upward, preserves granular signals but is expensive when leaves are noisy. Top-down forecasting predicts the national total and allocates downward using historical proportions, which is fast and coherent by construction but can miss local shifts in mix. Middle-out forecasts at an intermediate level and reconciles in both directions. Optimal reconciliation forecasts all levels and solves a weighted linear system, often yielding the best accuracy but requiring covariance estimation and matrix operations that can take hours at million-series scale without approximation.
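A minimal sketch contrasting the bottom-up and top-down patterns; the histories and forecasts are made-up values for illustration:

```python
import numpy as np

# Two leaf series with 4 periods of history (illustrative values).
history = np.array([[10.0, 12.0, 11.0, 13.0],   # leaf A
                    [30.0, 28.0, 31.0, 29.0]])  # leaf B

# Base forecasts for the next period (e.g. from per-series models).
leaf_forecasts = np.array([12.5, 30.5])
total_forecast = 45.0  # separate model fitted on the aggregate

# Bottom-up: forecast only the leaves, sum upward.
# Coherent by construction, keeps granular signal.
bu_total = leaf_forecasts.sum()

# Top-down: forecast only the total, allocate downward by
# historical proportions. Fast and coherent, but the allocation
# cannot react to shifts in the A/B mix.
props = history.sum(axis=1) / history.sum()  # each leaf's historical share
td_leaves = total_forecast * props

print(bu_total, td_leaves, td_leaves.sum())
```

Note that the two approaches disagree on the total (43.0 bottom-up vs 45.0 top-down) precisely because they use information from different levels; reconciliation methods exist to combine both.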
💡 Key Takeaways
• Hierarchical forecasting predicts across tree structures where children aggregate to parents, common in retail (SKU to store to state to national) and marketplace applications (zone to city to region)
• Reconciliation ensures coherence by projecting unconstrained forecasts onto aggregation constraints, mathematically forcing parent totals to equal the sum of their children, often reducing upper-level error by pooling information
• At Amazon scale, generating 280 million predictions (10 million leaves × a 28-day horizon) takes under 10 minutes on a 10-node cluster, with gradient-boosted models scoring at 10 microseconds per row
• Bottom-up preserves granular signal but is expensive for noisy leaves; top-down is fast and coherent but misses local mix shifts; optimal reconciliation gives the best accuracy but requires hours at scale without approximation
• Production systems use global models that share parameters across millions of series rather than local per-series models, amortizing training cost and enabling cold start through learned embeddings
📌 Examples
Walmart M5 competition: 42,840 hierarchical time series across store, state, and product category levels with a 28-day horizon; winning solutions used gradient-boosted trees plus optimal reconciliation
Uber demand forecasting: Predict ride demand at thousands of geohash zones, reconcile city by city to keep each linear solve under a few thousand nodes, finishing reconciliation in seconds per city within a one-hour Service Level Agreement (SLA)
Amazon scale: Hundreds of millions of SKU × location × horizon predictions daily, using global LightGBM models for base forecasts, then MinT-style reconciliation with a diagonal covariance approximation
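With a diagonal covariance approximation, MinT-style reconciliation reduces to a weighted least squares projection that can be solved cheaply per subtree. A minimal NumPy sketch on a toy three-node hierarchy; the variances here are made-up stand-ins for what production systems would estimate from in-sample residuals:

```python
import numpy as np

# Toy hierarchy: total = A + B (same structure as retail rollups).
S = np.array([[1.0, 1.0],   # total
              [1.0, 0.0],   # leaf A
              [0.0, 1.0]])  # leaf B

y_hat = np.array([100.0, 60.0, 30.0])  # incoherent base forecasts

# Diagonal covariance approximation: per-series base-forecast error
# variances (made-up values). The noisier total gets less weight,
# so reconciliation trusts the leaves more.
W_inv = np.diag(1.0 / np.array([4.0, 1.0, 1.0]))

# MinT with diagonal W is weighted least squares:
#   y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat
G = np.linalg.inv(S.T @ W_inv @ S) @ S.T @ W_inv
y_tilde = S @ (G @ y_hat)

print(y_tilde)  # ≈ [93.33, 61.67, 31.67] -- coherent, total pulled toward leaf sum
```

Because the matrix being inverted is only as large as the number of bottom-level series in the subtree, partitioning the hierarchy (as in the Uber example above) keeps each solve small.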