Production Architecture: Batch Training and Online Serving
Production forecasting systems built on ETS and ARIMA follow a consistent architecture pattern across companies: streaming or batch aggregation feeds a modeling tier that trains per-series models, which are then served through online and batch forecast APIs.
The data layer ingests raw events such as orders, trips, or video views and aggregates them to a forecast grain: 5-minute, hourly, or daily buckets. Timezone normalization is critical because daylight saving time creates 23- or 25-hour days that break daily and weekly seasonality unless corrected. Calendar awareness handles holidays and special events. Partitioning by series key (item by location, market by product) and by time enables parallel processing. A slowly changing dimension table maintains hierarchies such as item to category to department for reconciliation.
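As a minimal sketch of the aggregation step (assuming pandas and a DataFrame of raw events with UTC timestamps; all column names here are illustrative), converting to the location's local timezone before bucketing is what keeps DST days at 23 or 25 hours instead of shifting bucket boundaries:

```python
import pandas as pd

# Illustrative raw events: UTC timestamps, a series key, and a quantity.
events = pd.DataFrame({
    "ts": pd.to_datetime(
        ["2024-03-09 23:30", "2024-03-10 04:00", "2024-03-10 12:00"],
        utc=True,
    ),
    "item_id": ["sku1", "sku1", "sku1"],
    "location_id": ["nyc", "nyc", "nyc"],
    "qty": [3, 2, 5],
})

# Convert to the location's local timezone BEFORE bucketing, so the
# DST-transition day becomes a 23- or 25-hour bucket instead of a
# misaligned 24-hour one.
events["local_ts"] = events["ts"].dt.tz_convert("America/New_York")

daily = (
    events.set_index("local_ts")
          .groupby(["item_id", "location_id"])["qty"]
          .resample("D")
          .sum()
)
print(daily)
```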
The modeling layer trains one model per series key. For 20 million item-location series, sharding across 2,000 workers with per-series fit times of 250 milliseconds for ETS or 500 milliseconds for SARIMA yields batch completion times of 90 to 120 minutes at 75 percent CPU utilization, or roughly 3,700 completed fits per second in aggregate (20 million fits in 90 minutes). Each worker caches series data in memory to minimize IO. Model state (ETS component vectors or ARIMA coefficients plus recent lags and residuals) is stored per key for incremental updates.
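A sketch of one worker's fit path, assuming statsmodels for the ETS fit; the shard function, worker count, and the key-value `state_store` interface are assumptions for illustration, not a specific production system:

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

N_WORKERS = 2000  # illustrative shard count

def shard_of(series_key: str) -> int:
    # Stable hash partitioning: each series always lands on one worker.
    return hash(series_key) % N_WORKERS

def fit_series(series_key: str, y: np.ndarray, state_store: dict) -> None:
    """Fit one additive ETS model and persist compact state per key."""
    res = ExponentialSmoothing(
        y, trend="add", seasonal="add", seasonal_periods=7
    ).fit(optimized=True)
    # Persist only what online serving needs: smoothing parameters plus
    # the final level, trend, and seasonal components; not the history.
    state_store[series_key] = {
        "params": {k: res.params[k]
                   for k in ("smoothing_level", "smoothing_trend",
                             "smoothing_seasonal")},
        "level": float(np.asarray(res.level)[-1]),
        "trend": float(np.asarray(res.trend)[-1]),
        "season": np.asarray(res.season)[-7:].tolist(),
    }
```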
Online serving implements a stateful service consuming aggregated ticks. ETS updates run in constant O(1) time: 1 to 5 milliseconds per series is typical, supporting thousands of concurrent keys within a 50 millisecond service budget. ARIMA updates take longer due to the O(p + q) recursion but still complete in 10 to 20 milliseconds for typical orders. Outlier clamps are applied before state updates to prevent spikes from corrupting models. During missing data periods, the service propagates forecasts forward and widens uncertainty.
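A minimal sketch of that update path for additive Holt-Winters; the clamp rule, uncertainty widening, and parameter values are illustrative assumptions, not any particular company's policy:

```python
from collections import deque
import math

class OnlineETS:
    """O(1) additive Holt-Winters update with outlier clamping (sketch)."""

    def __init__(self, level, trend, season,
                 alpha=0.2, beta=0.05, gamma=0.1, clamp_sigmas=5.0):
        self.level, self.trend = level, trend
        self.season = deque(season)     # ring buffer of m seasonal terms
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.clamp_sigmas = clamp_sigmas
        self.resid_var = 1.0            # running one-step residual variance

    def update(self, y):
        # One-step-ahead forecast before the state is touched.
        yhat = self.level + self.trend + self.season[0]
        if y is None:
            # Missing tick: propagate the forecast and widen uncertainty.
            self.resid_var *= 1.1
            y = yhat
        else:
            # Clamp spikes before they corrupt level/trend/season state.
            bound = self.clamp_sigmas * math.sqrt(self.resid_var)
            y = min(max(y, yhat - bound), yhat + bound)
            err = y - yhat
            self.resid_var = 0.99 * self.resid_var + 0.01 * err * err
        # Standard additive Holt-Winters recursions: all O(1).
        s_old = self.season.popleft()
        new_level = (self.alpha * (y - s_old)
                     + (1 - self.alpha) * (self.level + self.trend))
        self.trend = (self.beta * (new_level - self.level)
                      + (1 - self.beta) * self.trend)
        self.season.append(self.gamma * (y - new_level)
                           + (1 - self.gamma) * s_old)
        self.level = new_level
        return yhat
```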
Forecast APIs return point predictions plus quantiles such as P10, P50, and P90. ETS derives interval widths from the state space variance recursion; ARIMA uses the forecast variance implied by its error model. Intervals are calibrated with rolling backtests to correct systematic undercoverage. Target API latency is p99 under 100 milliseconds for 1,000 concurrent series requests. Downstream consumers include autoscaling systems, replenishment planners, pricing engines, and finance dashboards that refresh hourly or daily.
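For the quantile output, a sketch of turning a point forecast plus variance into calibrated quantiles; the Gaussian assumption and the calibration factor (estimated offline from rolling backtests) are illustrative:

```python
from statistics import NormalDist

def forecast_quantiles(point, variance, calibration=1.1,
                       probs=(0.10, 0.50, 0.90)):
    """Gaussian quantiles widened by an empirical calibration factor.

    A calibration factor above 1 corrects the systematic undercoverage
    that rolling backtests typically reveal; 1.1 is purely illustrative.
    """
    sd = calibration * variance ** 0.5
    return {f"P{int(p * 100)}": point + NormalDist().inv_cdf(p) * sd
            for p in probs}

# Point forecast of 120 units with forecast variance 64:
print(forecast_quantiles(120.0, 64.0))
# -> {'P10': 108.7..., 'P50': 120.0, 'P90': 131.2...}
```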
Hierarchical reconciliation enforces additivity constraints: independent per-series forecasts do not naturally sum to their totals, causing drift in planning systems. MinT or proportional scaling reconciles the bottom-up and top-down views before publishing to inventory and finance systems. Cold start for new series uses pooled parameters from category-level models, or defaults to seasonal naive when there is no history.
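A sketch of the proportional variant; MinT additionally weights the adjustment by the forecast error covariance, which this simple version omits:

```python
import numpy as np

def reconcile_proportional(parent_forecast, child_forecasts):
    """Scale child forecasts so they sum exactly to the parent forecast."""
    children = np.asarray(child_forecasts, dtype=float)
    total = children.sum()
    if total == 0:
        # Degenerate case: no signal in the children, split evenly.
        return np.full_like(children, parent_forecast / len(children))
    return children * (parent_forecast / total)

# Item forecasts sum to 110 but the category model says 100:
print(reconcile_proportional(100.0, [40.0, 70.0]))  # [36.36... 63.63...]
```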
💡 Key Takeaways
•Scale example: 20 million item-location series trained in 90 minutes on 2,000 cores at 250 milliseconds per ETS fit, yielding roughly 3,700 fits per second of aggregate throughput
•Online ETS updates complete in 1 to 5 milliseconds per series with an O(1) constant-time recursion, supporting thousands of concurrent keys under a 50 millisecond budget
•Timezone normalization critical: daylight saving time creates 23- or 25-hour days that break seasonality models without calendar correction
•Hierarchical reconciliation enforces additivity: independent forecasts do not sum correctly, causing planning drift without MinT or proportional scaling
•Cold start strategy: new series with under two seasonal cycles share pooled parameters from category models or default to seasonal naive baseline
•API targets: p99 forecast latency under 100 milliseconds for 1,000 concurrent series, returning P10, P50, P90 quantiles calibrated via rolling backtests
📌 Examples
Amazon retail: 20 million item-location series, 90 minute batch runs, hierarchical reconciliation from item to store to region to country levels
Uber zone demand: stateful service consumes minute aggregates, ETS updates in under 5 milliseconds, feeds surge pricing control loop refreshing every 1 to 5 minutes
Netflix CDN planning: batch forecasts with 1 to 7 day horizons for viewing load per city, target MAPE under 10 percent, hourly dashboard refresh