How to Build a Production Metric Suite for Forecast Evaluation
Core Metric Suite
No single metric captures every aspect of forecast quality. Production systems need several: MAPE or sMAPE for percentage accuracy, RMSE for absolute accuracy with an extra penalty on large errors, bias for systematic directional error, and MASE for comparison against a naive baseline. Report all four; optimize for the one most aligned with business cost.
MASE (Mean Absolute Scaled Error): MASE scales the model's error by the error of a naive forecast: MASE = MAE / MAE_naive, where MAE_naive is the in-sample MAE of the naive (last-observation) forecast on the training history. MASE < 1 means the model beats naive; MASE > 1 means it loses to naive. MASE is scale-independent and remains defined when actuals contain zeros, avoiding MAPE's division-by-zero problem.
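The four-metric suite can be sketched in Python. The function name and argument names are illustrative; MASE is scaled here by the in-sample one-step naive MAE computed from the training history, following the definition above:

```python
import numpy as np

def forecast_metrics(actual, forecast, train):
    """Compute sMAPE, RMSE, bias, and MASE for one evaluation window.

    `train` is the historical series used to scale MASE via the
    in-sample naive forecast (last observation carried forward).
    """
    actual, forecast, train = map(np.asarray, (actual, forecast, train))
    err = actual - forecast
    mae = np.mean(np.abs(err))
    # sMAPE stays defined when some actuals are zero, as long as
    # actual + forecast is nonzero at every point.
    smape = 100 * np.mean(2 * np.abs(err) / (np.abs(actual) + np.abs(forecast)))
    rmse = np.sqrt(np.mean(err ** 2))
    bias = np.mean(err)  # > 0: under-forecasting; < 0: over-forecasting
    mae_naive = np.mean(np.abs(np.diff(train)))  # in-sample naive MAE
    return {"sMAPE": smape, "RMSE": rmse, "Bias": bias,
            "MASE": mae / mae_naive}
```

Keeping all four in one return value makes it harder to quietly report only the flattering metric.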
Segmented Metrics
Aggregate metrics hide problems: a model with 5% MAPE overall may run 3% on high-volume products and 40% on low-volume ones. Segment by volume tier (high/medium/low), product age (established/new), volatility (stable/variable), and business importance, then flag the segments where forecast quality is unacceptable.
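A minimal segmentation sketch with pandas; the column names and toy values are assumptions, standing in for a real backtest table keyed by product:

```python
import pandas as pd

# Hypothetical backtest rows: one per product-period, tagged with a segment.
df = pd.DataFrame({
    "volume_tier": ["high", "high", "low", "low"],
    "actual":      [100.0, 200.0, 5.0, 8.0],
    "forecast":    [ 97.0, 206.0, 7.0, 4.0],
})

# Per-row absolute percentage error, then average within each segment.
df["ape"] = 100 * (df["actual"] - df["forecast"]).abs() / df["actual"]
segment_mape = df.groupby("volume_tier")["ape"].mean()
```

Even in this toy table the high-volume tier lands near 3% MAPE while the low-volume tier is an order of magnitude worse, which the pooled average would mask.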
Horizon-Specific Metrics
Forecast accuracy degrades with horizon. Report metrics at each horizon: 1-day MAPE, 7-day MAPE, 30-day MAPE. Stakeholders consuming different horizons need expectations calibrated to that horizon. If 7-day forecasts drive inventory and 30-day forecasts drive planning, both horizon metrics matter.
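One way to report per-horizon accuracy, assuming backtest results are stored as (horizon_days, actual, forecast) triples; the record layout and values are illustrative:

```python
import numpy as np

# Hypothetical backtest: forecasts issued 1, 7, and 30 days ahead,
# paired with the actuals eventually observed.
records = [
    (1, 100, 98), (1, 110, 111),
    (7, 100, 92), (7, 110, 120),
    (30, 100, 80), (30, 110, 135),
]

horizon_mape = {}
for h in sorted({r[0] for r in records}):
    pts = [r for r in records if r[0] == h]
    horizon_mape[h] = 100 * np.mean([abs(a - f) / a for _, a, f in pts])
```

Plotting `horizon_mape` over horizon gives stakeholders a direct view of how fast accuracy decays.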
Baseline Comparison: Always compare against baselines: naive (last observation), seasonal naive (same period last year), simple moving average. If your complex model only marginally beats naive, the complexity may not be justified.
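The three baselines are cheap to generate; a sketch for one-step-ahead forecasts, where `season` and `window` are illustrative defaults rather than recommendations:

```python
import numpy as np

def baseline_forecasts(history, season=7, window=4):
    """One-step-ahead baseline forecasts from a 1-D history array."""
    history = np.asarray(history, dtype=float)
    return {
        "naive": history[-1],                       # last observation
        "seasonal_naive": history[-season],         # same period last cycle
        "moving_average": history[-window:].mean(), # simple moving average
    }

history = [20, 22, 21, 23, 25, 24, 26, 28]
baselines = baseline_forecasts(history)
```

Score each baseline with the same metric suite as the model; if the model's MAE is not clearly below the best baseline's, the added complexity is hard to defend.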
Business Metrics
Beyond statistical metrics, track business impact: inventory turns, stockout rate, markdown rate, service level achieved. These connect forecast accuracy to business outcomes. A 5% MAPE improvement that reduces stockouts by 20% is more compelling than abstract accuracy gains.
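Two of these business metrics can be computed directly from inventory records; this sketch assumes a simple list of (demand, units_available) pairs per product-day, which is a simplification of a real inventory system:

```python
# Hypothetical per-day records: (demand, units_available).
records = [(10, 10), (12, 9), (8, 8), (15, 11)]

# Stockout rate: share of days where demand exceeded availability.
stockout_days = sum(1 for demand, avail in records if avail < demand)
stockout_rate = stockout_days / len(records)

# Service level (fill rate): units served as a share of units demanded.
service_level = (sum(min(d, a) for d, a in records)
                 / sum(d for d, _ in records))
```

Tracking these alongside MAPE makes it possible to show that a forecast change moved a number the business already cares about.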