What is Root Mean Squared Error (RMSE) in Time Series?
Root Mean Squared Error (RMSE) is the square root of the average squared difference between predictions and actuals. Unlike MAPE, RMSE is measured in the same units as your target variable, whether that's seconds, dollars, or units. The squaring operation means RMSE heavily penalizes large errors: a single 100 unit miss contributes ten times as much squared error as ten 10 unit misses combined, even though the total absolute error is the same.
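The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the function name `rmse` and the sample data are made up for the example.

```python
import math

def rmse(actuals, predictions):
    """Root mean squared error, in the same units as the target."""
    if len(actuals) != len(predictions) or not actuals:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for a, p in zip(actuals, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical data: ten forecasts against an actual value of 0.
actuals = [0.0] * 10
one_big_miss = [100.0] + [0.0] * 9   # a single 100 unit miss
ten_small_misses = [10.0] * 10       # ten 10 unit misses

rmse_big = rmse(actuals, one_big_miss)        # sqrt(10000 / 10), about 31.6
rmse_small = rmse(actuals, ten_small_misses)  # sqrt(1000 / 10) = 10.0
```

Both forecasts have the same total absolute error (100 units), yet the one that concentrates the error in a single miss scores roughly three times worse on RMSE.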
This quadratic penalty makes RMSE ideal when large errors are disproportionately costly. In ridesharing Estimated Time of Arrival (ETA) forecasting, a 5 minute underestimate might cause a cancellation, while five 1 minute errors are barely noticed. Uber and similar platforms typically target RMSE under 90 seconds for trips under 10 minutes and under 180 seconds for longer trips at median traffic conditions. Even a small share of bad predictions matters: if 0.1% of trip ETAs carry 20 minute errors, aggregate RMSE across millions of trips can rise by several seconds.
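The arithmetic behind that sensitivity is easy to check, because mean squared errors combine as a weighted average. The sketch below assumes a 90 second baseline RMSE and treats a small fraction of predictions as having a fixed 20 minute error; the function name and the scenario numbers are illustrative assumptions, not measurements from any real system.

```python
import math

def rmse_after_contamination(base_rmse, outlier_fraction, outlier_error):
    """RMSE after a fraction of predictions takes on a fixed, large error.

    Works on mean squared errors, which combine as a weighted average.
    """
    base_mse = base_rmse ** 2
    mixed_mse = (1 - outlier_fraction) * base_mse + outlier_fraction * outlier_error ** 2
    return math.sqrt(mixed_mse)

# Assumed scenario: 90 s baseline RMSE, 0.1% of trips off by 20 minutes (1200 s).
contaminated = rmse_after_contamination(90.0, 0.001, 1200.0)
increase = contaminated - 90.0  # between 7 and 8 seconds
```

Under these assumptions, one prediction in a thousand is enough to move the aggregate metric by more than 7 seconds, which is why an RMSE spike usually prompts a look at outliers first.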
The major limitation is scale dependence. You cannot compare RMSE across series with different magnitudes without normalization. A forecast for daily revenue in thousands of dollars will have a vastly different RMSE than a forecast for daily unit sales. Teams address this by computing Root Mean Squared Scaled Error (RMSSE), which divides the forecast's RMSE by the RMSE of a naive baseline such as seasonal naive or last value, so the metric becomes comparable across series.
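A rough sketch of RMSSE follows. It assumes the naive baseline is scored in-sample on the training history (one common convention), with `season_length=1` meaning "last value" and larger values meaning seasonal naive. The function name, parameters, and data are hypothetical.

```python
import math

def rmsse(train, actuals, predictions, season_length=1):
    """Forecast RMSE divided by the in-sample RMSE of a (seasonal) naive forecast."""
    if len(train) <= season_length:
        raise ValueError("training series must be longer than season_length")
    naive_sq = [(train[t] - train[t - season_length]) ** 2
                for t in range(season_length, len(train))]
    naive_mse = sum(naive_sq) / len(naive_sq)
    if naive_mse == 0:
        raise ValueError("naive baseline has zero error; RMSSE is undefined")
    forecast_mse = sum((p - a) ** 2 for a, p in zip(actuals, predictions)) / len(actuals)
    return math.sqrt(forecast_mse / naive_mse)

# Hypothetical series: the same pattern at two scales (units vs. thousands).
units_train = [10, 12, 11, 13, 12, 14]
units_actual, units_pred = [13, 15], [14, 14]
revenue_train = [v * 1000 for v in units_train]
revenue_actual = [v * 1000 for v in units_actual]
revenue_pred = [v * 1000 for v in units_pred]

score_units = rmsse(units_train, units_actual, units_pred)
score_revenue = rmsse(revenue_train, revenue_actual, revenue_pred)
```

The two scores match even though the raw RMSE values differ by a factor of 1000, and a score below 1.0 indicates the forecast beats the naive baseline.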
RMSE has no directional bias: it treats over-forecasts and under-forecasts of the same size identically. It is defined for all real valued targets, unlike MAPE, which is undefined whenever an actual value is zero. In practice, when engineers say "we need to reduce large misses," they mean RMSE. When they see RMSE spike, they investigate outliers and data quality issues before concluding the model degraded.
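Both properties can be seen on a toy example. The helper functions and data below are illustrative only; the `mape` helper raises an error explicitly to make the zero-actual failure visible.

```python
import math

def rmse(actuals, predictions):
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actuals, predictions)) / len(actuals))

def mape(actuals, predictions):
    """Mean absolute percentage error; undefined when any actual is zero."""
    if any(a == 0 for a in actuals):
        raise ZeroDivisionError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((p - a) / a) for a, p in zip(actuals, predictions)) / len(actuals)

# Hypothetical targets that include a zero.
actuals = [0.0, 5.0, 10.0]
over = [a + 2.0 for a in actuals]    # every forecast 2 units too high
under = [a - 2.0 for a in actuals]   # every forecast 2 units too low

rmse_over = rmse(actuals, over)      # 2.0
rmse_under = rmse(actuals, under)    # 2.0, same penalty in either direction
```

Calling `mape(actuals, over)` on the same data raises `ZeroDivisionError` because of the zero actual, while RMSE is computed without any special handling.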
💡 Key Takeaways
•RMSE squares errors before averaging, causing large misses to dominate the metric quadratically: one 100 unit error contributes 100 times more than one 10 unit error
•Measured in the same units as the target variable (seconds, dollars, units), making it interpretable but scale dependent across different series
•Ideal when large errors are disproportionately costly: Uber targets RMSE under 90 seconds for short trips, 180 seconds for long trips at median traffic
•Sensitive to outliers and data quality: 0.1% of ETAs with 20 minute error can raise aggregate RMSE by several seconds across millions of predictions
•No directional bias: unlike MAPE, RMSE treats over-forecasts and under-forecasts symmetrically, and it works with all real valued targets, including zeros
•Normalize with RMSSE (divide by naive baseline RMSE) to compare across series with different magnitudes; values under 1.0 mean the forecast beats the baseline
📌 Examples
Ridesharing ETA: System maintains RMSE under 90 seconds for trips under 10 minutes, monitors by city and traffic regime with tailored thresholds per segment
E-commerce capacity planning: RMSE used for warehouse load forecasting where underestimating peak by 1000 orders causes service degradation, while ten 100 order errors are manageable
Outlier impact: Retail forecaster observed RMSE spike from 45 to 78 units after sensor malfunction mislabeled 200 SKU days (0.1% of data) with 10x actual demand
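The per-segment monitoring described in the ridesharing example could be sketched as follows. The segment keys, thresholds, and records are all hypothetical, chosen only to show the shape of the check.

```python
import math
from collections import defaultdict

def rmse_by_segment(records):
    """records: iterable of (segment, actual, predicted). Returns {segment: rmse}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for segment, actual, predicted in records:
        sums[segment] += (predicted - actual) ** 2
        counts[segment] += 1
    return {seg: math.sqrt(sums[seg] / counts[seg]) for seg in sums}

def breached_segments(segment_rmse, thresholds):
    """Segments whose RMSE exceeds their threshold; segments without one are skipped."""
    return sorted(seg for seg, value in segment_rmse.items()
                  if seg in thresholds and value > thresholds[seg])

# Hypothetical ETA records in seconds: (segment, actual, predicted)
records = [
    (("city_a", "short"), 300, 360),
    (("city_a", "short"), 420, 380),
    (("city_a", "long"), 1500, 1750),
    (("city_a", "long"), 1800, 1700),
]
# Assumed thresholds in seconds, one per (city, trip length) segment.
thresholds = {("city_a", "short"): 90.0, ("city_a", "long"): 180.0}

segment_rmse = rmse_by_segment(records)
alerts = breached_segments(segment_rmse, thresholds)
```

Here only the long-trip segment exceeds its threshold, which a single aggregate RMSE over all four records would not reveal.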