Time Series Forecasting • Feature Engineering (Lag Features, Rolling Stats, Seasonality) | Easy | ⏱️ ~2 min
What are Lag Features in Time Series?
Lag features are historical values of your target variable or other signals at fixed time offsets from the prediction point. If you're predicting sales on day t, a lag-1 feature is sales from day t−1, lag 7 is sales from day t−7, and lag 28 is sales from day t−28. These features capture autocorrelation: the tendency of a time series to correlate with its own past values.
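In pandas, each lag is a single `shift` per offset. This is a minimal sketch; the frame and column names are hypothetical:

```python
import pandas as pd

# Hypothetical daily sales series (values are illustrative).
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=60, freq="D"),
    "sales": range(60),
}).set_index("date")

# Lag features: the value at t-1, t-7, and t-28 relative to each row's date.
for lag in (1, 7, 28):
    df[f"sales_lag_{lag}"] = df["sales"].shift(lag)

# The earliest rows have no history for the longest lag, so drop the NaNs.
df = df.dropna()
```

Each `shift(k)` only looks backward, so the resulting columns are automatically safe to use at prediction time for day t.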
The power of lag features comes from their simplicity and interpretability. A retail demand model might use three lags: yesterday's sales (captures immediate trends), last week same day (captures weekly seasonality), and four weeks ago same day (captures monthly patterns). Amazon's demand forecasting uses similar patterns across millions of products, selecting lags that align with replenishment cycles and promotional calendars.
Choosing which lags to include requires domain knowledge and statistical testing. You don't enumerate every possible offset because that inflates feature count, increases storage, and adds noise. Instead, use autocorrelation plots to identify significant lags. For daily retail data, lags at {1, 7, 14, 28} often capture the right signal. For hourly ride demand at Uber, lags at {1 hour, 24 hours, 168 hours (one week)} capture recent trends, daily patterns, and weekly cycles without creating hundreds of redundant features.
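The autocorrelation-based selection can be sketched without a plot by comparing each lag's sample autocorrelation against a rough 2/√N significance band, a common rule of thumb for white noise. The series, threshold, and candidate range below are illustrative assumptions:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a given lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic daily series with a weekly cycle plus noise (illustrative).
rng = np.random.default_rng(0)
t = np.arange(365)
series = 10 + 5 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.5, t.size)

# Keep only lags whose autocorrelation clears the ~2/sqrt(N) band.
threshold = 2 / np.sqrt(series.size)
selected = [lag for lag in range(1, 31) if abs(autocorr(series, lag)) > threshold]
```

On a series with a strong weekly cycle, lags 7, 14, 21, and 28 show high autocorrelation, which is exactly the {7, 14, 28} pattern described above for daily retail data.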
The critical constraint is point-in-time correctness: every lag feature must be available at prediction time. If you're forecasting tomorrow's sales at 6 AM this morning, you can use yesterday's sales, but not sales from later today. This causal ordering prevents leakage, where future information inflates training accuracy but is unavailable in production.
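A concrete instance of this rule, sketched with pandas on toy numbers: a rolling mean computed directly over the target includes the current value in its own window, while shifting first keeps the window strictly in the past:

```python
import pandas as pd

df = pd.DataFrame({"sales": [10, 12, 11, 15, 14, 13, 16, 18]})

# LEAKS: the window at row t includes sales[t], the value being predicted.
df["roll3_leaky"] = df["sales"].rolling(3).mean()

# Correct: shift(1) first so the window ends at t-1 and every input
# is observable at prediction time.
df["roll3_safe"] = df["sales"].shift(1).rolling(3).mean()
```

The same shift-before-aggregate pattern applies to any rolling statistic (mean, std, min/max) built from the target series.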
💡 Key Takeaways
• Lag features are past values at fixed offsets such as t−1, t−7, or t−28, capturing autocorrelation and delayed effects without complex models
• Typical production systems use 3 to 5 carefully selected lags rather than enumerating all offsets, keeping feature count under 10 to control latency and storage
• Amazon and Uber align lag selection with business cycles: daily for immediate trends, weekly for same-day patterns, monthly for seasonal inventory or promotions
• Point-in-time correctness is mandatory: every lag value must be observable at prediction time, preventing leakage from training on future data that won't exist in production
• Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots identify which lags carry signal, avoiding redundant features that add model complexity without predictive gain
📌 Examples
Retail demand forecasting for 50,000 products uses lag 1 (yesterday's sales), lag 7 (same day last week), and lag 28 (same day 4 weeks ago) to capture trends and weekly seasonality with just 3 features per product
Uber ETA model uses lag-1-hour and lag-24-hour travel times on the same road segment to detect short-term congestion and daily traffic patterns, updating within 30 seconds as new trip data arrives
Netflix streaming quality predictor uses lag-5-minute and lag-60-minute network throughput to forecast bandwidth, keeping feature retrieval under 10 ms for real-time adaptive bitrate decisions