Time Series Forecasting • Feature Engineering (Lag Features, Rolling Stats, Seasonality) | Easy | ⏱️ ~2 min
What are Lag Features in Time Series?
Lag features are historical values of your target variable or other signals at fixed time offsets from the prediction point. If you're predicting sales on day t, a lag-1 feature is sales from day t−1, lag 7 is sales from day t−7, and lag 28 is sales from day t−28. These features capture autocorrelation: the tendency of a time series to correlate with its own past values.
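In pandas, each lag is a single `shift` per offset. This is a minimal sketch; the frame and column names are hypothetical:

```python
import pandas as pd

# Hypothetical daily sales series (values are illustrative).
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=60, freq="D"),
    "sales": range(60),
}).set_index("date")

# Lag features: the value at t-1, t-7, and t-28 relative to each row's date.
for lag in (1, 7, 28):
    df[f"sales_lag_{lag}"] = df["sales"].shift(lag)

# The earliest rows have no history for the longest lag, so drop the NaNs.
df = df.dropna()
```

Each `shift(k)` only looks backward, so the resulting columns are automatically safe to use at prediction time for day t.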
The power of lag features comes from their simplicity and interpretability. A retail demand model might use three lags: yesterday's sales (captures immediate trends), last week same day (captures weekly seasonality), and four weeks ago same day (captures monthly patterns). Amazon's demand forecasting uses similar patterns across millions of products, selecting lags that align with replenishment cycles and promotional calendars.
Choosing which lags to include requires domain knowledge and statistical testing. You don't enumerate every possible offset because that inflates feature count, increases storage, and adds noise. Instead, use autocorrelation plots to identify significant lags. For daily retail data, lags at {1, 7, 14, 28} often capture the right signal. For hourly ride demand at Uber, lags at {1 hour, 24 hours, 168 hours (one week)} capture recent trends, daily patterns, and weekly cycles without creating hundreds of redundant features.
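The autocorrelation-based selection can be sketched without a plot by comparing each lag's sample autocorrelation against a rough 2/√N significance band, a common rule of thumb for white noise. The series, threshold, and candidate range below are illustrative assumptions:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a given lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic daily series with a weekly cycle plus noise (illustrative).
rng = np.random.default_rng(0)
t = np.arange(365)
series = 10 + 5 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.5, t.size)

# Keep only lags whose autocorrelation clears the ~2/sqrt(N) band.
threshold = 2 / np.sqrt(series.size)
selected = [lag for lag in range(1, 31) if abs(autocorr(series, lag)) > threshold]
```

On a series with a strong weekly cycle, lags 7, 14, 21, and 28 show high autocorrelation, which is exactly the {7, 14, 28} pattern described above for daily retail data.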
The critical constraint is point-in-time correctness: every lag feature must be available at prediction time. If you're forecasting tomorrow's sales at 6 AM this morning, you can use yesterday's sales, but not sales from later today. This causal ordering prevents leakage, where future information inflates training accuracy but is unavailable in production.
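A concrete instance of this rule, sketched with pandas on toy numbers: a rolling mean computed directly over the target includes the current value in its own window, while shifting first keeps the window strictly in the past:

```python
import pandas as pd

df = pd.DataFrame({"sales": [10, 12, 11, 15, 14, 13, 16, 18]})

# LEAKS: the window at row t includes sales[t], the value being predicted.
df["roll3_leaky"] = df["sales"].rolling(3).mean()

# Correct: shift(1) first so the window ends at t-1 and every input
# is observable at prediction time.
df["roll3_safe"] = df["sales"].shift(1).rolling(3).mean()
```

The same shift-before-aggregate pattern applies to any rolling statistic (mean, std, min/max) built from the target series.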
💡 Key Takeaways
• Lag features are past values at fixed offsets such as t−1, t−7, or t−28, capturing autocorrelation and delayed effects without complex models
• Typical production systems use 3 to 5 carefully selected lags rather than enumerating all offsets, keeping feature count under 10 to control latency and storage
• Amazon and Uber align lag selection with business cycles: daily for immediate trends, weekly for same-day patterns, monthly for seasonal inventory or promotions
• Point-in-time correctness is mandatory: every lag value must be observable at prediction time, preventing leakage from training on future data that won't exist in production
• Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots identify which lags carry signal, avoiding redundant features that add model complexity without predictive gain
📌 Examples
Retail demand forecasting for 50,000 products uses lag 1 (yesterday's sales), lag 7 (same day last week), and lag 28 (same day 4 weeks ago) to capture trends and weekly seasonality with just 3 features per product
Uber ETA model uses lag-1-hour and lag-24-hour travel times on the same road segment to detect short-term congestion and daily traffic patterns, updating within 30 seconds as new trip data arrives
Netflix streaming quality predictor uses lag-5-minute and lag-60-minute network throughput to forecast bandwidth, keeping feature retrieval under 10 ms for real-time adaptive bitrate decisions