What Are Lag Features in Time Series?
Definition: Lag features are past values of the target variable used as inputs to predict its future values. For daily data, lag-1 is yesterday's value, lag-7 is the value from the same weekday last week, and lag-365 is the value from the same date last year. They capture autoregressive patterns: the tendency of a time series to correlate with its own past values.
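As a minimal sketch, the snippet below builds lag columns from a plain Python list. The function name `add_lag_features` and the sales figures are hypothetical. With pandas, the same idea is usually written as `df["y"].shift(k)`.

```python
from typing import Optional


def add_lag_features(values: list[float], lags: list[int]) -> list[dict[str, Optional[float]]]:
    """Return one row per time step holding the target and its lagged values.

    A lag is None where the series does not reach back far enough.
    """
    rows = []
    for t, y in enumerate(values):
        row: dict[str, Optional[float]] = {"y": y}
        for k in lags:
            # lag-k at time t is the value observed k steps earlier
            row[f"lag_{k}"] = values[t - k] if t >= k else None
        rows.append(row)
    return rows


# Hypothetical daily sales, index 0 = first day
daily_sales = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 9.0, 11.0]
rows = add_lag_features(daily_sales, lags=[1, 7])
print(rows[7])  # {'y': 11.0, 'lag_1': 9.0, 'lag_7': 10.0}
```

The first rows contain `None` for long lags; in practice those rows are either dropped or left to a model that tolerates missing values.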
Why Lags Matter
Many time series exhibit momentum: if sales were high yesterday, they tend to be high today. Lag features encode this directly. For daily sales forecasting, lag-1 captures day-to-day momentum, lag-7 captures weekly patterns (same weekday last week), and lag-365 captures yearly seasonality (same date last year; lag-364 is a common alternative because it lands on the same weekday). Different lags capture different temporal patterns, which is why feature sets usually combine several.
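A toy check of the claim that different lags capture different patterns, assuming a hypothetical series whose weekly shape repeats exactly: using lag-7 as a naive forecast reproduces the series perfectly, while lag-1 misses every weekday-to-weekday change. Real data is noisier, so the gap will usually be smaller.

```python
def naive_lag_mae(values: list[float], k: int, start: int) -> float:
    """Mean absolute error of forecasting values[t] with values[t - k], for t >= start (start >= k)."""
    errors = [abs(values[t] - values[t - k]) for t in range(start, len(values))]
    return sum(errors) / len(errors)


# Hypothetical daily sales: the same weekly shape repeated for 4 weeks
week = [10.0, 11.0, 10.0, 12.0, 14.0, 20.0, 18.0]
sales = week * 4

# Score both lags on the same days (t >= 7) so the comparison is like for like
mae_lag1 = naive_lag_mae(sales, k=1, start=7)
mae_lag7 = naive_lag_mae(sales, k=7, start=7)
print(mae_lag1, mae_lag7)  # lag-7 error is 0.0 because each value repeats a week later
```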
Selecting Lag Orders
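Use the autocorrelation function (ACF) to identify significant lags. Spikes at lags 7 and 14 indicate weekly seasonality; a spike at lag 365 indicates a yearly pattern. Include lags where the absolute ACF exceeds the approximate 95% significance bound of 2/√n, where n is the number of observations (the bound assumes a white-noise null). The partial autocorrelation function (PACF) is a useful companion, because it removes the indirect correlation carried through intermediate lags. Too few lags miss patterns; too many cause overfitting and multicollinearity.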
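A stdlib-only sketch of the ACF screen, assuming the standard biased sample estimator and the 2/√n bound; the helper names and the toy series are hypothetical. In practice you would more likely call `acf` and `pacf` from `statsmodels.tsa.stattools`. On this strongly seasonal toy series, lags 7 and 14 stand out, but several non-seasonal lags (for example lag 1 and lag 3) also cross the bound, which is one reason the partial autocorrelation function (PACF) is often consulted alongside the ACF.

```python
import math


def acf(values: list[float], max_lag: int) -> list[float]:
    """Sample autocorrelation r_k for k = 1..max_lag (standard biased estimator)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(values: list[float], max_lag: int) -> list[int]:
    """Lags whose |r_k| exceeds the approximate 95% white-noise bound 2/sqrt(n)."""
    bound = 2 / math.sqrt(len(values))
    return [k for k, r in enumerate(acf(values, max_lag), start=1) if abs(r) > bound]


week = [10.0, 11.0, 10.0, 12.0, 14.0, 20.0, 18.0]
sales = week * 12  # hypothetical: 12 weeks of a repeating weekly shape
print(significant_lags(sales, max_lag=14))
```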
Practical Guidance: Start with domain-relevant lags: lag-1 (yesterday), lag-7 (weekly), lag-28 or lag-30 (monthly), and lag-365 (yearly). Add intermediate lags where the ACF shows significance. After initial training, remove lags with near-zero feature importance.
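The guidance above can be sketched as two small helpers. Both are hypothetical: `min_train_rows` is an illustrative cutoff (a lag of k leaves only n - k rows with that feature populated), and the importance values stand in for whatever a first model would report; they are made up for the example.

```python
def candidate_lags(
    n_obs: int,
    acf_lags: list[int],
    domain_lags: tuple[int, ...] = (1, 7, 28, 365),
    min_train_rows: int = 100,
) -> list[int]:
    """Union of domain lags and ACF-significant lags that the history can support.

    min_train_rows is an illustrative cutoff, not a standard value.
    """
    lags = sorted(set(domain_lags) | set(acf_lags))
    return [k for k in lags if n_obs - k >= min_train_rows]


def prune_by_importance(importances: dict[int, float], min_share: float = 0.01) -> list[int]:
    """Keep lags whose share of total (non-negative) importance is at least min_share."""
    total = sum(importances.values())
    if total <= 0:
        return []  # nothing carries importance
    return sorted(k for k, v in importances.items() if v / total >= min_share)


# About 13 months of daily data: too short to support lag-365 under this cutoff
lags = candidate_lags(n_obs=400, acf_lags=[14, 21])
print(lags)  # [1, 7, 14, 21, 28]

# Hypothetical importances from a first model trained on those lags
importances = {1: 0.40, 7: 0.35, 14: 0.20, 21: 0.004, 28: 0.046}
print(prune_by_importance(importances))  # [1, 7, 14, 28]
```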
Horizon Constraints
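Lag features must be available at prediction time. For a 7-day-ahead forecast, lags 1 through 6 (relative to the target date) are unavailable: when predicting next week's value, you do not yet know tomorrow's. Either use only lags greater than or equal to the forecast horizon (direct forecasting), or fill the missing lags with the model's own predictions (recursive forecasting), which lets errors compound across steps. This constraint is critical and commonly violated; violating it leaks future information into training and makes backtest accuracy look better than what the model can deliver in production.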
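A minimal sketch of both strategies, assuming a generic one-step model passed in as a plain function (`one_step_model` and `avg_model` are hypothetical placeholders, not a library API). `direct_training_rows` keeps only lags k >= h, so every feature for a target at time t + h was observed at or before the forecast origin t; `recursive_forecast` instead appends each prediction to the history so it becomes the next step's lag-1.

```python
from typing import Callable


def usable_lags(lags: list[int], horizon: int) -> list[int]:
    """Lags known at forecast time: lag k of a target at t + h is y[t + h - k], so k >= h."""
    return [k for k in lags if k >= horizon]


def direct_training_rows(
    values: list[float], lags: list[int], horizon: int
) -> list[tuple[list[float], float]]:
    """(features, target) pairs for a direct h-step model using only usable lags."""
    keep = usable_lags(lags, horizon)
    if not keep:
        raise ValueError("no lag is >= the forecast horizon")
    rows = []
    for target_idx in range(max(keep), len(values)):
        # target_idx - k <= target_idx - horizon (the origin) because k >= horizon
        feats = [values[target_idx - k] for k in keep]
        rows.append((feats, values[target_idx]))
    return rows


def recursive_forecast(
    history: list[float],
    horizon: int,
    one_step_model: Callable[[list[float]], float],
    lags: list[int],
) -> list[float]:
    """Forecast horizon steps ahead, feeding each prediction back in as a lag value."""
    extended = list(history)
    for _ in range(horizon):
        feats = [extended[-k] for k in lags]  # lag-k relative to the next step
        extended.append(one_step_model(feats))
    return extended[len(history):]


def avg_model(feats: list[float]) -> float:
    """Toy one-step 'model': the mean of its lag features (hypothetical stand-in)."""
    return sum(feats) / len(feats)


print(usable_lags([1, 7, 14, 28], horizon=7))  # [7, 14, 28]
history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(recursive_forecast(history, 3, avg_model, lags=[1, 2]))  # [6.5, 6.75, 6.625]
```

The direct approach typically trains one model per horizon; the recursive approach reuses a single one-step model but lets early prediction errors flow into later steps.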
Multi-Series Considerations
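For panel data (multiple related series), consider both own-series lags and cross-series lags. A product's sales might correlate with a competitor product's sales from last week. Cross-lags capture spillover effects that single-series analysis cannot see. Compute own lags within each series (never shift values across series boundaries), and apply the same horizon constraint to cross-lags as to own lags.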
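A sketch of own-series and cross-series lags, assuming all series share one aligned daily index; `panel_lag_features`, the `cross_lags` mapping, and the sales numbers are hypothetical. In pandas, the within-series part is typically `df.groupby("series_id")["y"].shift(k)`, which keeps shifts from crossing series boundaries.

```python
from typing import Optional


def panel_lag_features(
    panel: dict[str, list[float]],
    own_lags: list[int],
    cross_lags: dict[str, list[tuple[str, int]]],
) -> dict[str, list[dict[str, Optional[float]]]]:
    """Build own-series and cross-series lag features for each series in a panel.

    Assumes every series is aligned on the same time index. Own lags read only
    the series itself, so values never shift across series boundaries.
    cross_lags maps a series id to (other_series_id, lag) pairs.
    """
    out = {}
    for sid, values in panel.items():
        rows = []
        for t, y in enumerate(values):
            row: dict[str, Optional[float]] = {"y": y}
            for k in own_lags:
                row[f"lag_{k}"] = values[t - k] if t >= k else None
            for other, k in cross_lags.get(sid, []):
                row[f"{other}_lag_{k}"] = panel[other][t - k] if t >= k else None
            rows.append(row)
        out[sid] = rows
    return out


# Hypothetical daily sales for our product and a competitor's
panel = {
    "ours": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
    "competitor": [3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0],
}
features = panel_lag_features(panel, own_lags=[1], cross_lags={"ours": [("competitor", 7)]})
print(features["ours"][7])  # {'y': 12.0, 'lag_1': 11.0, 'competitor_lag_7': 3.0}
```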