Training Serving Skew and Point in Time Feature Correctness
Temporal Leakage: The Most Common Cause
The most common skew source is using future information during training. Example: computing a 7-day CTR feature for an impression at time T by aggregating clicks from T-7d to T+2d because logs arrived late. The model learns to rely on future clicks, inflating offline precision from 0.78 to 0.82. At serving time, those future clicks don't exist, and precision drops to 0.74.
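The leak above can be made concrete with a toy aggregation. This is a minimal sketch with hypothetical data and a made-up `click_count` helper; the point is only that the leaky window ends after the impression while the correct one ends exactly at it:

```python
from datetime import datetime, timedelta

# Hypothetical click log: (user_id, event_time) pairs.
clicks = [
    ("u1", datetime(2024, 3, 1, 15, 0)),
    ("u1", datetime(2024, 3, 4, 9, 0)),
    ("u1", datetime(2024, 3, 4, 11, 0)),  # occurs after the impression
]

def click_count(events, user, window_end, days=7):
    """Count a user's clicks in the half-open window [window_end - days, window_end)."""
    start = window_end - timedelta(days=days)
    return sum(1 for u, t in events if u == user and start <= t < window_end)

impression_time = datetime(2024, 3, 4, 10, 0)

# Leaky: the aggregation window runs past the impression (T-7d to T+2d).
leaky = click_count(clicks, "u1", impression_time + timedelta(days=2))

# Point-in-time correct: the window ends exactly at the impression timestamp.
correct = click_count(clicks, "u1", impression_time)
# leaky == 3 but correct == 2: the 11am click leaks into the training feature
```

At serving time only the `correct` value is computable, so a model trained on the leaky value sees a shifted feature distribution.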
Point-in-Time Correctness
If a user views an item on Monday at 10am, the training example must use the 7-day CTR computed over the window from the previous Monday at 10am up to (but not including) that Monday at 10am; no event after the impression timestamp may contribute. The same rule applies to every time-windowed aggregate. For target-encoded features (which use the target variable to encode categorical values), compute the rates using only data strictly before the impression time.
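The strictly-before rule for target encoding can be sketched as an expanding-window pass over time-sorted rows. This is an illustrative implementation, not a prescribed one; the row data, prior, and smoothing constant are assumptions:

```python
from collections import defaultdict

# Hypothetical impression log, sorted by time: (timestamp, category, label).
rows = [
    (1, "shoes", 1),
    (2, "shoes", 0),
    (3, "shoes", 1),
    (4, "hats", 1),
]

def target_encode(rows, prior=0.5, smoothing=1.0):
    """Encode each category with the smoothed click rate computed from
    events strictly before the current row (expanding-window encoding)."""
    clicks = defaultdict(int)
    counts = defaultdict(int)
    encoded = []
    for ts, cat, label in rows:  # rows must be sorted by timestamp
        n, c = counts[cat], clicks[cat]
        # Rate from past data only; falls back to the prior when n == 0.
        encoded.append((c + prior * smoothing) / (n + smoothing))
        # Update the running totals AFTER encoding, so the current row's
        # own label never influences its own feature.
        counts[cat] += 1
        clicks[cat] += label
    return encoded

features = target_encode(rows)
# → [0.5, 0.75, 0.5, 0.5]
```

Updating the counters after emitting the feature is what enforces "strictly before": encoding first and updating second means a row can never see its own label.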
Late Event Handling
Feature computation must handle late-arriving events consistently. In training, decide on a cutoff: accept events up to 15 minutes late, then freeze the aggregate. In serving, apply the same 15-minute buffer. If training uses all eventually-consistent data while serving sees only immediately-available data, the two feature distributions diverge. Write snapshots to the offline store with the same late-event tolerance used in serving.
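One way to make the shared cutoff explicit is a single admissibility predicate used by both the offline snapshot job and the online path. This is a sketch under assumed semantics (an event counts only if it occurred before the feature timestamp and arrived within the tolerance of its occurrence time); the function and constant names are illustrative:

```python
from datetime import datetime, timedelta

LATE_TOLERANCE = timedelta(minutes=15)  # must be identical in training and serving

def admissible(event_time, arrival_time, feature_time):
    """An event contributes to a feature computed at feature_time only if it
    occurred before feature_time AND arrived within the late-event tolerance.
    Anything later is frozen out, in both the offline and online paths."""
    return (event_time < feature_time
            and arrival_time <= event_time + LATE_TOLERANCE)

t = datetime(2024, 3, 4, 10, 0)
snapshot = t + timedelta(hours=1)
on_time = admissible(t, t + timedelta(minutes=10), snapshot)   # 10 min late: kept
too_late = admissible(t, t + timedelta(minutes=20), snapshot)  # 20 min late: dropped
```

Because the predicate is shared, a 20-minutes-late event is excluded from the offline snapshot exactly as it would have been unavailable (post-freeze) online.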
Default Value Handling
If a feature is missing at serving time (a cache miss, or a newly seen entity), the system substitutes a default such as zero or the global mean. If the training data only includes cases where the feature was present, the model never learns to handle those defaults. Fix: inject synthetic missing values during training, or define a cascading fallback (item-level → category-level → global prior) and apply it on both paths.
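Both fixes can be sketched together: a cascading lookup, plus a training-time mask that occasionally simulates a cache miss. All names, tables, and the masking probability here are hypothetical:

```python
import random

# Hypothetical feature tables at decreasing specificity, plus a global prior.
item_ctr = {"item_42": 0.12}
category_ctr = {"electronics": 0.08}
GLOBAL_CTR = 0.05

def ctr_with_fallback(item_id, category):
    """Resolve the CTR feature item-level -> category-level -> global prior."""
    if item_id in item_ctr:
        return item_ctr[item_id]
    return category_ctr.get(category, GLOBAL_CTR)

def maybe_mask(item_id, category, p_missing=0.02, rng=random):
    """Training-time augmentation: with probability p_missing, pretend the
    item-level feature was a cache miss so the model also sees fallback values."""
    if rng.random() < p_missing:
        return ctr_with_fallback("__missing__", category)  # forces the fallback path
    return ctr_with_fallback(item_id, category)
```

Using the same `ctr_with_fallback` function in the training pipeline and the serving path ensures the model is trained on exactly the defaults it will encounter in production.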