
Event Time Semantics and Point in Time Correctness

Event time is when an event actually occurred in the real world, as opposed to processing time (when your system handled it). All freshness computation and monitoring must use event time to handle late-arriving and out-of-order data correctly. For example, a user click at 2:00:00 PM that reaches your pipeline at 2:00:45 PM because of network delays has an event time of 2:00:00 PM. If you measure by processing time, you will misstate freshness and potentially compute wrong aggregates such as "clicks in the last 5 minutes."

Point-in-time correctness prevents label leakage during training, one of the most insidious bugs in production Machine Learning (ML) systems. Features must reflect only information available at the training example's timestamp, never future data. If you are predicting whether a user will convert on January 15th at 3 PM, your features must be computed from data available before 3 PM on January 15th, typically with a small operational delay buffer. Training on features that include future information inflates offline metrics but causes the model to fail in production.

Implementing point-in-time correctness requires a time-travelable offline store or versioned snapshots. When joining labels with features, you perform an "as of" join in which each label at time T gets features computed from data up to time T minus operational delays. LinkedIn's Feathr and Uber's Michelangelo both enforce identical transformations between training and serving by defining feature logic once in feature views and materializing it to both batch and online stores, which ensures training-serving consistency.

Each feature in production should carry metadata: a last-updated-at timestamp, a source watermark (how far the upstream pipeline has processed), a version identifier, and a computation window (such as a 30-minute sliding window). The online feature assembler uses this metadata to enforce freshness SLAs per feature. If a feature's age exceeds its Time To Live (TTL), the system can degrade gracefully by falling back to a stale snapshot, using a default value, or dropping the feature and relying on model robustness.
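To make the "as of" join concrete, here is a minimal sketch using pandas.merge_asof. The table layout, column names, and five-minute operational delay buffer are illustrative assumptions, not any particular feature store's API.

```python
import pandas as pd

# Hypothetical label and feature tables; column names are illustrative.
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "label_time": pd.to_datetime(
        ["2024-01-15 15:00", "2024-01-15 18:00", "2024-01-15 15:00"]
    ),
    "converted": [0, 1, 1],
})

features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "feature_time": pd.to_datetime(
        ["2024-01-15 14:55", "2024-01-15 17:30", "2024-01-15 14:40"]
    ),
    "clicks_last_30m": [3, 7, 1],
})

# Operational delay buffer: a label at time T may only see features
# computed at or before T minus the buffer.
buffer = pd.Timedelta(minutes=5)
labels["join_time"] = labels["label_time"] - buffer

# merge_asof requires both frames to be sorted on the join key.
labels = labels.sort_values("join_time")
features = features.sort_values("feature_time")

# "As of" join: for each label, take the most recent feature row
# whose feature_time <= join_time, per user.
training_set = pd.merge_asof(
    labels,
    features,
    left_on="join_time",
    right_on="feature_time",
    by="user_id",
    direction="backward",
)
print(training_set[["user_id", "label_time", "clicks_last_30m", "converted"]])
```

Each label row picks up the most recent feature row at or before its buffered timestamp, so no feature computed after the prediction time can leak into training.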
💡 Key Takeaways
Event time versus processing time matters for correctness. A 5 minute sliding window using processing time will miscount events that arrive late, while event time with watermarks handles delays up to a bounded lateness (typically 5 to 15 minutes).
Label leakage from incorrect time joins is common and devastating. One company reported 15% Area Under Curve (AUC) drop when deploying a model trained with future feature values that weren't actually available at serving time.
Watermarks bound how late data can arrive. Setting a 5 minute watermark means events more than 5 minutes late are dropped or sent to a dead letter queue, preventing unbounded state growth in streaming jobs.
Feature metadata enables runtime freshness enforcement. If a feature has a TTL of 60 seconds and its current age is 90 seconds, the online assembler can log a violation, substitute a default, or include an age feature for the model (a minimal sketch follows this list).
Identical transformation logic between training and serving is critical. Defining features once and materializing to both batch (for training) and online stores (for serving) prevents subtle bugs from code drift.
Time travel queries or versioned snapshots add storage cost. Maintaining 90 days of point in time queryable features can be 3x to 10x more expensive than keeping only current values, but it's essential for correct retraining.
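As a rough illustration of the runtime enforcement described above, here is a minimal sketch of an online assembler check. The FeatureRecord fields and fallback policy are hypothetical, not drawn from any specific feature store.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical metadata record carried alongside each online feature value.
@dataclass
class FeatureRecord:
    value: Optional[float]
    last_updated_at: float     # unix seconds, event time of the last refresh
    ttl_seconds: float         # freshness SLA for this feature
    default: float = 0.0       # fallback when the value is too stale

def resolve_feature(record: FeatureRecord, now: Optional[float] = None) -> dict:
    """Enforce the freshness SLA at serving time.

    Returns the value to feed the model plus an age signal, degrading
    gracefully to a default when the TTL is exceeded.
    """
    now = time.time() if now is None else now
    age = now - record.last_updated_at
    if record.value is None or age > record.ttl_seconds:
        # Violation: log it and fall back to the default value.
        print(f"freshness violation: age={age:.0f}s > ttl={record.ttl_seconds:.0f}s")
        return {"value": record.default, "age_seconds": age, "stale": True}
    return {"value": record.value, "age_seconds": age, "stale": False}

# The takeaway's scenario: TTL of 60 seconds, current age of 90 seconds.
record = FeatureRecord(value=12.0, last_updated_at=1_700_000_000, ttl_seconds=60)
print(resolve_feature(record, now=1_700_000_090))  # stale -> default value
print(resolve_feature(record, now=1_700_000_030))  # fresh -> real value
```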
📌 Examples
DoorDash maintains event time windows for features like "orders in last 30 minutes for this store." Late arriving orders due to mobile network delays are correctly included if within the 5 minute watermark, ensuring accurate busy state (see the sketch after these examples).
Uber enforces point in time correctness by snapshotting feature values hourly in offline stores. When training an Estimated Time of Arrival (ETA) model, labels from 3 PM on Jan 15 join with features from the 2:55 PM snapshot, never using data after 3 PM.
A fraud detection team discovered their model had 0.92 offline AUC but only 0.78 online. Root cause was training features included transaction outcomes that occurred hours after the prediction time, leaking future labels into training data.
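For illustration, here is a minimal in-memory sketch of an event-time window with bounded lateness, in the spirit of the late-order handling above. The class name, window sizes, and drop-instead-of-dead-letter-queue behavior are assumptions for the example, not DoorDash's actual implementation.

```python
from collections import deque
from datetime import datetime, timedelta

class EventTimeWindowCounter:
    """Count events in a sliding event-time window, accepting late arrivals
    up to a bounded lateness (the watermark)."""

    def __init__(self, window=timedelta(minutes=30), lateness=timedelta(minutes=5)):
        self.window = window
        self.lateness = lateness
        self.events = deque()          # buffered event times
        self.max_event_time = None     # drives the watermark

    def add(self, event_time: datetime) -> bool:
        # Advance the watermark from the largest event time seen so far.
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        watermark = self.max_event_time - self.lateness
        if event_time < watermark:
            return False               # too late: drop (or route to a DLQ)
        self.events.append(event_time)
        return True

    def count(self, as_of: datetime) -> int:
        # Evict events that have fallen out of the window, then count.
        cutoff = as_of - self.window
        self.events = deque(t for t in self.events if t >= cutoff)
        return len(self.events)

counter = EventTimeWindowCounter()
now = datetime(2024, 1, 15, 15, 0)
counter.add(now)                                # on-time event
print(counter.add(now - timedelta(minutes=4)))  # late but within 5 min -> True
print(counter.add(now - timedelta(minutes=6)))  # beyond the watermark -> False
print(counter.count(as_of=now))                 # -> 2
```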