Point in Time Correctness and Time Travel
What Point in Time Correctness Means
Point in time correctness prevents label leakage by ensuring that each training example sees only feature values that existed at the time of the example. When you join a labeled example, identified by an entity key and an event timestamp, to your feature tables, you must retrieve for each feature the most recent value whose feature event timestamp is less than or equal to the example timestamp. This "as of" join is critical: without it, future information leaks into training, inflating offline AUC by 5 to 20 percent while online performance collapses.
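The "as of" join can be sketched in plain Python. This is a minimal illustration, not any particular feature store's API: the function name `as_of_join` and the tuple layouts are assumptions made for the example. For each labeled example it picks the latest feature row for the same entity whose timestamp is less than or equal to the example timestamp, and returns `None` when no such row exists.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def as_of_join(examples, feature_rows):
    """Attach to each example the latest feature value known at example time.

    examples:     list of (entity_id, example_ts, label)      (hypothetical layout)
    feature_rows: list of (entity_id, feature_ts, value)      (hypothetical layout)
    Returns a list of (entity_id, example_ts, label, value_or_None).
    """
    # Group feature rows by entity and sort each group by feature timestamp.
    by_entity = defaultdict(list)
    for entity_id, feature_ts, value in feature_rows:
        by_entity[entity_id].append((feature_ts, value))
    timestamps = {}
    for entity_id, rows in by_entity.items():
        rows.sort(key=lambda row: row[0])
        timestamps[entity_id] = [ts for ts, _ in rows]

    joined = []
    for entity_id, example_ts, label in examples:
        rows = by_entity.get(entity_id, [])
        # bisect_right counts the rows with feature_ts <= example_ts.
        idx = bisect_right(timestamps.get(entity_id, []), example_ts)
        value = rows[idx - 1][1] if idx > 0 else None
        joined.append((entity_id, example_ts, label, value))
    return joined


features = [
    ("u1", datetime(2024, 1, 1), 1.0),
    ("u1", datetime(2024, 1, 10), 2.0),
    ("u1", datetime(2024, 1, 20), 3.0),
]
examples = [
    ("u1", datetime(2024, 1, 15), 1),   # sees the Jan 10 value, not Jan 20
    ("u1", datetime(2023, 12, 31), 0),  # no feature existed yet
]
training_rows = as_of_join(examples, features)
```

A naive join on entity key alone would hand the January 20 value to the January 15 example, which is exactly the leakage described above.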
Implementation Pattern
A common implementation uses windowed aggregations with lookback periods. For a 7 day rolling count feature, you compute the count over the window ending at the example timestamp, not at the current time. Airbnb Zipline enforces this with explicit "as of" semantics in its Spark based backfills, joining hundreds of feature pipelines to billions of examples while maintaining temporal consistency. The offline store typically partitions by event date and clusters by entity to accelerate these scans, turning full table scans into targeted partition reads that reduce compute time by 10 to 50x.
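A small sketch of the rolling count makes the difference between "window ending at the example timestamp" and "window ending now" concrete. The helper name `rolling_count_as_of` and the half-open window convention (start exclusive, end inclusive) are assumptions for illustration; real pipelines would express this in Spark or SQL.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def rolling_count_as_of(event_times, as_of_ts, lookback=timedelta(days=7)):
    """Count events in the window (as_of_ts - lookback, as_of_ts].

    event_times must be sorted ascending. The window ends at as_of_ts,
    so events after that instant are never counted.
    """
    lo = bisect_right(event_times, as_of_ts - lookback)
    hi = bisect_right(event_times, as_of_ts)
    return hi - lo


# Hypothetical event log for one entity, sorted by time.
events = [
    datetime(2024, 1, 1, 12),
    datetime(2024, 1, 3, 12),
    datetime(2024, 1, 5, 12),
    datetime(2024, 1, 9, 12),
    datetime(2024, 1, 12, 12),
]

example_ts = datetime(2024, 1, 8)   # when the labeled example occurred
now = datetime(2024, 1, 13)         # when the backfill happens to run

correct = rolling_count_as_of(events, example_ts)  # counts Jan 1, 3, 5
leaky = rolling_count_as_of(events, now)           # counts Jan 9, 12
```

The two calls return different values for the same entity, which is why the window must be anchored to the example timestamp during backfills.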
Time Travel
Time travel extends this concept by snapshotting entire feature tables at specific versions. Using copy on write tables such as Apache Hudi, you can reproduce the exact training dataset from 3 months ago by querying feature values at that historical commit. Hopsworks built this into its core with ACID upserts on lake tables, enabling model retraining with identical feature cuts and rollback when new features degrade performance. The cost is storage: keeping 6 months of hourly snapshots for a 10 terabyte feature table requires 60 to 240 terabytes depending on change rate.
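The idea of reading a table "as of" a historical commit can be illustrated with a toy in-memory table. This is not the Apache Hudi or Hopsworks API; the class `VersionedFeatureTable` and its methods are hypothetical. It also simplifies copy on write by copying the whole table on every commit, whereas real table formats rewrite only the files that changed.

```python
from bisect import bisect_right


class VersionedFeatureTable:
    """Toy versioned table: each commit stores a full new snapshot."""

    def __init__(self):
        self._commit_ids = []   # strictly increasing commit identifiers
        self._snapshots = []    # snapshot dict for each commit, same order

    def commit(self, commit_id, upserts):
        """Apply upserts on top of the latest snapshot as a new version."""
        if self._commit_ids and commit_id <= self._commit_ids[-1]:
            raise ValueError("commit ids must be strictly increasing")
        snapshot = dict(self._snapshots[-1]) if self._snapshots else {}
        snapshot.update(upserts)
        self._commit_ids.append(commit_id)
        self._snapshots.append(snapshot)

    def read_as_of(self, commit_id):
        """Return the table as it looked at the latest commit <= commit_id."""
        idx = bisect_right(self._commit_ids, commit_id)
        if idx == 0:
            raise LookupError("no commit at or before the requested version")
        return dict(self._snapshots[idx - 1])


table = VersionedFeatureTable()
table.commit(1, {"u1": 0.2, "u2": 0.5})
table.commit(2, {"u1": 0.9})            # later upsert overwrites u1
old_view = table.read_as_of(1)           # u1 is still 0.2 in this view
new_view = table.read_as_of(2)           # u1 is 0.9, u2 unchanged
```

Retraining against `read_as_of(1)` reproduces the earlier feature cut even though the current table has moved on. Storing a full copy per commit is also what drives the storage cost: the more rows change between snapshots, the less can be shared between versions.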
Failure Mode
Incorrect time filters or timezone bugs cause leakage. A typical symptom is an offline AUC of 0.95 dropping to an online AUC of 0.78. Mitigation requires anti join validation on timestamps, automated checks that no feature event timestamp exceeds the example timestamp, and unifying transformation code paths between offline and online to eliminate training-serving skew.
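An automated check of the kind described above might look like the following sketch. The function name `find_leaky_rows` and the row layout (dicts with `example_ts` and `feature_ts` keys) are assumptions. The check rejects timezone-naive timestamps outright, since comparing wall-clock times from different zones is one way the timezone bugs mentioned above slip through.

```python
from datetime import datetime, timedelta, timezone


def find_leaky_rows(joined_rows):
    """Return joined rows whose feature timestamp is after the example timestamp.

    joined_rows: iterable of dicts with 'example_ts' and 'feature_ts' keys
    (hypothetical layout). Timestamps must be timezone-aware datetimes.
    """
    leaks = []
    for row in joined_rows:
        example_ts, feature_ts = row["example_ts"], row["feature_ts"]
        if feature_ts is None:
            continue  # no feature value was joined, so nothing can leak
        if example_ts.utcoffset() is None or feature_ts.utcoffset() is None:
            raise ValueError("timestamps must be timezone-aware")
        # Aware datetimes compare by absolute instant, not wall-clock time.
        if feature_ts > example_ts:
            leaks.append(row)
    return leaks


utc = timezone.utc
est = timezone(timedelta(hours=-5))

rows = [
    # Feature computed two hours before the example: fine.
    {"example_ts": datetime(2024, 1, 1, 12, tzinfo=utc),
     "feature_ts": datetime(2024, 1, 1, 10, tzinfo=utc)},
    # Wall clock says 10:00, but in UTC-5 that is 15:00 UTC: a leak.
    {"example_ts": datetime(2024, 1, 1, 12, tzinfo=utc),
     "feature_ts": datetime(2024, 1, 1, 10, tzinfo=est)},
]

leaks = find_leaky_rows(rows)
```

Running a check like this on every generated training set, and failing the pipeline when `leaks` is non-empty, turns a silent quality problem into a visible build failure.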