PIT Correctness Failure Modes and Edge Cases
Out of Order and Late Arriving Data
Point in Time (PIT) correctness fails in subtle ways that can go undetected for months while silently degrading model quality. Streaming systems deliver most events promptly, but the tail arrives late: p95 may be on time while p99 lags by minutes to hours. If the system records only event time and not the time each event actually arrived, late arrivals overwrite history and contaminate training labels. In a naive system, a user action that occurred at 2pm but arrived at 4pm appears to have been available at 2pm, so a training example built as of 3pm sees a value that serving could not have seen, leaking future information.
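One way to guard against this is to gate every lookup on both timestamps. The sketch below is illustrative only: `FeatureEvent`, `value_as_of`, and the sample data are hypothetical names, and it assumes each event carries a producer event time and a consumer ingestion time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FeatureEvent:
    entity_id: str
    value: float
    event_time: datetime   # when the action happened (producer clock)
    ingest_time: datetime  # when the system first saw it (consumer clock)


def value_as_of(events, entity_id, as_of):
    """Return the latest value that had both occurred and arrived by `as_of`."""
    visible = [
        e for e in events
        if e.entity_id == entity_id
        and e.event_time <= as_of
        and e.ingest_time <= as_of  # the gate a naive system omits
    ]
    if not visible:
        return None
    return max(visible, key=lambda e: e.event_time).value


def utc(hour):
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


events = [
    FeatureEvent("u1", 1.0, event_time=utc(13), ingest_time=utc(13)),
    FeatureEvent("u1", 2.0, event_time=utc(14), ingest_time=utc(16)),  # late arrival
]

at_3pm = value_as_of(events, "u1", utc(15))  # late event not yet visible
at_5pm = value_as_of(events, "u1", utc(17))  # late event now visible
```

A training row built as of 3pm gets the older value, matching what serving would have returned at that moment; only rows built after 4pm see the late event.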
Backfill Contamination
Backfilling feature pipelines after bug fixes or schema changes can corrupt historical state. If the backfill recomputes past values with current logic, the resulting features reflect information that was unavailable at the original timestamps. The fix is immutable, append-only storage in which backfills create new versions rather than overwriting, so the original buggy values are preserved alongside the corrected ones for comparison.
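A minimal in-memory sketch of the append-only idea follows. `AppendOnlyStore` and `FeatureVersion` are hypothetical names for illustration, not an existing library; a real store would persist rows and index them, but the versioning rule is the same.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FeatureVersion:
    entity_id: str
    event_time: datetime
    value: float
    version: int          # 1 = original write, 2+ = backfills
    written_at: datetime  # when this version was written


class AppendOnlyStore:
    def __init__(self):
        self._rows = []  # rows are only ever appended, never mutated

    def append(self, entity_id, event_time, value, written_at):
        """Write a new version for (entity_id, event_time) without overwriting."""
        version = len(self.history(entity_id, event_time)) + 1
        row = FeatureVersion(entity_id, event_time, value, version, written_at)
        self._rows.append(row)
        return row

    def history(self, entity_id, event_time):
        """All versions for a key, oldest first."""
        return [
            r for r in self._rows
            if r.entity_id == entity_id and r.event_time == event_time
        ]

    def read(self, entity_id, event_time, version=None):
        """Read the latest version, or a specific one if `version` is given."""
        rows = self.history(entity_id, event_time)
        if not rows:
            return None
        if version is None:
            return rows[-1]
        return next((r for r in rows if r.version == version), None)


store = AppendOnlyStore()
t = datetime(2024, 1, 1, tzinfo=timezone.utc)
store.append("u1", t, 10.0, written_at=datetime(2024, 1, 1, 1, tzinfo=timezone.utc))
store.append("u1", t, 12.5, written_at=datetime(2024, 3, 1, tzinfo=timezone.utc))  # backfill

original = store.read("u1", t, version=1)
corrected = store.read("u1", t)
```

Keeping `written_at` on every version also lets a reader reconstruct what the store contained on any past date, which is what a reproducible training set needs.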
Clock Skew
Distributed systems with unsynchronized clocks introduce systematic bias. If a producer's clock runs ahead, its events carry timestamps later than the true time and can appear to have been ingested before they occurred; if it runs behind, events are stamped too early and look available sooner than they really were. NTP synchronization to millisecond precision across all infrastructure is table stakes. Log both event time (from the producer) and ingestion time (from the consumer) to detect and correct skew during joins.
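With both timestamps logged, one simple check is to look for producers whose events seem to arrive before they happened. The sketch below assumes the consumer clock is the trusted reference; the function names and the 5 ms tolerance are illustrative choices, not recommendations.

```python
from datetime import datetime, timedelta, timezone


def min_lag_by_producer(records):
    """records: iterable of (producer_id, event_time, ingest_time).

    Returns the smallest observed (ingest_time - event_time) per producer.
    """
    lags = {}
    for producer, event_time, ingest_time in records:
        lag = ingest_time - event_time
        if producer not in lags or lag < lags[producer]:
            lags[producer] = lag
    return lags


def producers_running_ahead(records, tolerance=timedelta(milliseconds=5)):
    """Producers with an event ingested 'before it occurred', beyond tolerance."""
    lags = min_lag_by_producer(records)
    return sorted(p for p, lag in lags.items() if lag < -tolerance)


base = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
records = [
    # producer "a": normal, event precedes ingestion by 200 ms
    ("a", base, base + timedelta(milliseconds=200)),
    # producer "b": clock ahead, event stamped 3 s after it was ingested
    ("b", base + timedelta(seconds=3), base),
]

suspect = producers_running_ahead(records)
```

This check only catches clocks that run ahead. A clock that runs behind is indistinguishable from ordinary delivery latency in these two fields, so it needs a different signal, such as comparing lag distributions across producers.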
Window Boundary Races
Aggregates over time windows (for example, a sum over the last 7 days) face boundary conditions. An event at exactly midnight on a day boundary may be included or excluded depending on whether the comparison is strict (<) or inclusive (<=). Inconsistent boundary handling between training and serving causes subtle feature drift.
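A common way to remove the ambiguity is to pick one convention, here a half-open interval [start, end), and share the same function between training and serving. The choice of half-open is an assumption of this sketch, and `window_sum` is a hypothetical helper; what matters is that both paths agree.

```python
from datetime import datetime, timedelta, timezone


def window_sum(events, as_of, days=7):
    """Sum values with as_of - days <= timestamp < as_of (half-open window).

    events: iterable of (timestamp, value) pairs.
    """
    start = as_of - timedelta(days=days)
    return sum(value for ts, value in events if start <= ts < as_of)


as_of = datetime(2024, 1, 8, tzinfo=timezone.utc)  # midnight, Jan 8
events = [
    (datetime(2024, 1, 1, tzinfo=timezone.utc), 1.0),      # exactly at start: included
    (datetime(2024, 1, 4, 12, tzinfo=timezone.utc), 2.0),  # inside the window
    (datetime(2024, 1, 8, tzinfo=timezone.utc), 4.0),      # exactly at as_of: excluded
]

total = window_sum(events, as_of)
```

With half-open windows, consecutive 7-day windows tile the timeline without overlap, so an event on a boundary is counted in exactly one of them.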
Timezone Bugs
Features aggregated by calendar day must handle time zones consistently. A global user whose activity spans a UTC day boundary may have that activity counted twice or missed entirely if training and serving assume different time zones.
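The effect is easy to reproduce. The sketch below buckets the same two timestamps by calendar day under two different zone assumptions; `daily_counts` is a hypothetical helper, and a fixed UTC+9 offset stands in for a real IANA zone to keep the example self-contained.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def daily_counts(timestamps, tz):
    """Count events per calendar day as seen in time zone `tz`."""
    counts = Counter()
    for ts in timestamps:
        if ts.tzinfo is None:
            # Naive timestamps hide the zone assumption, so reject them.
            raise ValueError("timestamps must be timezone-aware")
        counts[ts.astimezone(tz).date()] += 1
    return dict(counts)


activity = [
    datetime(2024, 1, 1, 23, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 0, 30, tzinfo=timezone.utc),
]

utc_plus_9 = timezone(timedelta(hours=9))  # illustrative fixed offset

by_utc_day = daily_counts(activity, timezone.utc)  # two days, one event each
by_local_day = daily_counts(activity, utc_plus_9)  # one day, two events
```

If training buckets by UTC day and serving buckets by local day, the same user gets a daily count of 1 in one path and 2 in the other. Passing the zone explicitly and rejecting naive timestamps makes the assumption visible at every call site.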