Time Travel Storage Patterns for Feature Versioning
Time travel storage enables reconstructing feature state at any historical timestamp by maintaining immutable, versioned histories. The core pattern combines base snapshots with append-only change logs: each feature update creates a new version keyed by entity ID, event timestamp, and optionally a version counter for deduplication. This mirrors database point-in-time recovery, adapted to machine learning feature semantics with per-entity timelines.
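The pattern can be made concrete with a minimal in-memory sketch (class and field names are illustrative, not any particular feature store's API): each append adds an immutable version keyed by entity and event time, and an as-of read returns the latest version at or before the requested timestamp.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class FeatureVersion:
    """One immutable version of a feature value for an entity."""
    event_time: datetime  # when the underlying fact occurred
    version: int          # counter to deduplicate multiple updates at the same event_time
    value: float

class TimeTravelFeatureLog:
    """Append-only, per-entity version history for a single feature."""

    def __init__(self) -> None:
        # entity_id -> versions kept sorted by (event_time, version)
        self._log: dict[str, list[FeatureVersion]] = defaultdict(list)

    def append(self, entity_id: str, event_time: datetime, value: float) -> None:
        versions = self._log[entity_id]
        dup_count = sum(1 for v in versions if v.event_time == event_time)
        versions.append(FeatureVersion(event_time, dup_count, value))
        versions.sort(key=lambda v: (v.event_time, v.version))  # new rows only, no in-place mutation

    def as_of(self, entity_id: str, as_of_time: datetime) -> float | None:
        """Latest value with event_time <= as_of_time, or None if no history yet."""
        versions = self._log[entity_id]
        idx = bisect_right([v.event_time for v in versions], as_of_time)
        return versions[idx - 1].value if idx > 0 else None

log = TimeTravelFeatureLog()
log.append("user_123", datetime(2024, 1, 15, 10, 0), 12.0)
log.append("user_123", datetime(2024, 1, 16, 10, 0), 13.0)
assert log.as_of("user_123", datetime(2024, 1, 15, 23, 59)) == 12.0  # reconstructs historical state
```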
Two dominant architectures exist. Copy-on-write formats create new immutable files on each update, which simplifies as-of reads by scanning the relevant snapshot files directly but causes 1.5 to 3 times storage amplification. Merge-on-read formats append changes to logs and compact lazily, reducing write cost but increasing read-time complexity. Netflix and Meta use Iceberg- or Delta Lake-style formats at petabyte scale with configurable retention windows of 7 to 90 days, balancing reproducibility needs against storage costs.
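A toy contrast of the two read/write paths, with the simplifying assumption that a snapshot is an in-memory dict rather than immutable files in an Iceberg- or Delta Lake-style table, and that write timestamps are monotonically increasing:

```python
from copy import deepcopy

class CopyOnWriteTable:
    """Every write materializes a full new snapshot; as-of reads scan one snapshot directly."""

    def __init__(self) -> None:
        self.snapshots: list[tuple[int, dict[str, float]]] = [(0, {})]  # (write_ts, full state)

    def write(self, ts: int, entity_id: str, value: float) -> None:
        _, latest = self.snapshots[-1]
        state = deepcopy(latest)              # old snapshots stay around: storage amplification
        state[entity_id] = value
        self.snapshots.append((ts, state))

    def read_as_of(self, ts: int, entity_id: str) -> float | None:
        for snap_ts, state in reversed(self.snapshots):
            if snap_ts <= ts:
                return state.get(entity_id)   # cheap read: one snapshot lookup
        return None

class MergeOnReadTable:
    """Writes append to a change log; reads merge base + log; compaction folds the log in lazily."""

    def __init__(self) -> None:
        self.base: dict[str, float] = {}
        self.change_log: list[tuple[int, str, float]] = []  # (write_ts, entity_id, value)

    def write(self, ts: int, entity_id: str, value: float) -> None:
        self.change_log.append((ts, entity_id, value))       # cheap append-only write

    def read_as_of(self, ts: int, entity_id: str) -> float | None:
        value = self.base.get(entity_id)
        for log_ts, eid, v in self.change_log:               # read pays the merge cost
            if eid == entity_id and log_ts <= ts:
                value = v
        return value

    def compact(self) -> None:
        # Folds the log into the base; real formats keep older snapshots inside the retention window.
        for _, eid, v in self.change_log:
            self.base[eid] = v
        self.change_log.clear()
```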
The data model stores per-entity feature values as append-only rows with event time as the version dimension. Updates are additional rows with a higher event time, not in-place mutations. For aggregates like rolling 7-day counts, teams either precompute windows with the window end at the event time, or store raw events and compute as-of features during joins for maximum flexibility. Airbnb's Zipline maintains both an event timestamp (when the fact occurred) and a created timestamp (when the system ingested it) to track late arrivals explicitly.
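The dual-timestamp idea can be sketched as a point-in-time lookup that filters on both timestamps (field names follow the example row in the Examples section below, not Zipline's actual schema): a fact that has occurred but not yet been ingested must not be served, otherwise late arrivals leak future knowledge into training data.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class FeatureRow:
    entity_id: str
    feature_name: str
    event_time: datetime    # when the fact occurred
    created_time: datetime  # when the system ingested it (later than event_time for late arrivals)
    value: float

def point_in_time_value(rows: list[FeatureRow], entity_id: str, as_of: datetime) -> float | None:
    """Latest value the system could actually have served at `as_of`."""
    visible = [r for r in rows
               if r.entity_id == entity_id
               and r.event_time <= as_of
               and r.created_time <= as_of]   # exclude facts not yet ingested at as_of
    if not visible:
        return None
    return max(visible, key=lambda r: (r.event_time, r.created_time)).value

row = FeatureRow("user_123", "login_count",
                 datetime(2024, 1, 15, 10, 0, 0),   # event occurred at 10:00:00
                 datetime(2024, 1, 15, 10, 2, 30),  # ingested at 10:02:30
                 12.0)
# At 10:01 the fact exists by event time but was not yet visible to the system.
assert point_in_time_value([row], "user_123", datetime(2024, 1, 15, 10, 1, 0)) is None
assert point_in_time_value([row], "user_123", datetime(2024, 1, 15, 10, 3, 0)) == 12.0
```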
Retention policies govern how long history is kept. Teams iterating rapidly use 7 to 30 days, enough for backfills and A/B test analysis. Regulated workloads like finance or healthcare require 90+ days or indefinite retention for audit trails. Storage growth scales with update churn: high-frequency features (updated every second) cost 10 to 100 times more to version than daily batch features. Monitor storage against update frequency to tune compaction schedules and retention.
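A back-of-the-envelope sketch of how churn and retention drive versioned storage (the entity count, update rate, and 64-byte row size are hypothetical), useful when tuning compaction and retention:

```python
def versioned_storage_gb(num_entities: int, updates_per_entity_per_day: float,
                         retention_days: int, bytes_per_row: int = 64) -> float:
    """Rows retained = one compacted current row per entity + every change inside the window."""
    rows = num_entities * (1 + updates_per_entity_per_day * retention_days)
    return rows * bytes_per_row / 1e9

# Hypothetical daily-batch feature over 100M entities: storage scales linearly with retention.
for days in (7, 30, 90):
    print(f"{days:>2}-day retention: {versioned_storage_gb(100_000_000, 1, days):7.1f} GB")
```

Plugging in a per-second update rate instead of a daily one shows why high-churn features dominate the storage bill and need far more aggressive compaction.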
💡 Key Takeaways
•Immutable append-only rows keyed by entity ID and event timestamp enable time-travel reads, with copy-on-write formats amplifying storage 1.5 to 3 times versus current-state-only tables
•Separate the event timestamp (when the fact happened) from the created timestamp (when the system saw it) to track late arrivals and prevent processing-time leakage into event-time semantics
•Retention windows balance reproducibility against cost: 7 to 30 days for rapid iteration, 90+ days for regulated workloads, with high-frequency features costing 10 to 100 times more than daily batch
•Copy-on-write simplifies as-of reads with direct snapshot file scans but increases write cost, while merge-on-read reduces write cost at the expense of read-time compaction
•Netflix uses Iceberg-style formats at petabyte scale to rebuild multi-hundred-million-row training datasets in hours, supporting months of time travel for audit and rollback
•Monitor storage growth against update churn rate to tune compaction schedules: features updated every second require aggressive compaction versus daily batch features
📌 Examples
Meta's unified feature store versions feature values by event time, supporting tens of billions of reads per day with safe backfills that rewrite history offline without corrupting online serving
Airbnb Zipline feature schema: (entity_id: user_123, feature_name: login_count, event_time: 2024-01-15T10:00:00Z, created_time: 2024-01-15T10:02:30Z, value: 12) stored as an immutable row in an append-only log
Iceberg table format example: base snapshot v1 at t0 with 1 billion rows, plus change logs capturing 50 million updates per day, enabling time-travel queries over a 30-day retention window at roughly 2x storage cost