Time Travel Storage Patterns for Feature Versioning
Time travel storage enables reconstructing feature state at any historical timestamp by maintaining immutable, versioned histories. The core pattern combines base snapshots with append-only change logs: each feature update creates a new version keyed by entity ID, event timestamp, and optionally a version counter for deduplication. This mirrors database point-in-time recovery, adapted to machine learning feature semantics with per-entity timelines.
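The pattern can be made concrete with a minimal in-memory sketch (class and field names are illustrative, not any particular feature store's API): each append adds an immutable version keyed by entity and event time, and an as-of read returns the latest version at or before the requested timestamp.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class FeatureVersion:
    """One immutable version of a feature value for an entity."""
    event_time: datetime  # when the underlying fact occurred
    version: int          # counter to deduplicate multiple updates at the same event_time
    value: float

class TimeTravelFeatureLog:
    """Append-only, per-entity version history for a single feature."""

    def __init__(self) -> None:
        # entity_id -> versions kept sorted by (event_time, version)
        self._log: dict[str, list[FeatureVersion]] = defaultdict(list)

    def append(self, entity_id: str, event_time: datetime, value: float) -> None:
        versions = self._log[entity_id]
        dup_count = sum(1 for v in versions if v.event_time == event_time)
        versions.append(FeatureVersion(event_time, dup_count, value))
        versions.sort(key=lambda v: (v.event_time, v.version))  # new rows only, no in-place mutation

    def as_of(self, entity_id: str, as_of_time: datetime) -> float | None:
        """Latest value with event_time <= as_of_time, or None if no history yet."""
        versions = self._log[entity_id]
        idx = bisect_right([v.event_time for v in versions], as_of_time)
        return versions[idx - 1].value if idx > 0 else None

log = TimeTravelFeatureLog()
log.append("user_123", datetime(2024, 1, 15, 10, 0), 12.0)
log.append("user_123", datetime(2024, 1, 16, 10, 0), 13.0)
assert log.as_of("user_123", datetime(2024, 1, 15, 23, 59)) == 12.0  # reconstructs historical state
```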
Two dominant architectures exist. Copy-on-write formats create new immutable files on each update, which simplifies as-of reads by scanning the relevant snapshot files directly but causes 1.5 to 3 times storage amplification. Merge-on-read formats append changes to logs and compact lazily, reducing write cost but increasing read-time complexity. Netflix and Meta use Iceberg- or Delta Lake-style formats at petabyte scale with configurable retention windows of 7 to 90 days, balancing reproducibility needs against storage costs.
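A toy contrast of the two read/write paths, with the simplifying assumption that a snapshot is an in-memory dict rather than immutable files in an Iceberg- or Delta Lake-style table, and that write timestamps are monotonically increasing:

```python
from copy import deepcopy

class CopyOnWriteTable:
    """Every write materializes a full new snapshot; as-of reads scan one snapshot directly."""

    def __init__(self) -> None:
        self.snapshots: list[tuple[int, dict[str, float]]] = [(0, {})]  # (write_ts, full state)

    def write(self, ts: int, entity_id: str, value: float) -> None:
        _, latest = self.snapshots[-1]
        state = deepcopy(latest)              # old snapshots stay around: storage amplification
        state[entity_id] = value
        self.snapshots.append((ts, state))

    def read_as_of(self, ts: int, entity_id: str) -> float | None:
        for snap_ts, state in reversed(self.snapshots):
            if snap_ts <= ts:
                return state.get(entity_id)   # cheap read: one snapshot lookup
        return None

class MergeOnReadTable:
    """Writes append to a change log; reads merge base + log; compaction folds the log in lazily."""

    def __init__(self) -> None:
        self.base: dict[str, float] = {}
        self.change_log: list[tuple[int, str, float]] = []  # (write_ts, entity_id, value)

    def write(self, ts: int, entity_id: str, value: float) -> None:
        self.change_log.append((ts, entity_id, value))       # cheap append-only write

    def read_as_of(self, ts: int, entity_id: str) -> float | None:
        value = self.base.get(entity_id)
        for log_ts, eid, v in self.change_log:               # read pays the merge cost
            if eid == entity_id and log_ts <= ts:
                value = v
        return value

    def compact(self) -> None:
        # Folds the log into the base; real formats keep older snapshots inside the retention window.
        for _, eid, v in self.change_log:
            self.base[eid] = v
        self.change_log.clear()
```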
The data model stores per-entity feature values as append-only rows with event time as the version dimension. Updates are additional rows with a higher event time, not in-place mutations. For aggregates like rolling 7-day counts, teams either precompute windows with the window end at the event time, or store raw events and compute as-of features during joins for maximum flexibility. Airbnb's Zipline maintains both an event timestamp (when the fact occurred) and a created timestamp (when the system ingested it) to track late arrivals explicitly.
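The dual-timestamp idea can be sketched as a point-in-time lookup that filters on both timestamps (field names follow the example row in the Examples section below, not Zipline's actual schema): a fact that has occurred but not yet been ingested must not be served, otherwise late arrivals leak future knowledge into training data.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class FeatureRow:
    entity_id: str
    feature_name: str
    event_time: datetime    # when the fact occurred
    created_time: datetime  # when the system ingested it (later than event_time for late arrivals)
    value: float

def point_in_time_value(rows: list[FeatureRow], entity_id: str, as_of: datetime) -> float | None:
    """Latest value the system could actually have served at `as_of`."""
    visible = [r for r in rows
               if r.entity_id == entity_id
               and r.event_time <= as_of
               and r.created_time <= as_of]   # exclude facts not yet ingested at as_of
    if not visible:
        return None
    return max(visible, key=lambda r: (r.event_time, r.created_time)).value

row = FeatureRow("user_123", "login_count",
                 datetime(2024, 1, 15, 10, 0, 0),   # event occurred at 10:00:00
                 datetime(2024, 1, 15, 10, 2, 30),  # ingested at 10:02:30
                 12.0)
# At 10:01 the fact exists by event time but was not yet visible to the system.
assert point_in_time_value([row], "user_123", datetime(2024, 1, 15, 10, 1, 0)) is None
assert point_in_time_value([row], "user_123", datetime(2024, 1, 15, 10, 3, 0)) == 12.0
```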
Retention policies govern how long history is kept. Teams iterating rapidly use 7 to 30 days, enough for backfills and A/B test analysis. Regulated workloads like finance or healthcare require 90+ days or indefinite retention for audit trails. Storage growth scales with update churn: high-frequency features (updated every second) cost 10 to 100 times more to version than daily batch features. Monitor storage against update frequency to tune compaction schedules and retention.
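A back-of-the-envelope sketch of how churn and retention drive versioned storage (the entity count, update rate, and 64-byte row size are hypothetical), useful when tuning compaction and retention:

```python
def versioned_storage_gb(num_entities: int, updates_per_entity_per_day: float,
                         retention_days: int, bytes_per_row: int = 64) -> float:
    """Rows retained = one compacted current row per entity + every change inside the window."""
    rows = num_entities * (1 + updates_per_entity_per_day * retention_days)
    return rows * bytes_per_row / 1e9

# Hypothetical daily-batch feature over 100M entities: storage scales linearly with retention.
for days in (7, 30, 90):
    print(f"{days:>2}-day retention: {versioned_storage_gb(100_000_000, 1, days):7.1f} GB")
```

Plugging in a per-second update rate instead of a daily one shows why high-churn features dominate the storage bill and need far more aggressive compaction.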
💡 Key Takeaways
•Immutable append-only rows keyed by entity ID and event timestamp enable time-travel reads, with copy-on-write formats amplifying storage 1.5 to 3 times versus current-state-only tables
•Separate the event timestamp (when the fact happened) from the created timestamp (when the system saw it) to track late arrivals and prevent processing-time leakage into event-time semantics
•Retention windows balance reproducibility against cost: 7 to 30 days for rapid iteration, 90+ days for regulated workloads, with high-frequency features costing 10 to 100 times more than daily batch
•Copy-on-write simplifies as-of reads with direct snapshot file scans but increases write cost, while merge-on-read reduces write cost at the expense of read-time compaction
•Netflix uses Iceberg-style formats at petabyte scale to rebuild multi-hundred-million-row training datasets in hours, supporting months of time travel for audit and rollback
•Monitor storage growth against update churn rate to tune compaction schedules: features updated every second require aggressive compaction versus daily batch features
📌 Examples
Meta's unified feature store versions feature values by event time, supporting tens of billions of reads per day with safe backfills that rewrite history offline without corrupting online serving
Airbnb Zipline feature schema: (entity_id: user_123, feature_name: login_count, event_time: 2024-01-15T10:00:00Z, created_time: 2024-01-15T10:02:30Z, value: 12) stored as an immutable row in an append-only log
Iceberg table format example: base snapshot v1 at t0 with 1 billion rows, plus change logs capturing 50 million updates per day, enabling time-travel queries over a 30-day retention window at roughly 2x storage cost