Offline and Online Storage: Architecture and Trade-offs
The offline store holds the complete feature history for training, backfills, and analytics. It is optimized for wide columnar scans and joins across billions of rows, typically partitioned by date and entity hash to accelerate point-in-time joins. A typical training flow assembles a dataset from a 90-day window with 200 million examples, scanning multiple terabytes; well-tuned pipelines on moderate clusters complete these joins in a few hours. The offline store also captures feature versions and metadata fingerprints to enable exact reproducibility: you can regenerate the same training dataset months later.
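The heart of offline assembly is the point-in-time join: for each labeled event, take the latest feature value observed at or before the event timestamp, never after. Here is a minimal sketch with pandas, using hypothetical `labels` and `feature_history` frames and column names; production pipelines run the same join in Spark or a warehouse engine over partitioned storage.

```python
import pandas as pd

# Hypothetical label events and feature history; real pipelines read these
# from a lake or warehouse partitioned by date and entity hash.
labels = pd.DataFrame({
    "entity_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2024-03-01", "2024-03-05", "2024-03-02"]),
    "label": [0, 1, 1],
})
feature_history = pd.DataFrame({
    "entity_id": [1, 1, 2],
    "feature_time": pd.to_datetime(["2024-02-27", "2024-03-03", "2024-02-28"]),
    "purchases_30d": [3, 5, 1],
})

# Point-in-time join: for each label row, take the most recent feature value
# at or before event_time, which prevents leakage of future data into training.
train = pd.merge_asof(
    labels.sort_values("event_time"),
    feature_history.sort_values("feature_time"),
    by="entity_id",
    left_on="event_time",
    right_on="feature_time",
    direction="backward",
)
print(train[["entity_id", "event_time", "purchases_30d", "label"]])
```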
The online store holds only the latest (or near-latest) values for each entity key. It is built on replicated key-value storage tuned for p99 latencies under 20 ms. At 100,000 QPS with 2 KB feature vectors per request, the system must sustain 200 MB per second of read throughput per region. To meet this, implementations use memory-first databases, aggressive in-process caching for ultra-hot keys (95 percent hit rates), and parallelized multi-feature fetches. Netflix keeps p99 reads within tens of milliseconds at millions of QPS using regional caches and local in-service storage.
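Parallelizing the multi-feature fetch is what keeps total latency close to the slowest single read rather than the sum of reads. A minimal asyncio sketch, with a hypothetical key-value client stubbed as a dict and a simulated delay so it runs standalone; feature group names and latencies are illustrative.

```python
import asyncio

# Stand-in for a replicated key-value store client (Redis, Cassandra, etc.);
# the dict and the 2 ms sleep are placeholders for a real network round trip.
KV = {
    ("user:42", "profile"): {"age_bucket": 3},
    ("user:42", "activity"): {"clicks_7d": 18},
    ("user:42", "embeddings"): {"taste_vec_norm": 0.92},
}

async def kv_get(entity_key: str, feature_group: str) -> dict:
    await asyncio.sleep(0.002)  # simulated per-read store latency
    return KV.get((entity_key, feature_group), {})

async def get_features(entity_key: str, groups: list[str]) -> dict:
    # Fan the per-group reads out in parallel so the request waits on the
    # slowest single read instead of all reads back to back.
    parts = await asyncio.gather(*(kv_get(entity_key, g) for g in groups))
    return {k: v for part in parts for k, v in part.items()}

print(asyncio.run(get_features("user:42", ["profile", "activity", "embeddings"])))
```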
The cost trade-off is significant. Dual storage roughly doubles the footprint: you pay for years of historical data in a data warehouse or lake, plus replicated online copies to meet tail-latency guarantees. Embeddings and rarely used features are expensive to keep hot, so some teams materialize only the features with high online call rates, computing long-tail features on demand or serving cached fallbacks. The benefit is consistent semantics and reuse across teams, eliminating duplicate pipelines.
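A selective materialization policy can be as simple as a threshold on observed online call rate, with everything below it computed on demand or served from a cached fallback. A hypothetical sketch of that decision; the feature names, rates, and threshold are illustrative, and a real registry would supply these statistics.

```python
from dataclasses import dataclass

@dataclass
class FeatureStats:
    name: str
    online_qps: float      # observed read rate in production
    bytes_per_value: int   # approximate serialized size

# Illustrative catalog entries.
catalog = [
    FeatureStats("user_clicks_7d", online_qps=40_000, bytes_per_value=8),
    FeatureStats("user_embedding_v3", online_qps=35_000, bytes_per_value=1_024),
    FeatureStats("lifetime_refund_rate", online_qps=12, bytes_per_value=8),
]

# Below this call rate, keeping a feature hot usually costs more than it saves.
MATERIALIZE_QPS_THRESHOLD = 100

def plan(features):
    hot = [f.name for f in features if f.online_qps >= MATERIALIZE_QPS_THRESHOLD]
    cold = [f.name for f in features if f.online_qps < MATERIALIZE_QPS_THRESHOLD]
    return {"materialize_online": hot, "on_demand_or_fallback": cold}

print(plan(catalog))
```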
💡 Key Takeaways
•Offline storage handles point-in-time joins over tens of billions of rows in a few hours on moderate clusters, partitioned by date and entity hash
•Online storage must deliver sub-20 ms p99 latency at 100,000 QPS, requiring 200 MB per second of read throughput and memory-first key-value stores
•Netflix achieves p99 reads in the tens of milliseconds at millions of QPS using regional caches, parallelized reads, and in-process caching with 95 percent hit rates
•Dual storage doubles the footprint: years of offline history plus replicated online copies for tail latency, prompting selective materialization
•Feature vectors typically include 50 to 200 scalar features plus compact embeddings, totaling 1 to 10 KB per entity key
📌 Examples
Airbnb assembles training datasets from 90-day windows with 200 million labeled examples, scanning multiple terabytes with point-in-time correctness enforced
Uber's Michelangelo uses per-region online stores with independent write paths to avoid cross-region hops on the critical serving path, keeping p99 under 20 ms
A recommendation system might cache the top 10,000 user feature vectors in process with a 60-second time-to-live (TTL), achieving a 95 percent hit rate and sub-5 ms p50 latency
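That last example maps directly onto an in-process TTL cache placed in front of the online store. A minimal sketch using the third-party cachetools library; the lookup function, returned fields, and sizes mirror the example above and are illustrative, not a specific system's API.

```python
from cachetools import TTLCache, cached

# Roughly the top 10,000 user feature vectors, expired after 60 seconds so
# cached values never drift far from the online store.
hot_user_cache = TTLCache(maxsize=10_000, ttl=60)

def fetch_from_online_store(user_id: str) -> dict:
    # Placeholder for a real key-value read (Redis, DynamoDB, etc.).
    return {"user_id": user_id, "clicks_7d": 18, "embedding_norm": 0.92}

@cached(hot_user_cache)
def get_user_features(user_id: str) -> dict:
    return fetch_from_online_store(user_id)

# First call misses and reads the store; repeat calls within 60 seconds are
# served from process memory, which is what keeps p50 in the low milliseconds.
features = get_user_features("user:42")
```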