Offline and Online Storage: Architecture and Trade-offs
Dual Storage Architecture: Feature stores maintain two storage systems optimized for different access patterns. The offline store holds historical values for training (bulk reads, time-range queries). The online store holds current values for inference (point lookups, sub-10ms latency).
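The contrast between the two access patterns can be made concrete with a minimal in-memory sketch (class and field names here are illustrative, not from any particular feature store):

```python
from datetime import datetime

class OfflineStore:
    """Bulk reads and time-range scans for training-set construction."""
    def __init__(self, rows):
        # rows: list of {"entity_id": str, "ts": datetime, "features": dict}
        self.rows = rows

    def scan_range(self, start, end):
        # Analytical access: scan everything in [start, end)
        return [r for r in self.rows if start <= r["ts"] < end]

class OnlineStore:
    """Single-key point lookups holding only the latest value per entity."""
    def __init__(self):
        self.latest = {}

    def put(self, entity_id, features):
        self.latest[entity_id] = features

    def get(self, entity_id):
        # Serving access: one key, one read
        return self.latest.get(entity_id)
```

The same feature values flow through both interfaces; only the shape of the query differs, which is what drives the different storage choices below.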
Offline Store Design
Optimized for analytical workloads: scanning billions of rows to build training datasets. Typical choices: Parquet files on S3/GCS (cheap, scalable, columnar compression), Delta Lake or Iceberg (add ACID transactions and time travel), or data warehouses (Snowflake, BigQuery for SQL access). Key capability: point-in-time queries. Training data must represent what was known at prediction time, not what we know now. A query like "user_123 features as of 2024-01-15 10:00:00" must exclude any data that arrived after that timestamp.
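A point-in-time query can be sketched with a pure-Python stand-in for the offline store: per-entity feature history sorted by the timestamp at which each value became known, with a binary search to find the latest value at or before the cutoff (the store layout and field names here are assumptions for illustration):

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical in-memory offline store: per-entity history, sorted by
# the timestamp at which each feature value became known.
history = {
    "user_123": [
        (datetime(2024, 1, 10), {"purchases_30d": 3}),
        (datetime(2024, 1, 14), {"purchases_30d": 4}),
        (datetime(2024, 1, 16), {"purchases_30d": 5}),  # arrived after the cutoff
    ],
}

def point_in_time(entity_id, as_of):
    """Return the latest feature values known at `as_of`, excluding later data."""
    rows = history.get(entity_id, [])
    # Find the last record whose timestamp is <= as_of
    idx = bisect_right([ts for ts, _ in rows], as_of)
    return rows[idx - 1][1] if idx else None

# "user_123 features as of 2024-01-15 10:00:00": the Jan 16 row is excluded.
print(point_in_time("user_123", datetime(2024, 1, 15, 10, 0)))
# → {'purchases_30d': 4}
```

In practice this is a point-in-time join over Parquet/Delta/Iceberg data rather than a dict lookup, but the exclusion rule is the same: nothing newer than the prediction timestamp may leak into the training row.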
Online Store Design
Optimized for serving: single-key lookups at scale with minimal latency. Typical choices: Redis (fastest, but expensive for large datasets), DynamoDB/Bigtable (managed, scalable, slightly higher latency), or custom solutions. Key metrics: p99 read latency (target under 5ms), throughput (hundreds of thousands of QPS), and availability (99.99%+ for production ML). Storage is organized by entity key (user_id, item_id) with all features for that entity co-located for single-read retrieval.
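The entity-keyed layout can be sketched with a dict standing in for a Redis-style hash per entity (key format and function names are assumptions; in Redis the read would be a single HGETALL on a key like `user:123`):

```python
import time

# In-memory stand-in for an online store: one key per entity, with all of
# that entity's features co-located so a single read retrieves everything.
online_store = {}

def write_features(entity_key, features):
    online_store.setdefault(entity_key, {}).update(features)

def read_features(entity_key):
    """Single point lookup; returns (features, observed latency in ms)."""
    t0 = time.perf_counter()
    features = online_store.get(entity_key, {})
    latency_ms = (time.perf_counter() - t0) * 1000
    return features, latency_ms

write_features("user:123", {"purchases_30d": 4, "session_clicks": 7})
features, ms = read_features("user:123")
print(features)
```

Co-locating features under one entity key is what makes the p99 target achievable: the serving path pays for exactly one read per entity, regardless of how many features the model consumes.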
Sync and Consistency
Both stores must reflect the same feature values. There are two patterns. Batch sync: periodically compute features from source data and write to both stores; simple, but features can be hours stale. Stream sync: compute features in real time, write to the online store immediately, then backfill to the offline store; complex, but features stay fresh (minutes of lag). A hybrid approach is common: batch for slow-changing features (user demographics), streaming for fast-changing features (session activity).
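The two sync paths can be sketched as follows (store representations and function names are illustrative; a real backfill would be asynchronous and batched, not an inline append):

```python
offline_store = []   # append-only history (stands in for Parquet/Delta files)
online_store = {}    # latest values by entity key (stands in for Redis/DynamoDB)

def batch_sync(computed_rows):
    """Periodic job: write freshly computed features to both stores at once."""
    for row in computed_rows:
        offline_store.append(row)                          # historical record
        online_store[row["entity_key"]] = row["features"]  # latest value

def stream_sync(event):
    """Streaming path: update the online store first, then backfill offline."""
    online_store[event["entity_key"]] = event["features"]
    offline_store.append(event)  # backfill; async/batched in practice

# Hybrid usage: demographics via the batch job, session activity via the stream.
batch_sync([{"entity_key": "user:123", "features": {"age_bucket": "25-34"}}])
stream_sync({"entity_key": "user:123", "features": {"session_clicks": 7}})
```

Note the ordering in `stream_sync`: the online store is updated first because serving freshness is the whole point of the streaming path, while the offline record only needs to land before the next training run.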
Cost Reality: Online stores are 10-100x more expensive per GB than offline stores. Only materialize to the online store those features actually needed for real-time inference. Training-only features stay offline.
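This materialization filter is often expressed as a flag in the feature registry; a minimal sketch (the registry format and `serve_online` flag are hypothetical, not a specific feature store's schema):

```python
# Hypothetical feature registry: each feature declares whether it is
# served at inference time and therefore needs online materialization.
registry = [
    {"name": "purchases_30d",        "serve_online": True},
    {"name": "session_clicks",       "serve_online": True},
    {"name": "lifetime_label_stats", "serve_online": False},  # training-only
]

def to_materialize(features):
    """Only online-served features are written to the expensive online store."""
    return [f["name"] for f in features if f["serve_online"]]

print(to_materialize(registry))
# → ['purchases_30d', 'session_clicks']
```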