Offline and Online Storage: Architecture and Trade-offs
The offline store holds the complete feature history for training, backfills, and analytics. It is optimized for wide columnar scans and joins across billions of rows, typically partitioned by date and entity hash to accelerate point-in-time joins. A typical training flow assembles a dataset from a 90-day window with 200 million examples, scanning multiple terabytes; well-tuned pipelines on moderate clusters complete these joins in a few hours. The offline store also captures feature versions and metadata fingerprints to enable exact reproducibility: you can regenerate the same training dataset months later.
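The heart of offline assembly is the point-in-time join: for each labeled event, take the latest feature value observed at or before the event timestamp, never after. Here is a minimal sketch with pandas, using hypothetical `labels` and `feature_history` frames and column names; production pipelines run the same join in Spark or a warehouse engine over partitioned storage.

```python
import pandas as pd

# Hypothetical label events and feature history; real pipelines read these
# from a lake or warehouse partitioned by date and entity hash.
labels = pd.DataFrame({
    "entity_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2024-03-01", "2024-03-05", "2024-03-02"]),
    "label": [0, 1, 1],
})
feature_history = pd.DataFrame({
    "entity_id": [1, 1, 2],
    "feature_time": pd.to_datetime(["2024-02-27", "2024-03-03", "2024-02-28"]),
    "purchases_30d": [3, 5, 1],
})

# Point-in-time join: for each label row, take the most recent feature value
# at or before event_time, which prevents leakage of future data into training.
train = pd.merge_asof(
    labels.sort_values("event_time"),
    feature_history.sort_values("feature_time"),
    by="entity_id",
    left_on="event_time",
    right_on="feature_time",
    direction="backward",
)
print(train[["entity_id", "event_time", "purchases_30d", "label"]])
```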
The online store holds only the latest (or near-latest) values for each entity key. It is built on replicated key-value storage tuned for p99 latencies under 20 ms. At 100,000 QPS with 2 KB feature vectors per request, the system must sustain 200 MB per second of read throughput per region. To meet this, implementations use memory-first databases, aggressive in-process caching for ultra-hot keys (95 percent hit rates), and parallelized multi-feature fetches. Netflix keeps p99 reads within tens of milliseconds at millions of QPS using regional caches and local in-service storage.
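Parallelizing the multi-feature fetch is what keeps total latency close to the slowest single read rather than the sum of reads. A minimal asyncio sketch, with a hypothetical key-value client stubbed as a dict and a simulated delay so it runs standalone; feature group names and latencies are illustrative.

```python
import asyncio

# Stand-in for a replicated key-value store client (Redis, Cassandra, etc.);
# the dict and the 2 ms sleep are placeholders for a real network round trip.
KV = {
    ("user:42", "profile"): {"age_bucket": 3},
    ("user:42", "activity"): {"clicks_7d": 18},
    ("user:42", "embeddings"): {"taste_vec_norm": 0.92},
}

async def kv_get(entity_key: str, feature_group: str) -> dict:
    await asyncio.sleep(0.002)  # simulated per-read store latency
    return KV.get((entity_key, feature_group), {})

async def get_features(entity_key: str, groups: list[str]) -> dict:
    # Fan the per-group reads out in parallel so the request waits on the
    # slowest single read instead of all reads back to back.
    parts = await asyncio.gather(*(kv_get(entity_key, g) for g in groups))
    return {k: v for part in parts for k, v in part.items()}

print(asyncio.run(get_features("user:42", ["profile", "activity", "embeddings"])))
```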
The cost trade-off is significant. Dual storage roughly doubles the footprint: you pay for years of historical data in a data warehouse or lake, plus replicated online copies to meet tail-latency guarantees. Embeddings and rarely used features are expensive to keep hot, so some teams materialize only the features with high online call rates, computing long-tail features on demand or serving cached fallbacks. The benefit is consistent semantics and reuse across teams, eliminating duplicate pipelines.
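A selective materialization policy can be as simple as a threshold on observed online call rate, with everything below it computed on demand or served from a cached fallback. A hypothetical sketch of that decision; the feature names, rates, and threshold are illustrative, and a real registry would supply these statistics.

```python
from dataclasses import dataclass

@dataclass
class FeatureStats:
    name: str
    online_qps: float      # observed read rate in production
    bytes_per_value: int   # approximate serialized size

# Illustrative catalog entries.
catalog = [
    FeatureStats("user_clicks_7d", online_qps=40_000, bytes_per_value=8),
    FeatureStats("user_embedding_v3", online_qps=35_000, bytes_per_value=1_024),
    FeatureStats("lifetime_refund_rate", online_qps=12, bytes_per_value=8),
]

# Below this call rate, keeping a feature hot usually costs more than it saves.
MATERIALIZE_QPS_THRESHOLD = 100

def plan(features):
    hot = [f.name for f in features if f.online_qps >= MATERIALIZE_QPS_THRESHOLD]
    cold = [f.name for f in features if f.online_qps < MATERIALIZE_QPS_THRESHOLD]
    return {"materialize_online": hot, "on_demand_or_fallback": cold}

print(plan(catalog))
```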
💡 Key Takeaways
•Offline storage handles point-in-time joins over tens of billions of rows in a few hours on moderate clusters, partitioned by date and entity hash
•Online storage must deliver sub-20 ms p99 latency at 100,000 QPS, requiring 200 MB per second of read throughput and memory-first key-value stores
•Netflix achieves p99 reads in the tens of milliseconds at millions of QPS using regional caches, parallelized reads, and in-process caching with 95 percent hit rates
•Dual storage doubles the footprint: years of offline history plus replicated online copies for tail latency, prompting selective materialization
•Feature vectors typically include 50 to 200 scalar features plus compact embeddings, totaling 1 to 10 KB per entity key
📌 Examples
Airbnb assembles training datasets from 90-day windows with 200 million labeled examples, scanning multiple terabytes with point-in-time correctness enforced
Uber's Michelangelo uses per-region online stores with independent write paths to avoid cross-region hops on the critical serving path, keeping p99 under 20 ms
A recommendation system might cache the top 10,000 user feature vectors in process with a 60-second time-to-live (TTL), achieving a 95 percent hit rate and sub-5 ms p50 latency
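That last example maps directly onto an in-process TTL cache placed in front of the online store. A minimal sketch using the third-party cachetools library; the lookup function, returned fields, and sizes mirror the example above and are illustrative, not a specific system's API.

```python
from cachetools import TTLCache, cached

# Roughly the top 10,000 user feature vectors, expired after 60 seconds so
# cached values never drift far from the online store.
hot_user_cache = TTLCache(maxsize=10_000, ttl=60)

def fetch_from_online_store(user_id: str) -> dict:
    # Placeholder for a real key-value read (Redis, DynamoDB, etc.).
    return {"user_id": user_id, "clicks_7d": 18, "embedding_norm": 0.92}

@cached(hot_user_cache)
def get_user_features(user_id: str) -> dict:
    return fetch_from_online_store(user_id)

# First call misses and reads the store; repeat calls within 60 seconds are
# served from process memory, which is what keeps p50 in the low milliseconds.
features = get_user_features("user:42")
```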