Online vs Offline Features: Core Distinction
Online features are computed and served for real-time inference under strict latency requirements, typically from low-latency key-value stores like Redis or DynamoDB. Because they sit on the critical path of user-facing requests, they must be retrieved per entity (user_id, item_id) within single-digit milliseconds. For example, Uber retrieves features like "trips in last 5 minutes" with p99 latency under 10ms to power real-time ETA predictions and driver matching.
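As a minimal sketch of what an online read looks like on the request path (the Redis key scheme, feature names, and client setup are assumptions for illustration, not any specific system's API), serving reduces to a single keyed lookup:

```python
import json

import redis  # assumes the redis-py client is available

# Hypothetical key scheme: one hash per entity, e.g. "features:driver:42",
# kept fresh by an upstream streaming job.
client = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_online_features(entity_type: str, entity_id: str, names: list[str]) -> dict:
    """Single keyed lookup on the critical path; no joins or scans, so it
    can stay within a tight tail-latency budget (e.g. p99 < 10ms)."""
    values = client.hmget(f"features:{entity_type}:{entity_id}", names)
    return {n: (json.loads(v) if v is not None else None)
            for n, v in zip(names, values)}

# e.g. features consumed by an ETA or driver-matching model
feats = get_online_features("driver", "42", ["trips_last_5m", "avg_rating"])
```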
Offline features are computed in batch from historical data for training, validation, and large-scale batch scoring. They live in data warehouses or data lakes and support time travel (point-in-time correct snapshots), heavy joins, and backfills across terabyte- to petabyte-scale datasets. Airbnb's Zipline processes billions of rows with automated backfills over months of history for search ranking and pricing models.
The fundamental system design challenge is unifying feature definitions across both contexts. One logical feature specification must generate the streaming or batch pipelines that compute the feature, an offline table with point-in-time correctness for training, and an online table for low-latency serving. Without this unification, training-serving skew emerges: offline evaluation metrics (AUC, precision) fail to translate to online performance because the feature logic diverges between environments.
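A minimal sketch of that unification, assuming a hypothetical `FeatureSpec` dataclass (the field names and generated artifacts are illustrative, not any real feature store's DSL): one declarative definition drives both the offline query and the streaming job config, so the aggregation logic cannot diverge between environments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureSpec:
    """One logical feature definition shared by both materialization paths."""
    name: str      # e.g. "trips_last_5m"
    entity: str    # join key, e.g. "driver_id"
    source: str    # upstream event table / stream, e.g. "trip_events"
    window: str    # aggregation window, e.g. "5 minutes"

def to_batch_sql(spec: FeatureSpec, as_of: str) -> str:
    """Offline path: point-in-time cut of the same windowed count,
    used for training tables and backfills."""
    return (
        f"SELECT {spec.entity}, COUNT(*) AS {spec.name} FROM {spec.source} "
        f"WHERE event_time BETWEEN TIMESTAMP '{as_of}' - INTERVAL '{spec.window}' "
        f"AND TIMESTAMP '{as_of}' GROUP BY {spec.entity}"
    )

def to_stream_config(spec: FeatureSpec) -> dict:
    """Online path: config for the streaming job that keeps the
    key-value store fresh with the same windowed count."""
    return {"source": spec.source, "key": spec.entity,
            "agg": "count", "window": spec.window, "sink": "online_store"}

spec = FeatureSpec("trips_last_5m", "driver_id", "trip_events", "5 minutes")
```

Because both artifacts are generated from the same `spec`, changing the window or source updates training and serving together instead of in two hand-maintained codebases.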
Production systems balance five core guarantees: freshness Service Level Objectives (SLOs) measuring how current the data is, tail-latency SLOs (p95/p99) for request budgets, consistency between stores to prevent distribution mismatches, point-in-time correctness to avoid data leakage, and backfillability for reproducible training datasets.
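To make the freshness guarantee concrete, here is a hedged sketch of a monitor (the SLO values and feature names are assumptions): staleness is wall-clock time minus a feature's last event time, compared against a per-feature SLO.

```python
import time

# Hypothetical per-feature freshness SLOs in seconds: streaming features
# get tight budgets, batch features get hours.
FRESHNESS_SLO_S = {"trips_last_5m": 60, "lifetime_trip_count": 24 * 3600}

def freshness_violations(last_event_time: dict[str, float]) -> list[str]:
    """Return the features whose staleness (now - last event time,
    in seconds) exceeds their freshness SLO."""
    now = time.time()
    return [name for name, ts in last_event_time.items()
            if now - ts > FRESHNESS_SLO_S[name]]
```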
💡 Key Takeaways
• Online features target p99 latency under 10ms for real-time serving, while offline features prioritize throughput over latency and can take hours to compute across petabyte-scale datasets
• DoorDash serves online features at 10,000+ queries per second (QPS) with p99 latency in the low single-digit milliseconds using in-memory key-value stores
• Freshness SLOs differ dramatically: streaming features achieve seconds-to-minutes staleness (sub-second for critical fraud counters at Meta), while offline batch features update hourly or daily
• Netflix budgets 5 to 15ms p99 for feature fetches within a 100 to 300ms total page render budget, requiring aggressive caching and feature bundling
• Point-in-time correctness is mandatory for offline training to prevent label leakage, using event-time semantics where features joined at label timestamp T include only data with event time <= T (see the sketch after this list)
• Unified feature definitions prevent training-serving skew by generating both batch pipelines for offline tables and streaming pipelines for online stores from a single specification
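A minimal sketch of the point-in-time join referenced above, assuming pandas and toy column names: `merge_asof` attaches to each label row the most recent feature value whose event time is <= the label timestamp, which is exactly the leakage-free semantics.

```python
import pandas as pd

# Toy labels and feature snapshots; both frames must be sorted on
# their time column for merge_asof.
labels = pd.DataFrame({
    "user_id": [1, 1],
    "label_ts": pd.to_datetime(["2024-01-02 12:00", "2024-01-05 09:00"]),
    "label": [0, 1],
}).sort_values("label_ts")

features = pd.DataFrame({
    "user_id": [1, 1, 1],
    "event_ts": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-06"]),
    "trips_7d": [3, 5, 9],
}).sort_values("event_ts")

# For each label at time T, take the latest feature row with event_ts <= T;
# the 2024-01-06 value can never leak into the 2024-01-05 label.
train = pd.merge_asof(labels, features,
                      left_on="label_ts", right_on="event_ts",
                      by="user_id", direction="backward")
```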
📌 Examples
Uber Michelangelo: Streaming aggregation computes "trips in last 5 minutes" with sub-minute freshness, materialized both to a data lake for training and to Redis for serving with p99 under 10ms
Airbnb Zipline: A feature registry with a DSL generates point-in-time correct offline tables with billions of rows and publishes a subset to an online store with single-digit to low-tens-of-milliseconds p99 reads
LinkedIn Venice: Serves per-member model features and embeddings with sub-10ms p99 at millions of aggregate QPS across multi-region replication at petabyte scale