Online vs Offline Features: Core Distinction
Online features are computed and served for real-time inference under strict latency requirements, typically from low-latency key-value stores like Redis or DynamoDB. Because they sit on the critical path of user-facing requests, they must be retrieved per entity (user_id, item_id) within single-digit milliseconds. For example, Uber retrieves features like "trips in last 5 minutes" with p99 latency under 10ms to power real-time ETA predictions and driver matching.
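As a minimal sketch of what an online read looks like on the request path (the Redis key scheme, feature names, and client setup are assumptions for illustration, not any specific system's API), serving reduces to a single keyed lookup:

```python
import json

import redis  # assumes the redis-py client is available

# Hypothetical key scheme: one hash per entity, e.g. "features:driver:42",
# kept fresh by an upstream streaming job.
client = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_online_features(entity_type: str, entity_id: str, names: list[str]) -> dict:
    """Single keyed lookup on the critical path; no joins or scans, so it
    can stay within a tight tail-latency budget (e.g. p99 < 10ms)."""
    values = client.hmget(f"features:{entity_type}:{entity_id}", names)
    return {n: (json.loads(v) if v is not None else None)
            for n, v in zip(names, values)}

# e.g. features consumed by an ETA or driver-matching model
feats = get_online_features("driver", "42", ["trips_last_5m", "avg_rating"])
```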
Offline features are computed in batch from historical data for training, validation, and large-scale batch scoring. They live in data warehouses or data lakes and support time travel (point-in-time correct snapshots), heavy joins, and backfills across terabyte- to petabyte-scale datasets. Airbnb's Zipline processes billions of rows with automated backfills over months of history for search ranking and pricing models.
The fundamental system design challenge is unifying feature definitions across both contexts. One logical feature specification must generate the streaming or batch pipelines that compute the feature, an offline table with point-in-time correctness for training, and an online table for low-latency serving. Without this unification, training-serving skew emerges: offline evaluation metrics (AUC, precision) fail to translate to online performance because the feature logic diverges between environments.
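A minimal sketch of that unification, assuming a hypothetical `FeatureSpec` dataclass (the field names and generated artifacts are illustrative, not any real feature store's DSL): one declarative definition drives both the offline query and the streaming job config, so the aggregation logic cannot diverge between environments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureSpec:
    """One logical feature definition shared by both materialization paths."""
    name: str      # e.g. "trips_last_5m"
    entity: str    # join key, e.g. "driver_id"
    source: str    # upstream event table / stream, e.g. "trip_events"
    window: str    # aggregation window, e.g. "5 minutes"

def to_batch_sql(spec: FeatureSpec, as_of: str) -> str:
    """Offline path: point-in-time cut of the same windowed count,
    used for training tables and backfills."""
    return (
        f"SELECT {spec.entity}, COUNT(*) AS {spec.name} FROM {spec.source} "
        f"WHERE event_time BETWEEN TIMESTAMP '{as_of}' - INTERVAL '{spec.window}' "
        f"AND TIMESTAMP '{as_of}' GROUP BY {spec.entity}"
    )

def to_stream_config(spec: FeatureSpec) -> dict:
    """Online path: config for the streaming job that keeps the
    key-value store fresh with the same windowed count."""
    return {"source": spec.source, "key": spec.entity,
            "agg": "count", "window": spec.window, "sink": "online_store"}

spec = FeatureSpec("trips_last_5m", "driver_id", "trip_events", "5 minutes")
```

Because both artifacts are generated from the same `spec`, changing the window or source updates training and serving together instead of in two hand-maintained codebases.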
Production systems balance five core guarantees: freshness Service Level Objectives (SLOs) measuring how current the data is, tail-latency SLOs (p95/p99) for request budgets, consistency between stores to prevent distribution mismatches, point-in-time correctness to avoid data leakage, and backfillability for reproducible training datasets.
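To make the freshness guarantee concrete, here is a hedged sketch of a monitor (the SLO values and feature names are assumptions): staleness is wall-clock time minus a feature's last event time, compared against a per-feature SLO.

```python
import time

# Hypothetical per-feature freshness SLOs in seconds: streaming features
# get tight budgets, batch features get hours.
FRESHNESS_SLO_S = {"trips_last_5m": 60, "lifetime_trip_count": 24 * 3600}

def freshness_violations(last_event_time: dict[str, float]) -> list[str]:
    """Return the features whose staleness (now - last event time,
    in seconds) exceeds their freshness SLO."""
    now = time.time()
    return [name for name, ts in last_event_time.items()
            if now - ts > FRESHNESS_SLO_S[name]]
```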
💡 Key Takeaways
• Online features target p99 latency under 10ms for real-time serving, while offline features prioritize throughput over latency and can take hours to compute across petabyte-scale datasets
• DoorDash serves online features at 10,000+ queries per second (QPS) with p99 latency in the low single-digit milliseconds using in-memory key-value stores
• Freshness SLOs differ dramatically: streaming features achieve seconds-to-minutes staleness (sub-second for critical fraud counters at Meta), while offline batch features update hourly or daily
• Netflix budgets 5 to 15ms p99 for feature fetches within a 100 to 300ms total page render budget, requiring aggressive caching and feature bundling
• Point-in-time correctness is mandatory for offline training to prevent label leakage, using event-time semantics where features joined at label timestamp T include only data with event time <= T (see the sketch after this list)
• Unified feature definitions prevent training-serving skew by generating both batch pipelines for offline tables and streaming pipelines for online stores from a single specification
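A minimal sketch of the point-in-time join referenced above, assuming pandas and toy column names: `merge_asof` attaches to each label row the most recent feature value whose event time is <= the label timestamp, which is exactly the leakage-free semantics.

```python
import pandas as pd

# Toy labels and feature snapshots; both frames must be sorted on
# their time column for merge_asof.
labels = pd.DataFrame({
    "user_id": [1, 1],
    "label_ts": pd.to_datetime(["2024-01-02 12:00", "2024-01-05 09:00"]),
    "label": [0, 1],
}).sort_values("label_ts")

features = pd.DataFrame({
    "user_id": [1, 1, 1],
    "event_ts": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-06"]),
    "trips_7d": [3, 5, 9],
}).sort_values("event_ts")

# For each label at time T, take the latest feature row with event_ts <= T;
# the 2024-01-06 value can never leak into the 2024-01-05 label.
train = pd.merge_asof(labels, features,
                      left_on="label_ts", right_on="event_ts",
                      by="user_id", direction="backward")
```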
📌 Examples
Uber Michelangelo: Streaming aggregation computes "trips in last 5 minutes" with sub-minute freshness, materialized both to a data lake for training and to Redis for serving with p99 under 10ms
Airbnb Zipline: A feature registry with a DSL generates point-in-time correct offline tables with billions of rows and publishes a subset to an online store with single-digit to low-tens-of-milliseconds p99 reads
LinkedIn Venice: Serves per-member model features and embeddings with sub-10ms p99 at millions of aggregate QPS across multi-region replication at petabyte scale