Online vs Offline Features: Core Distinction
Online Features
Must be retrieved per entity (user_id, item_id) within single-digit milliseconds, since they sit on the critical path of user-facing requests. For example, Uber retrieves features like "trips in the last 5 minutes" with p99 latency under 10 ms to power real-time ETA predictions and driver matching. Online features are stored in Redis, DynamoDB, or similar low-latency stores.
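A minimal sketch of this lookup pattern, using an in-memory dict as a stand-in for the low-latency store (the key schema, feature names, and `get_online_features` helper are all hypothetical; a production system would issue a single Redis HGETALL or DynamoDB GetItem against the same key layout):

```python
import time

# In-memory stand-in for a Redis hash store keyed per entity; in production
# this would be a redis.Redis().hgetall(key) call against a real cluster.
_store = {
    "features:user:42": {"trips_last_5m": "3", "avg_rating": "4.8"},
}

def get_online_features(user_id: str, feature_names: list[str]) -> dict:
    """Fetch the requested features for one entity in a single key lookup."""
    start = time.perf_counter()
    row = _store.get(f"features:user:{user_id}", {})
    values = {name: row.get(name) for name in feature_names}
    latency_ms = (time.perf_counter() - start) * 1000
    # A real serving path would record latency_ms against the p99 SLO budget.
    return values

features = get_online_features("42", ["trips_last_5m", "avg_rating"])
```

The design point is that the whole request is one key lookup, not a join: all features for an entity are co-located under a single key so the read stays within the millisecond budget.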
Offline Features
Live in data warehouses or data lakes and support time travel (point-in-time-correct snapshots), heavy joins, and backfills across terabyte- to petabyte-scale datasets. Airbnb's Zipline processes billions of rows with automated backfills over months of history for search ranking and pricing models.
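The point-in-time-correct snapshot is the key operation here: for each training label, join in the latest feature value whose timestamp is at or before the label's timestamp, never after. A small sketch with hypothetical row layouts (real systems do this as a windowed SQL or Spark join over the warehouse):

```python
from datetime import datetime

# Hypothetical feature log: each row records a feature value and the time it
# became known. The join picks, per label, the latest value with ts <= the
# label's ts, so no future information leaks into training data.
feature_log = [
    {"user_id": 1, "ts": datetime(2024, 1, 1), "trips_7d": 2},
    {"user_id": 1, "ts": datetime(2024, 1, 5), "trips_7d": 5},
]
labels = [
    {"user_id": 1, "ts": datetime(2024, 1, 3), "converted": True},
]

def point_in_time_join(labels, feature_log):
    rows = []
    for label in labels:
        candidates = [f for f in feature_log
                      if f["user_id"] == label["user_id"] and f["ts"] <= label["ts"]]
        latest = max(candidates, key=lambda f: f["ts"], default=None)
        rows.append({**label, "trips_7d": latest["trips_7d"] if latest else None})
    return rows

training_rows = point_in_time_join(labels, feature_log)
```

Note the label dated January 3 picks up the January 1 value (2 trips), not the January 5 value (5 trips), even though the later row exists in the log.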
The Unification Challenge
One logical feature specification must generate three artifacts: a streaming or batch pipeline to compute the feature, an offline table with point-in-time correctness for training, and an online table for low-latency serving. Without this unification, training-serving skew emerges: offline evaluation metrics fail to translate to online performance because the feature logic diverges between the two paths.
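One way to sketch such a unified specification (the `FeatureSpec` class, its fields, and the generated SQL are illustrative assumptions, not any particular feature store's API):

```python
from dataclasses import dataclass

# Hypothetical unified spec: a single definition drives both the offline
# backfill query and the online key layout, so the logic cannot diverge.
@dataclass(frozen=True)
class FeatureSpec:
    name: str
    entity: str
    source_table: str
    aggregation: str   # e.g. "COUNT"
    window: str        # e.g. "5 minutes"

    def offline_sql(self) -> str:
        # Warehouse backfill query; a real system would window this per
        # snapshot timestamp for point-in-time correctness.
        return (f"SELECT {self.entity}, {self.aggregation}(*) AS {self.name} "
                f"FROM {self.source_table} GROUP BY {self.entity}")

    def online_key(self, entity_id: str) -> str:
        # Key layout for the low-latency store (Redis/DynamoDB).
        return f"{self.name}:{self.entity}:{entity_id}"

spec = FeatureSpec("trips_last_5m", "user_id", "trips", "COUNT", "5 minutes")
```

Because both `offline_sql` and `online_key` derive from the same frozen spec, any change to the feature definition flows to training and serving together, which is exactly the property that prevents skew.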
Core Guarantees
Production systems balance five guarantees: freshness SLOs (how current the data is), tail-latency SLOs (p95/p99) for request budgets, consistency between online and offline stores to prevent distribution mismatches, point-in-time correctness to avoid data leakage, and backfillability for reproducible training datasets.
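The first of these guarantees can be checked mechanically. A minimal sketch of a freshness monitor, with assumed feature names and SLO thresholds (real systems would feed violations into alerting rather than return a list):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-feature freshness SLOs: a streaming feature tolerates
# minutes of staleness, a daily batch feature tolerates a day.
SLOS = {
    "trips_last_5m": timedelta(minutes=5),
    "lifetime_trips": timedelta(days=1),
}

def stale_features(last_updated: dict, now: datetime) -> list[str]:
    """Return the features whose age exceeds their freshness SLO."""
    return [name for name, ts in last_updated.items() if now - ts > SLOS[name]]

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
violations = stale_features(
    {"trips_last_5m": now - timedelta(minutes=7),   # 7 min old: violates 5 min SLO
     "lifetime_trips": now - timedelta(hours=3)},   # 3 h old: within 1 day SLO
    now,
)
```

The other guarantees need analogous checks: latency against the p99 budget at serving time, and online/offline value comparisons on sampled entities to catch consistency drift.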