Feature Sharing & Discovery: The Dual-Plane Architecture
Feature sharing and discovery solve a fundamental ML platform problem: hundreds of models across different teams need consistent, low-latency access to thousands of features without rebuilding them from scratch. The solution is a dual-plane architecture with a registry that acts as the central nervous system.
The offline plane computes and stores large-scale, point-in-time-correct feature snapshots for training. Think terabyte-scale datasets with time-travel joins that prevent data leakage. Netflix Zipline processes daily training sets at TB scale with multi-month backfills. The online plane serves low-latency feature values for inference, typically targeting 5 to 20 milliseconds at p95 to fit within sub-100-millisecond end-to-end prediction budgets. Uber Michelangelo handles millions of events per minute with streaming updates and sub-minute freshness.
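To make point-in-time correctness concrete, here is a minimal sketch of a time-travel join using pandas' merge_asof; the entity, feature names, and data are illustrative. For each training label, it picks the most recent feature value at or before the label's event timestamp, never a later one:

```python
import pandas as pd

# Training labels: one row per (entity, event time) pair we want to score.
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2024-03-01", "2024-03-15", "2024-03-10"]),
    "label": [1, 0, 1],
})

# Feature snapshots: each row is the feature's value as of its timestamp.
features = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "feature_time": pd.to_datetime(["2024-02-20", "2024-03-10", "2024-02-01", "2024-03-12"]),
    "purchases_30d": [3, 5, 1, 4],
})

# Time-travel join: for each label, take the latest feature value at or
# before event_time (direction="backward"). A naive latest-value join would
# leak the 2024-03-12 snapshot into the 2024-03-10 label for user 2.
train = pd.merge_asof(
    labels.sort_values("event_time"),
    features.sort_values("feature_time"),
    left_on="event_time",
    right_on="feature_time",
    by="user_id",
    direction="backward",
)
print(train[["user_id", "event_time", "purchases_30d", "label"]])
```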
The feature registry binds these planes together. It stores canonical definitions, entity keys, data lineage, quality signals such as null rates and drift scores, owners, and usage statistics. Discovery is not just search: LinkedIn Feathr ranks features by usage frequency, model performance attribution, freshness adherence, and stability, so teams can evaluate whether a feature is fit for purpose before committing to it. The registry also enforces training-serving parity: the same transformation logic and data contracts apply in both batch and real-time paths, preventing the silent accuracy degradation that skew causes.
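A sketch of what one registry entry might capture; this is a hypothetical API, not Feathr's or any other product's, but it shows how a single canonical transformation plus metadata gives both planes one source of truth:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class FeatureDefinition:
    """One canonical registry entry shared by the offline and online planes."""
    name: str
    entity_key: str               # e.g. "user_id"
    owner: str
    transform: Callable           # single transformation used by batch AND streaming jobs
    freshness_sla_s: int          # how stale an online value is allowed to be
    null_rate: Optional[float] = None    # quality signal surfaced at discovery time
    drift_score: Optional[float] = None
    tags: dict = field(default_factory=dict)

# Written once, imported by both the backfill job and the streaming job,
# so the batch and real-time paths cannot drift apart silently.
def count_purchases(raw_events):
    return sum(1 for e in raw_events if e["type"] == "purchase")

PURCHASES_30D = FeatureDefinition(
    name="purchases_30d",
    entity_key="user_id",
    owner="growth-ml@example.com",
    transform=count_purchases,
    freshness_sla_s=60,
)
```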
Production systems report 30 to 70 percent feature reuse rates across models, cutting model onboarding from weeks to days. Airbnb surfaces quality metrics and example notebooks in its discovery portal. The operational contract is strict: online fetches must scale from tens of thousands to millions of queries per second across tenants while holding p95 latency in the single-digit to low tens of milliseconds.
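One way the latency side of that contract shows up in code is as an explicit retrieval budget. The sketch below is hypothetical (the store client, key format, and default values are assumptions): it serves fallback values rather than blowing the end-to-end SLA when the online store is slow.

```python
import time

# Hypothetical fallback values used when the budget runs out.
FEATURE_DEFAULTS = {"purchases_30d": 0, "avg_session_min": 0.0}

def fetch_online_features(store, user_id, names, budget_ms=20):
    """Fetch feature values within a fixed latency budget.

    `store` is assumed to be any key-value client with a get(key) method
    (Redis, Cassandra, Venice, ...). Capping retrieval at ~20ms leaves room
    in a sub-100ms end-to-end prediction budget for the model itself.
    """
    deadline = time.monotonic() + budget_ms / 1000.0
    values = {}
    for name in names:
        if time.monotonic() >= deadline:
            # Budget exhausted: fill the rest from defaults instead of
            # stalling the whole prediction request.
            values.update({n: FEATURE_DEFAULTS[n] for n in names if n not in values})
            break
        value = store.get(f"{name}:{user_id}")
        values[name] = value if value is not None else FEATURE_DEFAULTS[name]
    return values
```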
💡 Key Takeaways
• Dual-plane architecture separates offline training (TB scale, point-in-time correct) from online serving (5 to 20ms p95, 10K to 1M QPS), with the registry enforcing consistency across both planes
• The feature registry is not just storage but active governance: it ranks features by usage and quality, surfaces null rates and drift scores, and enforces training-serving parity to prevent skew
• Production systems achieve 30 to 70 percent reuse rates, cutting model onboarding from weeks to days at Netflix, Uber, LinkedIn, and Airbnb
• The online serving constraint drives the architecture: single-digit to low-tens-of-milliseconds p95 latency within sub-100ms end-to-end inference budgets requires pre-materialization and aggressive caching
• Point-in-time correctness is mandatory: offline joins use event timestamps to prevent data leakage, and identical transformation logic in batch and streaming paths prevents silent accuracy drops
• Scale envelope: thousands of features, hundreds of models, millions of events per minute for streaming updates, and multi-month historical backfills at TB to PB scale
📌 Examples
Netflix Zipline manages thousands of features used by hundreds of personalization models, processes daily TB-scale training sets with multi-month backfills, and maintains single-digit to low-tens-of-milliseconds p95 for online retrieval
Uber Michelangelo ingests millions of events per minute for ETA and pricing models, achieves 5 to 20ms p95 online lookups, and generates multi-TB training sets with point-in-time joins to prevent leakage
LinkedIn Feathr reduces time-to-production from weeks to days by ranking features by usage frequency and model performance attribution, and integrates with Venice for single-digit-millisecond online reads
Airbnb Bighead targets sub-100ms end-to-end inference for search ranking, allocating low tens of milliseconds p95 to feature retrieval via pre-materialized stores and request coalescing