Feature Engineering & Feature Stores › Feature Freshness & Staleness (Hard, ⏱️ ~3 min)

Production Implementation: Metadata, Tiering, and Capacity Planning

Freshness Tier Metadata

Production feature stores tag each feature with a freshness tier and a numeric SLA: realtime (p95 age under 5 seconds, p99 under 15 seconds), nearline (p95 under 5 minutes, p99 under 15 minutes), or batch (p95 under 24 hours, p99 under 48 hours). Each feature carries metadata: event time (when the underlying event occurred), last updated at (when the feature was computed and written), computation window (e.g., a 30-minute sliding window), soft TTL (warn threshold), and hard TTL (fallback threshold).
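The tier and metadata fields above can be sketched as a simple schema. This is a minimal illustration, not any particular feature store's API; all class and field names are assumptions chosen to mirror the text.

```python
from dataclasses import dataclass
from enum import Enum

class FreshnessTier(Enum):
    REALTIME = "realtime"   # p95 age < 5 s,  p99 < 15 s
    NEARLINE = "nearline"   # p95 < 5 min,    p99 < 15 min
    BATCH    = "batch"      # p95 < 24 h,     p99 < 48 h

@dataclass
class FeatureMetadata:
    name: str
    tier: FreshnessTier
    event_time: float           # epoch seconds: when the underlying event occurred
    last_updated_at: float      # epoch seconds: when the feature was computed/written
    computation_window_s: int   # e.g. 1800 for a 30-minute sliding window
    soft_ttl_s: int             # warn threshold
    hard_ttl_s: int             # fallback threshold
```

A registry of these records is what lets the online assembler (below) make per-feature freshness decisions at request time.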

Feature Assembler Logic

The online feature assembler computes age at request time, enforces SLAs, and degrades gracefully. For each feature fetch, it calculates age = current_time minus event_time. If age exceeds soft TTL, it logs a warning and optionally appends age to the feature vector for model consumption. If age exceeds hard TTL, it substitutes a fallback value and increments an alert counter.
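A minimal sketch of that fetch path, under the assumptions in the text (age computed from event time, warn on soft TTL, fall back on hard TTL). Function and parameter names are hypothetical:

```python
import time
import logging

logger = logging.getLogger("feature_assembler")

def fetch_with_ttl(name, value, event_time, soft_ttl_s, hard_ttl_s,
                   fallback, now=None):
    """Return (value, age_s), applying the soft/hard TTL rules."""
    now = time.time() if now is None else now
    age = now - event_time
    if age > hard_ttl_s:
        # Hard TTL breached: substitute the fallback value.
        # A production assembler would also increment an alert counter here.
        logger.error("feature %s exceeded hard TTL (age=%.0fs); using fallback",
                     name, age)
        return fallback, age
    if age > soft_ttl_s:
        # Soft TTL breached: warn, but still serve the stale value.
        logger.warning("feature %s exceeded soft TTL (age=%.0fs)", name, age)
    return value, age
```

The returned age can be appended to the feature vector so the model itself can learn to discount stale inputs, as the text suggests.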

Capacity Planning

Capacity planning starts with latency budgets. If total p99 inference latency is 50ms and model execution takes 30ms, that leaves 20ms for feature fetching, preprocessing, and network overhead. At 5ms per feature lookup, you can afford only 4 sequential hops. For 50 features, you need aggressive batching, parallel fetches, or caching to stay within budget. DoorDash achieves this by bundling all features for an entity into a single key lookup and caching hot entities.
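The budget arithmetic is simple enough to write down directly (helper names are illustrative, not from any real system):

```python
def feature_fetch_budget_ms(total_p99_ms: int, model_exec_ms: int) -> int:
    """Latency left for feature fetching after model execution."""
    return total_p99_ms - model_exec_ms

def max_sequential_hops(budget_ms: int, per_lookup_ms: int) -> int:
    """How many sequential feature lookups fit in the remaining budget."""
    return budget_ms // per_lookup_ms

budget = feature_fetch_budget_ms(50, 30)   # 20 ms left for features
hops = max_sequential_hops(budget, 5)      # 4 sequential lookups fit
```

With 50 features and only 4 sequential hops available, the math itself forces the batching and parallelism strategies the text describes.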

Streaming Capacity

For nearline features, plan streaming cluster capacity from peak events per second, compute per event, and the retention window for late events. A stream at 10,000 events per second with 100ms of compute per event and a 1-hour late-event buffer requires approximately 10 partitions with 100MB of state each. Add 50 to 100 percent headroom for traffic spikes and reprocessing after failures.
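A back-of-envelope partition-count calculation consistent with those numbers might look like the following. The 2,000 events-per-second per-partition capacity is an assumed figure chosen for illustration; real capacity depends on the stream processor and state backend.

```python
import math

def partitions_needed(peak_eps: int, per_partition_eps: int,
                      headroom_pct: int = 100) -> int:
    """Partitions to provision: peak throughput plus spike/reprocessing headroom."""
    target_eps = peak_eps * (1 + headroom_pct / 100.0)
    return math.ceil(target_eps / per_partition_eps)

# Article's 10,000 events/s peak, a hypothetical 2,000 events/s per-partition
# capacity, and 100% headroom:
partitions_needed(10_000, 2_000, headroom_pct=100)  # → 10
```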

💡 Key Takeaways
- Latency budgets force trade-offs. With a 50ms total p99 budget and 30ms model execution, feature retrieval has only 20ms. Batching 100 features by entity reduces round trips from 100 to 3, fitting in 15ms p99.
- Write capacity must handle burst factors of 5x to 10x, not just averages. Uber provisions nearline stores for p99 load during peak hours, which can be 10x average load during events like New Year's Eve.
- Cross-region reads trade freshness for latency. Reading from the primary region adds 50 to 150ms of cross-continent latency but guarantees fresh data. Netflix reads embeddings locally (accepting 2-minute replication lag) but reads session state from the primary.
- Backfills in separate lanes prevent online poisoning. DoorDash routes 90-day historical recomputations to versioned offline stores. Only after validation do they promote to online serving, with guards against overwriting fresher values.
- Feature metadata enables runtime decisions. Including last updated at and TTL lets the assembler substitute defaults, drop features, or include age as a model input when freshness SLAs are violated.
- Monitoring replication lag is critical for geo-distributed systems. LinkedIn tracks offset deltas between regions per feature store partition. When lag exceeds 5 minutes, alert and route critical reads to the primary.
📌 Interview Tips
1. Uber Michelangelo batches feature lookups by entity type. For a trip prediction, it fetches all rider features in one lookup (10ms), all driver features in another (8ms), and contextual features in a third (5ms), totaling 23ms p99 for 100+ features.
2. Netflix maintains two tiers of feature storage: regional read replicas for user embeddings with 2 to 5 minute replication lag, and primary-region lookups for session state with 10ms p99 latency, choosing based on criticality.
3. DoorDash discovered a backfill job overwrote fresh store busy signals with 3-hour-old values during a nightly recomputation. Adding a version check (only write if new_version > current_version) prevented the regression.
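The version-check guard from the DoorDash example can be sketched as a compare-before-write against the online store. This uses a plain dict as a stand-in store; function and field names are illustrative:

```python
def promote_backfill(store: dict, key: str, new_value, new_version: int) -> bool:
    """Write a backfilled value only if it is strictly newer than what is
    already online, guarding against overwriting fresher values."""
    current = store.get(key)
    if current is None or new_version > current["version"]:
        store[key] = {"value": new_value, "version": new_version}
        return True
    # Stale backfill: refuse the write so the fresher online value survives.
    return False
```

A real implementation would need this check to be atomic (e.g., a conditional write in the underlying KV store) so a concurrent online update cannot race the backfill.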