Production Implementation: Metadata, Tiering, and Capacity Planning
Freshness Tier Metadata
Production feature stores tag each feature with a freshness tier and a numeric SLA: realtime (p95 age under 5 seconds, p99 under 15 seconds), nearline (p95 under 5 minutes, p99 under 15 minutes), or batch (p95 under 24 hours, p99 under 48 hours). Each feature also carries metadata: event time (when the underlying event occurred), last-updated-at (when the feature value was computed and written), computation window (e.g., a 30-minute sliding window), soft TTL (the warn threshold), and hard TTL (the fallback threshold).
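A minimal sketch of such a metadata record, assuming the tier and field names above; the types and class names are illustrative, not any particular feature store's schema:

```python
from dataclasses import dataclass
from enum import Enum

class FreshnessTier(Enum):
    REALTIME = "realtime"  # p95 age < 5 s, p99 < 15 s
    NEARLINE = "nearline"  # p95 < 5 min, p99 < 15 min
    BATCH = "batch"        # p95 < 24 h, p99 < 48 h

@dataclass(frozen=True)
class FeatureMetadata:
    name: str
    tier: FreshnessTier
    event_time: float          # epoch seconds of the underlying event
    last_updated_at: float     # when the value was computed and written
    computation_window_s: int  # e.g. 1800 for a 30-minute sliding window
    soft_ttl_s: float          # warn threshold
    hard_ttl_s: float          # fallback threshold
```

Keeping the two timestamps separate matters: event time drives staleness checks, while last-updated-at tells you whether the pipeline itself is falling behind.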
Feature Assembler Logic
The online feature assembler computes age at request time, enforces SLAs, and degrades gracefully. For each feature fetch, it calculates age = current_time minus event_time. If age exceeds soft TTL, it logs a warning and optionally appends age to the feature vector for model consumption. If age exceeds hard TTL, it substitutes a fallback value and increments an alert counter.
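The fetch-time logic above can be sketched as follows; `FeatureMeta`, `fetch_with_sla`, and the counter/fallback conventions are hypothetical names for illustration, not a specific assembler's API:

```python
import logging
import time
from collections import namedtuple

logger = logging.getLogger("feature_assembler")

# Hypothetical per-feature metadata record (field names are assumptions).
FeatureMeta = namedtuple("FeatureMeta", "name event_time soft_ttl_s hard_ttl_s")

def fetch_with_sla(value, meta, fallback, alert_counts, now=None, append_age=False):
    """Enforce soft/hard TTLs at request time and degrade gracefully."""
    now = time.time() if now is None else now
    age = now - meta.event_time
    if age > meta.hard_ttl_s:
        # Hard breach: substitute the fallback and increment an alert counter.
        alert_counts[meta.name] = alert_counts.get(meta.name, 0) + 1
        return (fallback, age) if append_age else fallback
    if age > meta.soft_ttl_s:
        # Soft breach: serve the value but log a warning.
        logger.warning("stale feature %s: age=%.1fs > soft_ttl=%.1fs",
                       meta.name, age, meta.soft_ttl_s)
    # Optionally expose age so the model can learn to discount stale inputs.
    return (value, age) if append_age else value
```

With `append_age=True` the staleness itself becomes a model input, which is the "optionally appends age to the feature vector" path described above.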
Capacity Planning
Capacity planning starts with latency budgets. If total p99 inference latency is 50ms and model execution takes 30ms, 20ms remains for feature fetching, preprocessing, and network. At 5ms per feature lookup, that buys only 4 sequential hops. For 50 features, you need aggressive batching, parallel fetches, or caching to stay in budget. DoorDash achieves this by bundling all features for an entity into a single key lookup and caching hot entities.
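The budget arithmetic is simple enough to encode directly; this helper is a sketch (the function name and parameters are mine, and it deliberately ignores preprocessing and network overhead within the fetch budget):

```python
def sequential_lookup_budget(total_p99_ms: float, model_ms: float,
                             per_lookup_ms: float) -> int:
    """How many sequential feature lookups fit after model execution."""
    fetch_budget_ms = total_p99_ms - model_ms  # time left for fetching
    return int(fetch_budget_ms // per_lookup_ms)
```

With the numbers above, `sequential_lookup_budget(50, 30, 5)` yields 4, which is why 50 features force you toward batched or parallel fetches rather than one round trip per feature.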
Streaming Capacity
For nearline features, plan streaming cluster capacity from peak events per second times compute per event, plus state retention for late events. For example, a 10,000 events-per-second stream at 1ms of compute per event generates 10 compute-seconds of work per wall-clock second, requiring approximately 10 single-threaded partitions; a 1 hour late-event buffer adds on the order of 100MB of state per partition. Add 50 to 100 percent headroom for traffic spikes and reprocessing after failures.
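The partition math can be written down explicitly; a rough sketch, with the function name and headroom convention as my own. The key identity is that partition count equals compute-seconds generated per wall-clock second (so ~10 single-threaded partitions at 10,000 events/second implies roughly 1ms of compute per event; at 100ms per event the same stream would need about 1,000):

```python
import math

def partitions_needed(peak_eps: float, compute_s_per_event: float,
                      headroom: float = 1.0) -> int:
    """Single-threaded partitions for a stream: compute-seconds of work
    generated per wall-clock second, scaled up by headroom (1.0 = +100%)."""
    busy_seconds_per_second = peak_eps * compute_s_per_event
    return math.ceil(busy_seconds_per_second * (1.0 + headroom))
```

Headroom of 0.5 to 1.0 covers the traffic spikes and post-failure reprocessing mentioned above; reprocessing in particular replays backlog on top of live traffic, so it routinely doubles instantaneous load.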