Feature Freshness & Staleness

What is Feature Freshness and Why Does It Matter?

Definition
Feature freshness is the age of a feature value relative to when it is used for prediction, calculated as the current time minus the event time that produced the feature. When this age exceeds an agreed SLA, the feature is considered stale.

Example

If a fraud detection feature showing "number of transactions in last 5 minutes" was computed 3 minutes ago, its freshness is 3 minutes. The 3 minute delay may or may not be acceptable depending on your SLA.
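The definition and example above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the timestamps and SLA values are the ones from the example:

```python
from datetime import datetime, timedelta, timezone

def freshness(event_time: datetime, now: datetime) -> timedelta:
    # Freshness is measured against the event time that produced the
    # feature value, not against when the computation finished.
    return now - event_time

def is_stale(event_time: datetime, now: datetime, sla: timedelta) -> bool:
    # A feature is considered stale once its age exceeds the agreed SLA.
    return freshness(event_time, now) > sla

# The fraud example: the feature was computed from an event 3 minutes ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
event_time = now - timedelta(minutes=3)

age = freshness(event_time, now)                                  # 3 minutes
within_5m_sla = is_stale(event_time, now, timedelta(minutes=5))   # False
within_1m_sla = is_stale(event_time, now, timedelta(minutes=1))   # True
```

The same 3-minute age is acceptable under a 5-minute SLA and a violation under a 1-minute SLA, which is why staleness is always defined relative to an agreed threshold rather than in absolute terms.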

Freshness Requirements by Use Case

Fraud signals and live inventory at companies like Uber must have p95 freshness under 5 to 10 seconds because stale data leads to incorrect pricing or fraud going undetected. In contrast, user embeddings or long term purchase history at Netflix can tolerate 24 hour staleness since they capture stable patterns.

Sensitivity Analysis

The impact of staleness must be measured experimentally. LinkedIn found that for real time ranking, features representing "clicks in last hour" degraded Click Through Rate (CTR) by 3% when served 5 minutes stale, while "lifetime click count" showed no measurable impact even with 24 hour staleness.

Trade-off Triangle

Freshness trades directly against latency and cost. Achieving sub-second freshness requires streaming infrastructure with always-on compute, costing 10 to 50x more than hourly batch updates. The engineering question is: what freshness SLA does each feature need to maintain model quality, and what is the minimum cost of meeting it?

💡 Key Takeaways
Freshness is calculated as now minus event time, not processing time. A feature computed from a 10 minute old event is 10 minutes stale even if computation just finished.
Uber marketplace inference runs at over 100k Queries Per Second (QPS) globally during peaks with only 20 to 50ms total prediction budget, leaving 5 to 15ms p99 for feature retrieval.
Staleness harms business metrics measurably. DoorDash found that delivery time predictions degrade significantly when store busy state features exceed 60 seconds of age during peak hours.
Most production systems define three tiers: realtime (p95 under 5 seconds), nearline (p95 under 5 minutes), and batch (p95 under 24 hours) with different infrastructure for each.
Monitoring must track distributions, not averages. A p50 freshness of 2 seconds with p99 of 5 minutes means 1% of predictions use critically stale data, causing bad user experiences.
Freshness requirements should be validated through A/B testing. Netflix only pushes features to real time infrastructure when experiments prove that reducing staleness improves Click Through Rate (CTR) or engagement.
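The monitoring takeaway above can be made concrete with a small sketch. The sample values and the nearest-rank percentile helper are hypothetical; the tier thresholds are the p95 targets listed in the takeaways:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    # Nearest-rank percentile: the smallest value such that at least
    # p percent of the samples are <= it.
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

# Hypothetical freshness measurements in seconds: 98 fast serves plus a
# slow tail (e.g. a lagging stream partition).
samples = [2.0] * 98 + [300.0, 310.0]

mean = sum(samples) / len(samples)   # ~8 s: the average hides the tail
p50 = percentile(samples, 50)        # 2.0 s: the typical case looks healthy
p99 = percentile(samples, 99)        # 300.0 s: 1% of predictions are minutes stale

def tier(p95_seconds: float) -> str:
    # The three tiers from the takeaways, keyed on p95 freshness.
    if p95_seconds < 5:
        return "realtime"
    if p95_seconds < 5 * 60:
        return "nearline"
    if p95_seconds < 24 * 3600:
        return "batch"
    return "out of SLA"
```

The average looks fine while the p99 reveals a tail that would pass unnoticed if only means were alerted on, which is exactly why the takeaway insists on tracking distributions.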
📌 Interview Tips
1. Uber dynamic pricing uses features like nearby driver supply with p95 freshness under 10 seconds. If this goes stale by 5 minutes during rush hour, surge multipliers become inaccurate and drivers are misallocated.
2. LinkedIn feed ranking combines daily batch user embeddings (24 hour staleness acceptable) with nearline engagement signals (updated within 60 seconds) to balance freshness and cost.
3. Netflix homepage ranking accepts 24 hour staleness for heavy recommendation embeddings while computing context features like time of day and device type at request time for sub 50ms latency.