Multi-Resolution Time Windows and Feature Freshness
Behavioral features like click-through rate (CTR) or conversion rate change over time. An item trending today may have been unpopular last week. Maintaining only a single time window forces a choice: use a long window for stability and miss trends, or use a short window that catches trends but suffers from noise and variance. Production ranking systems solve this by computing the same feature at multiple resolutions and letting the model learn when to trust each.
A typical pattern maintains three to five windows per behavioral metric. For item CTR, compute it over 1 hour, 6 hours, 1 day, 7 days, and 30 days. Additionally, apply exponential decay with half-lives of 3 hours and 3 days to balance recent signals against accumulated history. At serving time, all versions are retrieved together in a single feature bundle. The ranker learns that 1-hour CTR matters more for trending news articles, while 30-day CTR is reliable for catalog items with stable demand. This adaptive weighting emerges from the training data, not from manual rules.
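A minimal sketch of this pattern, assuming click events arrive as (timestamp, clicked) pairs. The window lengths and half-lives mirror the numbers above; the feature names and function signatures are illustrative, not a specific feature-store API.

```python
from dataclasses import dataclass

# Window lengths and half-lives from the text; feature names are illustrative.
WINDOWS_SECONDS = {
    "ctr_1h": 3_600,
    "ctr_6h": 6 * 3_600,
    "ctr_1d": 86_400,
    "ctr_7d": 7 * 86_400,
    "ctr_30d": 30 * 86_400,
}
HALF_LIVES_SECONDS = {"ctr_decay_3h": 3 * 3_600, "ctr_decay_3d": 3 * 86_400}


def windowed_ctr(events, now, window_seconds):
    """CTR over a fixed lookback window; `events` is a list of (timestamp, clicked) pairs."""
    impressions = clicks = 0
    for ts, clicked in events:
        if now - ts <= window_seconds:
            impressions += 1
            clicks += int(clicked)
    return clicks / impressions if impressions else 0.0


@dataclass
class DecayedCounter:
    """Exponentially decayed impression/click counts for one half-life."""
    half_life: float
    impressions: float = 0.0
    clicks: float = 0.0
    last_update: float = 0.0

    def add(self, ts, clicked):
        # Decay the accumulated counts by the time elapsed since the last event,
        # then add the new one: an event's weight halves every `half_life` seconds.
        decay = 0.5 ** ((ts - self.last_update) / self.half_life)
        self.impressions = self.impressions * decay + 1.0
        self.clicks = self.clicks * decay + float(clicked)
        self.last_update = ts

    def ctr(self):
        return self.clicks / self.impressions if self.impressions else 0.0


def feature_bundle(events, decayed_counters, now):
    """Assemble every resolution of item CTR into the single bundle the ranker reads."""
    bundle = {name: windowed_ctr(events, now, secs) for name, secs in WINDOWS_SECONDS.items()}
    bundle.update({name: counter.ctr() for name, counter in decayed_counters.items()})
    return bundle


# Example wiring: one DecayedCounter per half-life, updated as click events stream in.
# decayed_counters = {name: DecayedCounter(h) for name, h in HALF_LIVES_SECONDS.items()}
```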
Freshness requirements vary by use case. Google Search needs query engagement signals within minutes to surface breaking news. Amazon requires inventory availability within 1 to 5 minutes to avoid promoting out-of-stock items. YouTube can tolerate a 10-to-30-minute lag for most engagement features because video popularity changes gradually. Streaming pipelines maintain rolling aggregates using tumbling or sliding windows, writing updates to an online feature store that supports hundreds of thousands of writes per second with p99 read latency under 5 milliseconds.
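A stripped-down illustration of the tumbling-window half of this pipeline. The 10-minute window, the `ctr_10m` feature name, and the `InMemoryFeatureStore` stand-in are assumptions for the sketch; a production system would write to a low-latency key-value service instead.

```python
from collections import defaultdict


class InMemoryFeatureStore:
    """Stand-in for an online feature store; real systems write to a low-latency KV service."""
    def __init__(self):
        self.rows = {}

    def put(self, item_id, features):
        self.rows[item_id] = features


WINDOW_SECONDS = 600  # 10-minute tumbling windows (illustrative choice)


class TumblingCtrAggregator:
    """Counts per-item impressions/clicks within the current tumbling window and flushes
    each completed window to the online store, keeping serving-time features fresh."""

    def __init__(self, feature_store):
        self.feature_store = feature_store
        self.window_start = None
        self.counts = defaultdict(lambda: [0, 0])  # item_id -> [impressions, clicks]

    def on_event(self, item_id, clicked, ts):
        if self.window_start is None:
            self.window_start = ts - (ts % WINDOW_SECONDS)
        # The event belongs to a later window: flush completed windows first.
        while ts >= self.window_start + WINDOW_SECONDS:
            self._flush()
            self.window_start += WINDOW_SECONDS
        pair = self.counts[item_id]
        pair[0] += 1
        pair[1] += int(clicked)

    def _flush(self):
        for item_id, (impressions, clicks) in self.counts.items():
            ctr = clicks / impressions if impressions else 0.0
            self.feature_store.put(item_id, {"ctr_10m": ctr,
                                             "window_end": self.window_start + WINDOW_SECONDS})
        self.counts.clear()
```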
The tradeoff is infrastructure cost versus relevance gain. Near-real-time features require streaming infrastructure, incremental computation, and higher write throughput to the feature store. For YouTube, refreshing video watch time every 10 minutes instead of every hour improves CTR by approximately 2 to 3 percent but doubles the compute cost of the feature pipelines. Amazon justifies the cost because stale inventory causes customer frustration and lost sales. A content platform with a stable catalog may find daily batch updates sufficient, saving significant operational complexity.
💡 Key Takeaways
• Multi-resolution features provide the same metric at 1 hour, 1 day, 7 days, and 30 days, letting models adaptively weight each resolution based on item type and context without manual rules
• Exponential decay with half-lives of 3 hours and 3 days balances recent spikes with accumulated history, smoothing variance while remaining responsive to trends
• Freshness requirements are use-case specific: Google needs minutes for breaking news, Amazon needs 1 to 5 minutes for inventory, YouTube tolerates 10 to 30 minutes for engagement
• Streaming pipelines maintain rolling aggregates with hundreds of thousands of writes per second to online feature stores with p99 read latency under 5 milliseconds
• Infrastructure tradeoff: YouTube gains 2 to 3 percent CTR improvement by refreshing watch time every 10 minutes instead of hourly, but doubles feature pipeline compute cost
• Point-in-time correctness matters for training: snapshot features at the impression timestamp to avoid leakage from future events that would bias offline metrics (see the sketch after this list)
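One common way to enforce point-in-time correctness when building training data is a backward as-of join. The sketch below uses pandas `merge_asof` with hypothetical column names and values; the key idea is that each impression only sees the latest feature snapshot at or before its own timestamp.

```python
import pandas as pd

# Hypothetical impression log: what was shown, when, and whether it was clicked.
impressions = pd.DataFrame({
    "item_id": [1, 1, 2],
    "ts": pd.to_datetime(["2024-05-01 10:00", "2024-05-01 12:00", "2024-05-01 11:00"]),
    "label": [1, 0, 1],
}).sort_values("ts")

# Hypothetical feature snapshot table: each row is valid from its timestamp onward.
feature_snapshots = pd.DataFrame({
    "item_id": [1, 1, 2],
    "ts": pd.to_datetime(["2024-05-01 09:00", "2024-05-01 11:30", "2024-05-01 10:30"]),
    "ctr_1h": [0.04, 0.09, 0.02],
}).sort_values("ts")

# direction="backward" picks the most recent snapshot at or before the impression time,
# so features computed after the impression can never leak into training.
training_rows = pd.merge_asof(
    impressions, feature_snapshots, on="ts", by="item_id", direction="backward"
)
```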
📌 Examples
Google Search maintains query engagement signals with sub-10-minute freshness for breaking news, using streaming aggregation over Dataflow with writes to a Bigtable-backed feature store
Amazon updates product availability and Prime eligibility every 1 to 5 minutes via event-sourced inventory streams, preventing out-of-stock items from ranking highly and frustrating customers
Airbnb computes listing booking rates over 1-day and 7-day windows via daily batch jobs, but refreshes calendar availability and dynamic price changes within minutes through CDC streams from the reservation database