ML-Powered Search & Ranking • Feature Engineering for Ranking (Easy, ⏱️ ~3 min)
Feature Groups and Their Role in Ranking Systems
Ranking features predict which items should appear higher in search results for a given query and user. Unlike classification tasks that predict a single label, ranking optimizes the relative order of hundreds or thousands of candidates. The feature space captures four dimensions: the query itself, the user making the request, the item being ranked, and the context of the session.
Features organize into natural groups that work together. Relevance features measure how well an item matches the query through lexical signals like exact text match and semantic signals from embeddings. Engagement features track historical behavior including clicks, watch time, add-to-cart actions, and final conversions. Quality features assess item trust, completeness, and seller reputation. Personalization features align items with user preferences from browsing history and past purchases. Business features enforce constraints like inventory availability, Prime eligibility, or content policy compliance.
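A minimal sketch of how these five groups might come together into one per-candidate feature dictionary. Every field name, input schema, and value here is illustrative, not any production system's actual schema:

```python
import numpy as np

def build_features(query, user, item):
    """Assemble one candidate's features from the five groups (illustrative)."""
    relevance = {
        "exact_title_match": float(query["text"] in item["title"]),
        "embedding_sim": float(np.dot(query["emb"], item["emb"])),
    }
    engagement = {
        "ctr_7d": item["clicks_7d"] / max(item["impressions_7d"], 1),
        "add_to_cart_rate": item["carts_7d"] / max(item["clicks_7d"], 1),
    }
    quality = {
        "seller_rating": item["seller_rating"],
        "listing_completeness": item["completeness"],
    }
    personalization = {
        "category_affinity": user["category_affinity"].get(item["category"], 0.0),
    }
    business = {
        "in_stock": float(item["inventory"] > 0),
        "prime_eligible": float(item["prime"]),
    }
    # Flatten all groups into one dict the ranker can consume as a vector.
    return {**relevance, **engagement, **quality, **personalization, **business}
```

Keeping the groups as separate dictionaries before flattening makes it easy to ablate a whole group during offline evaluation.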
The learning objective shapes which features matter most. Pointwise losses that predict absolute relevance scores favor well-calibrated features that work independently. Pairwise losses that compare two items stress discriminative signals that clearly separate good from bad candidates. Listwise losses that optimize entire result lists need features robust to position bias and candidate correlation. Production systems typically combine all three: sparse engineered signals like text match scores, dense semantic embeddings for generalization, and behavioral aggregates computed over multiple time windows.
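To make the pointwise/pairwise distinction concrete, here is a minimal sketch (not any specific library's API) of the same scores evaluated under a pointwise logistic loss and a pairwise hinge loss:

```python
import numpy as np

def pointwise_logistic_loss(scores, labels):
    # Each candidate is scored independently against an absolute 0/1 label,
    # so well-calibrated individual features matter.
    probs = 1.0 / (1.0 + np.exp(-scores))
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))

def pairwise_hinge_loss(scores, labels, margin=1.0):
    # Compare every (more relevant, less relevant) pair: the better item's
    # score should beat the other's by at least `margin`. Only the score
    # difference matters, which rewards discriminative features.
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                losses.append(max(0.0, margin - (scores[i] - scores[j])))
    return float(np.mean(losses)) if losses else 0.0
```

Note that adding a constant to every score leaves the pairwise loss unchanged but shifts the pointwise loss, which is exactly why pointwise objectives demand calibration and pairwise objectives demand separation.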
Real systems use 100 to 300 features per candidate. Google Search combines authority signals computed over months with fresh query engagement from the last hour. Amazon emphasizes constraint features updated within 1 to 5 minutes, like inventory counts and shipping speed. YouTube balances long-term video quality scores with immediate watch time and satisfaction signals. The key insight is that no single feature type wins alone. Strong rankers compose signals across time scales, entity hierarchies, and interaction types.
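The multi-time-scale idea can be sketched as counting the same event stream over several windows at once. The event schema below is hypothetical; real systems compute these aggregates in streaming pipelines rather than in-memory loops:

```python
from datetime import datetime, timedelta

def window_counts(events, now, windows):
    """events: list of (timestamp, event_type); windows: {name: timedelta}.
    Returns one count per window, so a single stream yields features at
    several time resolutions."""
    counts = {name: 0 for name in windows}
    for ts, _etype in events:
        age = now - ts
        for name, span in windows.items():
            if age <= span:
                counts[name] += 1
    return counts

now = datetime(2024, 1, 8, 12, 0)
events = [
    (now - timedelta(minutes=30), "click"),  # inside both windows
    (now - timedelta(days=3), "click"),      # only inside the 7-day window
    (now - timedelta(days=10), "click"),     # outside both
]
print(window_counts(events, now, {"1h": timedelta(hours=1), "7d": timedelta(days=7)}))
# → {'1h': 1, '7d': 2}
```

A short window reacts to trends within minutes; a long window smooths out noise, which is why rankers feed both to the model rather than choosing one.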
💡 Key Takeaways
• Ranking optimizes relative order across candidates, requiring features that capture query, user, item, and context interactions rather than just item properties alone
• Production systems use 100 to 300 features per candidate, organized into relevance, engagement, quality, personalization, and business constraint groups
• Pointwise objectives favor calibrated individual features, pairwise objectives need discriminative signals, and listwise objectives require robustness to position bias
• Multi-resolution time windows are essential: Google combines authority signals from months with engagement from the last hour, Amazon refreshes inventory within 1 to 5 minutes
• Dense semantic embeddings generalize to tail queries and new items, while sparse engineered features provide interpretable and stable signals at lower serving cost
📌 Examples
Amazon product search uses constraint features (in stock, Prime eligible, delivery speed) updated every 1 to 5 minutes, combined with precomputed quality scores and real-time click-through rates over 1-hour and 7-day windows
YouTube ranking computes video quality embeddings daily, maintains watch-time aggregates over 1-hour and 7-day windows, and uses two-tower retrieval to narrow tens of thousands of candidates to hundreds for the deep ranker
Airbnb search precomputes listing quality embeddings daily, tracks rolling booking rates over 1-day and 7-day windows, and refreshes calendar availability and dynamic pricing within minutes
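The two-tower retrieval step mentioned in the YouTube example can be sketched as a dot product between a query embedding and precomputed item embeddings, followed by top-k selection. All shapes, sizes, and the random embeddings below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
query_emb = rng.normal(size=64)            # output of the query tower
item_embs = rng.normal(size=(10_000, 64))  # precomputed item-tower outputs

scores = item_embs @ query_emb             # one dot product per candidate
top_k = 100
candidates = np.argsort(-scores)[:top_k]   # narrow 10k items to the top 100
```

Because item embeddings are precomputed, this step is cheap enough to run over the full corpus, leaving the expensive deep ranker with only a few hundred candidates.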