ML-Powered Search & Ranking · Feature Engineering for Ranking
Easy · ⏱️ ~3 min

Feature Groups and Their Role in Ranking Systems

Definition
Feature groups in ranking systems organize features by their source and update frequency: query features, user features, item features, context features, and cross features. Each group has different latency constraints, storage patterns, and staleness tolerances.
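The grouping above can be sketched as data. This is a minimal illustration (all names and the non-query latency budgets are assumptions for the sketch, not values from a real system) showing how each group carries its own source and freshness profile:

```python
from dataclasses import dataclass

# Hypothetical sketch: one record per feature group, capturing where the
# features come from and how fresh they are expected to be.
@dataclass(frozen=True)
class FeatureGroup:
    name: str
    computed_at: str       # "request" (live) or "batch" (pre-computed)
    update_frequency: str  # how often stored values are refreshed
    latency_budget_ms: float  # illustrative budgets, not benchmarks

FEATURE_GROUPS = [
    FeatureGroup("query",   "request", "per-request",   10.0),
    FeatureGroup("user",    "batch",   "hourly/daily",   5.0),
    FeatureGroup("item",    "batch",   "minutes/hours",  5.0),
    FeatureGroup("context", "request", "per-request",    1.0),
    FeatureGroup("cross",   "request", "per-request",    5.0),
]

# Groups computed live cannot be cached ahead of time.
live_groups = [g.name for g in FEATURE_GROUPS if g.computed_at == "request"]
```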

Query Features: What The User Wants Now

Query features capture the current request: parsed query terms, detected intent (navigational, informational, transactional), query length, language, and query embeddings from a language model. These are computed at request time with strict latency budgets (typically <10ms). Since queries are unpredictable, you cannot pre-compute them. Query features are the foundation: they define what "relevant" means for this specific request.
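As a sketch, request-time query feature extraction might look like the following. The intent heuristic and embedding stub are toy placeholders (a real system would use a trained intent classifier and a language-model encoder under the latency budget):

```python
# Illustrative request-time query feature extraction; must run in milliseconds
# because nothing about the query can be pre-computed.
def extract_query_features(query: str) -> dict:
    terms = query.lower().split()
    # Toy intent heuristic; production systems use a trained classifier.
    if any(t in {"buy", "price", "cheap"} for t in terms):
        intent = "transactional"
    elif any(t in {"how", "what", "why"} for t in terms):
        intent = "informational"
    else:
        intent = "navigational"
    return {
        "terms": terms,
        "query_length": len(terms),
        "intent": intent,
        # Stand-in for a language-model embedding call.
        "embedding": [float(hash(t) % 100) / 100.0 for t in terms],
    }

features = extract_query_features("how to buy running shoes")
```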

User Features: Who Is Asking

User features capture historical behavior: click history, purchase history, category preferences, engagement patterns, demographic signals. These are pre-computed in batch (updated hourly to daily) and stored for fast lookup. User features enable personalization: two users with the same query get different rankings based on their history. Storage: user embeddings (256-512 dimensions), aggregated counts, preference vectors.
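The lookup pattern can be sketched with an in-memory dict standing in for a key-value store (field names and the 8-dim embedding are illustrative; production systems use something like Redis with 256-512-dim embeddings refreshed by batch jobs):

```python
# Sketch of a pre-computed user feature store with a point-lookup API.
USER_FEATURES = {
    "user_42": {
        "embedding": [0.1] * 8,            # toy size; real: 256-512 dims
        "clicks_7d": 37,                   # aggregated count from batch job
        "top_categories": ["shoes", "outdoor"],
    },
}

# Cold-start default for users with no history yet.
COLD_START = {"embedding": [0.0] * 8, "clicks_7d": 0, "top_categories": []}

def lookup_user_features(user_id: str) -> dict:
    # Fast point lookup; unknown users fall back to the cold-start default.
    return USER_FEATURES.get(user_id, COLD_START)
```

The cold-start fallback matters: personalization degrades gracefully to a non-personalized ranking rather than failing the request.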

Item Features: What Is Being Ranked

Item features describe the candidates: titles, categories, prices, quality scores, popularity metrics, item embeddings. Static attributes (category, title) change rarely. Dynamic attributes (click rate, stock level) need frequent updates. Item features are pre-indexed: when a query arrives, you already know everything about each item. The challenge is keeping millions of item feature vectors fresh while serving at p99 <5ms.
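The static/dynamic split can be sketched as follows (field names are illustrative): the refresh path touches only the volatile fields, so static attributes are written once at indexing time while click rates and stock levels are updated continuously:

```python
# Sketch of a pre-indexed item feature store. Static fields (title, category)
# are written at index time; dynamic fields are refreshed frequently.
ITEM_INDEX = {
    "item_1": {
        "title": "Trail Shoe", "category": "shoes",  # static
        "click_rate": 0.12, "in_stock": True,        # dynamic
    },
}

def refresh_dynamic(item_id: str, click_rate: float, in_stock: bool) -> None:
    # Only update the volatile fields; static attributes stay untouched.
    ITEM_INDEX[item_id].update(click_rate=click_rate, in_stock=in_stock)

refresh_dynamic("item_1", click_rate=0.15, in_stock=False)
```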

Context and Cross Features

Context features: time of day, device type, location, session length, referring source. Available at request time, they enable situational ranking (mobile users prefer shorter content; evening users browse more).

Cross features: user-item interactions (has the user clicked this item before?) and query-item match signals (BM25 score, embedding similarity). Cross features often provide the strongest ranking signal but require combining user and item data at serving time.
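A minimal sketch of serving-time cross-feature computation, combining the user's history with a candidate item (the data shapes and the unnormalized dot-product similarity are toy stand-ins for real embedding similarity and match scores):

```python
# Illustrative cross features: these cannot be pre-computed per user or per
# item alone, because they depend on the (user, query, item) combination.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_features(user: dict, item: dict, query_embedding: list) -> dict:
    return {
        # User-item interaction: has this user clicked this item before?
        "user_clicked_item": item["id"] in user["clicked_items"],
        # Query-item match: toy similarity (real systems: BM25, cosine, etc.)
        "query_item_sim": dot(query_embedding, item["embedding"]),
    }

user = {"clicked_items": {"item_1"}}
item = {"id": "item_1", "embedding": [0.5, 0.5]}
feats = cross_features(user, item, query_embedding=[1.0, 0.0])
```

This is why cross features are compute-expensive: they must be evaluated once per candidate item inside the request, not looked up from a cache.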

💡 Key Takeaways
Five feature groups: query (computed at request), user (pre-computed hourly/daily), item (pre-indexed), context (request time), cross (combined at serving)
Query features define relevance for this request; computed in <10ms since queries are unpredictable
User features enable personalization: same query yields different rankings based on history
Item features must stay fresh across millions of items while serving at p99 <5ms
Cross features (user-item, query-item interactions) often provide strongest signal but require runtime combination
📌 Interview Tips
1. When asked about feature engineering for ranking, start by categorizing features into groups based on source and update frequency
2. Explain the latency constraints: query features <10ms (computed live), item features <5ms (pre-indexed lookup)
3. Mention that cross features combine user and item data at serving time, making them powerful but compute-expensive