ML-Powered Search & Ranking • Feature Engineering for Ranking (Easy, ⏱️ ~3 min)
Feature Groups and Their Role in Ranking Systems
Ranking features predict which items should appear higher in search results for a given query and user. Unlike classification tasks that predict a single label, ranking optimizes the relative order of hundreds or thousands of candidates. The feature space captures four dimensions: the query itself, the user making the request, the item being ranked, and the context of the session.
Features organize into natural groups that work together. Relevance features measure how well an item matches the query through lexical signals like exact text match and semantic signals from embeddings. Engagement features track historical behavior including clicks, watch time, add-to-cart actions, and final conversions. Quality features assess item trust, completeness, and seller reputation. Personalization features align items with user preferences from browsing history and past purchases. Business features enforce constraints like inventory availability, Prime eligibility, or content policy compliance.
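A minimal sketch of how these five groups might come together into one per-candidate feature dictionary. Every field name, input schema, and value here is illustrative, not any production system's actual schema:

```python
import numpy as np

def build_features(query, user, item):
    """Assemble one candidate's features from the five groups (illustrative)."""
    relevance = {
        "exact_title_match": float(query["text"] in item["title"]),
        "embedding_sim": float(np.dot(query["emb"], item["emb"])),
    }
    engagement = {
        "ctr_7d": item["clicks_7d"] / max(item["impressions_7d"], 1),
        "add_to_cart_rate": item["carts_7d"] / max(item["clicks_7d"], 1),
    }
    quality = {
        "seller_rating": item["seller_rating"],
        "listing_completeness": item["completeness"],
    }
    personalization = {
        "category_affinity": user["category_affinity"].get(item["category"], 0.0),
    }
    business = {
        "in_stock": float(item["inventory"] > 0),
        "prime_eligible": float(item["prime"]),
    }
    # Flatten all groups into one dict the ranker can consume as a vector.
    return {**relevance, **engagement, **quality, **personalization, **business}
```

Keeping the groups as separate dictionaries before flattening makes it easy to ablate a whole group during offline evaluation.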
The learning objective shapes which features matter most. Pointwise losses that predict absolute relevance scores favor well-calibrated features that work independently. Pairwise losses that compare two items stress discriminative signals that clearly separate good from bad candidates. Listwise losses that optimize entire result lists need features robust to position bias and candidate correlation. Production systems typically combine all three: sparse engineered signals like text match scores, dense semantic embeddings for generalization, and behavioral aggregates computed over multiple time windows.
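To make the pointwise/pairwise distinction concrete, here is a minimal sketch (not any specific library's API) of the same scores evaluated under a pointwise logistic loss and a pairwise hinge loss:

```python
import numpy as np

def pointwise_logistic_loss(scores, labels):
    # Each candidate is scored independently against an absolute 0/1 label,
    # so well-calibrated individual features matter.
    probs = 1.0 / (1.0 + np.exp(-scores))
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))

def pairwise_hinge_loss(scores, labels, margin=1.0):
    # Compare every (more relevant, less relevant) pair: the better item's
    # score should beat the other's by at least `margin`. Only the score
    # difference matters, which rewards discriminative features.
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                losses.append(max(0.0, margin - (scores[i] - scores[j])))
    return float(np.mean(losses)) if losses else 0.0
```

Note that adding a constant to every score leaves the pairwise loss unchanged but shifts the pointwise loss, which is exactly why pointwise objectives demand calibration and pairwise objectives demand separation.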
Real systems use 100 to 300 features per candidate. Google Search combines authority signals computed over months with fresh query engagement from the last hour. Amazon emphasizes constraint features updated within 1 to 5 minutes, like inventory counts and shipping speed. YouTube balances long-term video quality scores with immediate watch time and satisfaction signals. The key insight is that no single feature type wins alone. Strong rankers compose signals across time scales, entity hierarchies, and interaction types.
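The multi-time-scale idea can be sketched as counting the same event stream over several windows at once. The event schema below is hypothetical; real systems compute these aggregates in streaming pipelines rather than in-memory loops:

```python
from datetime import datetime, timedelta

def window_counts(events, now, windows):
    """events: list of (timestamp, event_type); windows: {name: timedelta}.
    Returns one count per window, so a single stream yields features at
    several time resolutions."""
    counts = {name: 0 for name in windows}
    for ts, _etype in events:
        age = now - ts
        for name, span in windows.items():
            if age <= span:
                counts[name] += 1
    return counts

now = datetime(2024, 1, 8, 12, 0)
events = [
    (now - timedelta(minutes=30), "click"),  # inside both windows
    (now - timedelta(days=3), "click"),      # only inside the 7-day window
    (now - timedelta(days=10), "click"),     # outside both
]
print(window_counts(events, now, {"1h": timedelta(hours=1), "7d": timedelta(days=7)}))
# → {'1h': 1, '7d': 2}
```

A short window reacts to trends within minutes; a long window smooths out noise, which is why rankers feed both to the model rather than choosing one.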
💡 Key Takeaways
• Ranking optimizes relative order across candidates, requiring features that capture query, user, item, and context interactions rather than just item properties alone
• Production systems use 100 to 300 features per candidate, organized into relevance, engagement, quality, personalization, and business constraint groups
• Pointwise objectives favor calibrated individual features, pairwise objectives need discriminative signals, and listwise objectives require robustness to position bias
• Multi-resolution time windows are essential: Google combines authority signals from months with engagement from the last hour, Amazon refreshes inventory within 1 to 5 minutes
• Dense semantic embeddings generalize to tail queries and new items, while sparse engineered features provide interpretable and stable signals at lower serving cost
📌 Examples
Amazon product search uses constraint features (in stock, Prime eligible, delivery speed) updated every 1 to 5 minutes, combined with precomputed quality scores and real-time click-through rates over 1-hour and 7-day windows
YouTube ranking computes video quality embeddings daily, maintains watch-time aggregates over 1-hour and 7-day windows, and uses two-tower retrieval to narrow tens of thousands of candidates to hundreds for the deep ranker
Airbnb search precomputes listing quality embeddings daily, tracks rolling booking rates over 1-day and 7-day windows, and refreshes calendar availability and dynamic pricing within minutes
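The two-tower retrieval step mentioned in the YouTube example can be sketched as a dot product between a query embedding and precomputed item embeddings, followed by top-k selection. All shapes, sizes, and the random embeddings below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
query_emb = rng.normal(size=64)            # output of the query tower
item_embs = rng.normal(size=(10_000, 64))  # precomputed item-tower outputs

scores = item_embs @ query_emb             # one dot product per candidate
top_k = 100
candidates = np.argsort(-scores)[:top_k]   # narrow 10k items to the top 100
```

Because item embeddings are precomputed, this step is cheap enough to run over the full corpus, leaving the expensive deep ranker with only a few hundred candidates.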