Feature Groups and Their Role in Ranking Systems
Query Features: What the User Wants Now
Query features capture the current request: parsed query terms, detected intent (navigational, informational, transactional), query length, language, and query embeddings from a language model. They are computed at request time under strict latency budgets (typically under 10 ms). Because the space of possible queries is effectively unbounded, these features cannot be pre-computed for every query in advance. Query features are the foundation: they define what "relevant" means for this specific request.
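As a minimal sketch of request-time extraction, the snippet below derives terms, length, and a coarse intent label from a raw query string. The keyword sets and the `extract_query_features` helper are hypothetical stand-ins. A production system would more likely use a trained intent classifier, and the language-model embedding is omitted here.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical keyword lists used only for illustration;
# real systems typically learn intent from labeled data.
TRANSACTIONAL = {"buy", "price", "cheap", "order", "deal"}
NAVIGATIONAL = {"login", "homepage", "website", "official"}


@dataclass
class QueryFeatures:
    terms: List[str]
    length: int
    intent: str


def extract_query_features(query: str) -> QueryFeatures:
    """Compute cheap query features that fit a tight latency budget."""
    terms = re.findall(r"\w+", query.lower())
    term_set = set(terms)
    if term_set & TRANSACTIONAL:
        intent = "transactional"
    elif term_set & NAVIGATIONAL:
        intent = "navigational"
    else:
        intent = "informational"
    return QueryFeatures(terms=terms, length=len(terms), intent=intent)
```

Everything here is pure string processing with no network calls, which is the kind of work that fits comfortably inside a request-time budget.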
User Features: Who Is Asking
User features capture historical behavior: click history, purchase history, category preferences, engagement patterns, and demographic signals. They are pre-computed in batch (refreshed hourly to daily) and stored for fast lookup. User features enable personalization: two users who issue the same query can get different rankings based on their history. What gets stored is typically user embeddings (256-512 dimensions), aggregated counts, and preference vectors.
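The batch-write, fast-read pattern described above can be sketched with an in-memory store. The `UserFeatureStore` class, its method names, and the zero-vector default for unknown users are assumptions made for illustration, and the embedding is kept at 4 dimensions for readability rather than the 256-512 mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

EMBEDDING_DIM = 4  # illustrative only; the text cites 256-512 dimensions


@dataclass
class UserFeatures:
    embedding: List[float]
    category_clicks: Dict[str, int] = field(default_factory=dict)


# Assumed fallback for users with no history (cold start).
DEFAULT_USER = UserFeatures(embedding=[0.0] * EMBEDDING_DIM)


class UserFeatureStore:
    """Hypothetical key-value store: written by a batch job, read at serving time."""

    def __init__(self) -> None:
        self._rows: Dict[str, UserFeatures] = {}

    def write_batch(self, rows: Dict[str, UserFeatures]) -> None:
        # Called by the hourly or daily batch job; overwrites existing rows.
        self._rows.update(rows)

    def get(self, user_id: str) -> UserFeatures:
        # Serving-time lookup: a single dictionary read, no recomputation.
        return self._rows.get(user_id, DEFAULT_USER)
```

In a real deployment the dictionary would be replaced by a low-latency key-value service, but the division of labor stays the same: expensive aggregation happens offline, and serving only performs a lookup.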
Item Features: What Is Being Ranked
Item features describe the candidates: titles, categories, prices, quality scores, popularity metrics, and item embeddings. Static attributes (category, title) change rarely. Dynamic attributes (click rate, stock level) need frequent updates. Item features are pre-indexed, so when a query arrives the item-side attributes are already available for lookup rather than computed on the fly. The challenge is keeping millions of item feature vectors fresh while serving lookups at a p99 latency under 5 ms.
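One way to picture the static versus dynamic split is an index that stores the two groups separately and tracks when the dynamic group was last refreshed. The `ItemFeatureIndex` class, the `dynamic_is_fresh` flag, and the 300-second staleness threshold below are all hypothetical choices for this sketch, not values from the text.

```python
import time
from typing import Any, Dict, Optional


class ItemFeatureIndex:
    """Hypothetical index keeping static and dynamic item attributes apart."""

    def __init__(self, max_dynamic_age_s: float = 300.0) -> None:
        self._static: Dict[str, Dict[str, Any]] = {}
        self._dynamic: Dict[str, Dict[str, Any]] = {}
        self._updated_at: Dict[str, float] = {}
        self._max_age = max_dynamic_age_s  # assumed freshness threshold

    def put_static(self, item_id: str, attrs: Dict[str, Any]) -> None:
        # Rarely changing attributes such as category or title.
        self._static[item_id] = dict(attrs)

    def put_dynamic(self, item_id: str, attrs: Dict[str, Any],
                    now: Optional[float] = None) -> None:
        # Frequently refreshed attributes such as click rate or stock level.
        self._dynamic[item_id] = dict(attrs)
        self._updated_at[item_id] = time.time() if now is None else now

    def get(self, item_id: str, now: Optional[float] = None) -> Dict[str, Any]:
        now = time.time() if now is None else now
        features = dict(self._static.get(item_id, {}))
        # Items never refreshed get an infinite age and count as stale.
        age = now - self._updated_at.get(item_id, float("-inf"))
        fresh = age <= self._max_age
        if fresh:
            features.update(self._dynamic[item_id])
        # Expose freshness so the ranker can discount or ignore stale signals.
        features["dynamic_is_fresh"] = fresh
        return features
```

Dropping stale dynamic values is only one possible policy. Serving the last known value together with its age is another reasonable choice, depending on how sensitive the ranker is to outdated signals.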
Context and Cross Features
Context features describe the situation around the request: time of day, device type, location, session length, and referring source. They are available at request time and enable situational ranking (for example, mobile users may prefer shorter content, and evening users may browse longer).
Cross features describe how two entities relate: user-item interactions (has the user clicked this item before?) and query-item match signals (BM25 score, embedding similarity). Cross features often provide the strongest ranking signal, but they cannot be stored per entity in advance because they require combining query, user, and item data at serving time.
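A small sketch of serving-time cross features follows. It computes a user-item click flag, a query-item embedding cosine similarity, and a simple term-overlap ratio. The overlap ratio is a deliberately simplified stand-in for BM25, not BM25 itself, and the function and feature names are hypothetical.

```python
import math
from typing import Dict, List, Set


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cross_features(
    query_terms: List[str],
    query_embedding: List[float],
    item_id: str,
    item_title_terms: List[str],
    item_embedding: List[float],
    user_clicked_items: Set[str],
) -> Dict[str, float]:
    """Combine query, user, and item data into per-candidate features."""
    q_terms = set(query_terms)
    # Fraction of distinct query terms found in the title (simplified, not BM25).
    overlap = len(q_terms & set(item_title_terms)) / max(len(q_terms), 1)
    return {
        "user_clicked_before": 1.0 if item_id in user_clicked_items else 0.0,
        "query_item_cosine": cosine_similarity(query_embedding, item_embedding),
        "query_title_overlap": overlap,
    }
```

Because this function runs once per candidate, its cost is multiplied by the size of the candidate set, which is why cross features are usually restricted to cheap operations over pre-fetched data.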