What is Real-Time Search Personalization?
Why Session Context Matters
A user searching for "python" could want the snake, the programming language, or Monty Python. Historical preferences help, but the current session is usually the stronger signal. If they just clicked a coding tutorial, "python" most likely means the programming language. If they arrived from a pet store page, it most likely means the snake. Real-time personalization uses these in-session signals to disambiguate intent within roughly 10-50ms of the search.
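As a minimal sketch of this kind of disambiguation, the snippet below picks a query sense from the categories of pages the user touched in the current session and falls back to a historical preference only when the session offers no hint. The `QUERY_SENSES` table, the category names, and the `disambiguate` function are hypothetical names introduced for illustration; a production system would more likely use learned signals than a hand-written lookup.

```python
from collections import Counter

# Hypothetical mapping from session category to the sense of an ambiguous query.
QUERY_SENSES = {
    "python": {
        "programming": "python programming language",
        "pets": "python snake",
        "comedy": "monty python",
    }
}


def disambiguate(query, session_categories, historical_category=None):
    """Return a disambiguated query string, preferring in-session signals."""
    senses = QUERY_SENSES.get(query.lower())
    if not senses:
        # Query is not in the ambiguity table: leave it unchanged.
        return query

    # Count only the session categories that map to a known sense.
    counts = Counter(c for c in session_categories if c in senses)
    if counts:
        best_category = counts.most_common(1)[0][0]
        return senses[best_category]

    # No useful session signal: fall back to the historical preference.
    if historical_category in senses:
        return senses[historical_category]
    return query


print(disambiguate("python", ["programming"]))
print(disambiguate("python", [], historical_category="pets"))
```

Here the session wins whenever it has any relevant signal, which mirrors the point above; weighting the two sources against each other would be a reasonable refinement.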
Batch vs Real-Time Personalization
Batch personalization: Pre-computes user preferences overnight or hourly and stores a static user profile (interests, categories, price ranges). It is fast to serve but reflects who the user was hours ago, not who they are now.
Real-time personalization: Updates the user's context with every click, view, and search within the session. It captures intent shifts (for example, a user who started browsing electronics and is now looking at gifts), but it requires streaming infrastructure that can compute features in under 50ms.
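The contrast can be sketched in a few lines: a static batch profile serves as the fallback, while a small rolling window of in-session events takes over as soon as the user starts interacting. The `SessionContext` class, its method names, and the window size of 5 are assumptions made for this example, and an in-memory deque stands in for whatever streaming store a real system would use.

```python
from collections import Counter, deque


class SessionContext:
    """Combines a precomputed batch profile with a rolling window of session events."""

    def __init__(self, batch_profile, max_events=5):
        # batch_profile: category -> affinity, computed offline (e.g. overnight).
        self.batch_profile = batch_profile
        # Only the most recent events are kept; older ones fall off automatically.
        self.recent = deque(maxlen=max_events)

    def record(self, category):
        """Update the context with one click, view, or search."""
        self.recent.append(category)

    def current_intent(self):
        """Most frequent category in the session window, else the batch favourite."""
        if self.recent:
            return Counter(self.recent).most_common(1)[0][0]
        if self.batch_profile:
            return max(self.batch_profile, key=self.batch_profile.get)
        return None


ctx = SessionContext({"electronics": 0.8, "books": 0.3})
print(ctx.current_intent())  # no session events yet, so the batch profile decides
for category in ["electronics", "gifts", "gifts", "gifts"]:
    ctx.record(category)
print(ctx.current_intent())  # the session window now reflects the shift to gifts
```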
The Latency Challenge
Search has strict latency budgets: total response time typically has to stay under 200ms, and within that, personalization gets perhaps 20-30ms. In that window you must fetch user context, compute personalized features, blend them into the ranking score, and return results. The architecture therefore combines pre-computed embeddings (user and item vectors stored for fast lookup) with real-time session features (last 5 clicks, current query). Heavy computation happens offline; the real-time path only does lightweight lookups and score adjustments.
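One way to picture the real-time path is a re-ranking step that only does lookups and arithmetic: a dot product between pre-computed user and item vectors, plus a small boost for items matching recent clicks, all guarded by a time budget. This is a sketch under stated assumptions: `personalized_rank`, the blend `weight`, the `session_boost`, and the 25ms default budget are illustrative choices, and plain dictionaries stand in for an embedding store.

```python
import time


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def personalized_rank(candidates, user_vec, item_vecs, recent_click_categories,
                      weight=0.3, session_boost=0.1, budget_ms=25.0):
    """Re-rank (item_id, base_score, category) tuples with lightweight personalization.

    If the time budget runs out, remaining items keep their base score, so the
    request degrades to unpersonalized ranking rather than missing its deadline.
    """
    start = time.perf_counter()
    scored = []
    for item_id, base_score, category in candidates:
        score = base_score
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms < budget_ms:
            # Pre-computed embeddings: a lookup and a dot product, nothing heavier.
            vec = item_vecs.get(item_id)
            if vec is not None:
                score += weight * dot(user_vec, vec)
            # Real-time session feature: boost categories clicked recently.
            if category in recent_click_categories:
                score += session_boost
        scored.append((item_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [("snake-care-guide", 0.50, "pets"), ("python-tutorial", 0.45, "programming")]
item_vecs = {"snake-care-guide": [0.0, 1.0], "python-tutorial": [1.0, 0.0]}
ranked = personalized_rank(candidates, [1.0, 0.0], item_vecs, {"programming"})
print([item_id for item_id, _ in ranked])
```

Falling back to the base score when the budget is exhausted is a deliberate design choice here: personalization is treated as an optional adjustment, so a slow lookup costs relevance rather than latency.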