
What is Real-Time Search Personalization?

Real-time search personalization adjusts ranked results at query time using immediate context and historical behavior, rather than computing personalized rankings in batch jobs hours or days earlier. When you search for hotels on Airbnb or products on Amazon, the system tailors the order of results to your preferences within milliseconds of your query, reflecting what you clicked just minutes ago.

The architecture separates retrieval from ranking. Retrieval generates 500 to 5,000 candidates using lexical match and semantic similarity, typically spending 10 to 30 milliseconds on inverted indexes and 5 to 20 milliseconds on Approximate Nearest Neighbor (ANN) lookups across catalogs with 100 million items. Ranking then scores these candidates with a model that blends query relevance, item quality, and personalized signals. The entire flow, retrieval plus ranking, completes within 50 to 150 milliseconds at the service boundary.

Personalization operates on two time horizons. Short-term session profiles capture the last few interactions within a 15 to 30 minute window and encode current intent, such as whether you are browsing budget hotels or luxury resorts right now. Long-term profiles encode stable preferences over 30 to 90 days, such as always booking pet-friendly properties or preferring downtown locations. This dual approach balances immediate context with durable tastes.

The tradeoff is complexity versus lift. Airbnb saw a 21 percent increase in click-through rate (CTR) for similar-listings carousels and a 4.9 percent improvement in booking pathways after deploying real-time personalization. However, this required building streaming pipelines, online feature stores, and sub-millisecond feature computation. For low-traffic products or narrow query domains, daily batch personalization with cached rankings might deliver 80 percent of the value at 20 percent of the engineering cost.
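To make the ranking step concrete, here is a minimal sketch of how a ranker might blend query relevance, item quality, and the two profile horizons. It is an illustration under stated assumptions, not any production system's method: the `Candidate` fields, the `rank` function, the two-dimensional embeddings, and the fixed linear weights are all hypothetical, and a real ranker would typically be a learned model rather than a hand-weighted sum.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Candidate:
    item_id: str
    relevance: float        # query relevance from retrieval (assumed in [0, 1])
    quality: float          # item quality score (assumed in [0, 1])
    embedding: List[float]  # item embedding (toy 2-d vectors here)


def rank(candidates, session_vec, longterm_vec,
         w_rel=0.5, w_qual=0.2, w_short=0.2, w_long=0.1):
    """Sort candidates by a hand-weighted blend of the four signals.

    The weights are illustrative assumptions, not tuned values.
    """
    def score(c: Candidate) -> float:
        return (w_rel * c.relevance
                + w_qual * c.quality
                + w_short * cosine(c.embedding, session_vec)
                + w_long * cosine(c.embedding, longterm_vec))
    return sorted(candidates, key=score, reverse=True)


# Two items that tie on relevance and quality; the session profile points
# toward "budget" while the long-term profile points toward "luxury".
budget = Candidate("budget", relevance=0.8, quality=0.5, embedding=[1.0, 0.0])
luxury = Candidate("luxury", relevance=0.8, quality=0.5, embedding=[0.0, 1.0])
ranked = rank([luxury, budget], session_vec=[1.0, 0.0], longterm_vec=[0.0, 1.0])
print([c.item_id for c in ranked])  # prints ['budget', 'luxury']
```

Because the short-term weight is larger than the long-term weight in this sketch, the current session intent wins the tie; reversing those two weights would put the long-term preference first.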
💡 Key Takeaways
Retrieval generates 500 to 5,000 candidates in 15 to 50 milliseconds using lexical indexes and ANN semantic search across catalogs with 100 million items
Ranking scores candidates with personalized signals in 3 to 10 milliseconds for 1,000 items using gradient-boosted trees or shallow neural nets on CPU
Short-term profiles capture the last 15 to 30 minutes of clicks to reflect current intent, while long-term profiles encode stable preferences over 30 to 90 days
End-to-end latency must stay within 50 to 150 milliseconds at p95 to meet web search quality standards, forcing careful budgeting of every stage
Airbnb achieved a 21 percent CTR increase and a 4.9 percent booking improvement, but this required streaming pipelines, online feature stores, and sub-millisecond feature computation
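The stage budgets above can be added up to see how much headroom remains. The sketch below is simple arithmetic on the figures quoted in this section; the dictionary keys and the `headroom_ms` helper are hypothetical names, and it assumes the stages run one after another. Summing per-stage upper bounds only approximates a true end-to-end p95.

```python
# Per-stage latency ranges in milliseconds (low, high), taken from the text.
STAGE_BUDGET_MS = {
    "lexical_retrieval": (10, 30),
    "ann_retrieval": (5, 20),
    "ranking_1000_items": (3, 10),
}

END_TO_END_LIMIT_MS = 150  # upper end of the 50 to 150 ms target


def headroom_ms(stages, limit_ms):
    """Return the budget left after every stage hits its upper bound."""
    worst_case = sum(high for _, high in stages.values())
    return limit_ms - worst_case


worst_case_total = sum(high for _, high in STAGE_BUDGET_MS.values())
print(worst_case_total)                                    # prints 60
print(headroom_ms(STAGE_BUDGET_MS, END_TO_END_LIMIT_MS))   # prints 90
```

Under these assumptions, retrieval and ranking together use at most 60 milliseconds, leaving roughly 90 milliseconds for feature fetching, network hops, and response assembly.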
📌 Examples
Airbnb trained 32-dimensional listing embeddings from 800 million click sessions across 4.5 million listings and computes EmbClickSim and EmbSkipSim features online for each candidate in under 1 millisecond
Amazon search operates at tens of thousands of queries per second and uses per-market sharding and feature collocation to keep p95 latency under 150 milliseconds
Google Search retrieves candidates via inverted indexes in 10 to 20 milliseconds and scores them with personalized rankers that complete in 30 to 50 milliseconds total
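As a rough illustration of features like EmbClickSim and EmbSkipSim, the sketch below scores a candidate by cosine similarity to the centroid of recently clicked (or skipped) listing embeddings. The centroid-plus-cosine formulation, the function names, and the toy 2-dimensional vectors are assumptions made for this example; the text above states only that such features are computed online per candidate.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def emb_sim(candidate: Sequence[float],
            history: Sequence[Sequence[float]]) -> float:
    """Similarity of a candidate to the centroid of a set of embeddings.

    Returns 0.0 when the history is empty (for example, a new session).
    """
    if not history:
        return 0.0
    return cosine(candidate, centroid(history))


# Toy session: two clicked listings and one skipped listing.
clicked = [[1.0, 0.0], [0.8, 0.2]]
skipped = [[0.0, 1.0]]
candidate = [1.0, 0.0]

emb_click_sim = emb_sim(candidate, clicked)  # high: candidate resembles clicks
emb_skip_sim = emb_sim(candidate, skipped)   # low: candidate unlike skips
print(emb_click_sim > emb_skip_sim)          # prints True
```

Both values would then be passed to the ranker as ordinary numeric features alongside relevance and quality signals. With a few dozen dimensions and a short session history, this amounts to a handful of dot products per candidate.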