
Ranking Signals and Personalization in Autocomplete

Frequency Alone Is Not Enough

If autocomplete ranked purely by global search frequency, every user typing the same prefix would see the same suggestions. A user who types "ca" would always get the globally most popular matches, whether or not "california" or "calculator" is the contextually relevant option for them. Ranking must therefore blend multiple signals: frequency, recency, click-through rate (CTR), personalization, and context.

Signal 1: Query Frequency

How often do users search for this term? A term searched 100,000 times daily ("amazon") outranks one searched 100 times ("amazonia"). Frequency data comes from query logs, aggregated hourly or daily. Raw counts are log scaled: log(frequency + 1) compresses the dynamic range while preserving ordering, which keeps a viral spike from overwhelming the other signals.
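To make the compression concrete, here is a minimal sketch of log scaling applied to the two counts from the example. The function name `log_scaled_frequency` is hypothetical, and the natural log is an assumption, since the text does not specify a base.

```python
import math


def log_scaled_frequency(count: int) -> float:
    """Compress a raw daily search count with log(count + 1).

    The +1 keeps the result defined (and zero) for terms with no searches.
    Natural log is an assumption; any base preserves the ordering.
    """
    return math.log(count + 1)


# Counts taken from the example in the text.
amazon = log_scaled_frequency(100_000)
amazonia = log_scaled_frequency(100)

# The raw counts differ by a factor of 1,000; the scaled values
# differ by a factor of roughly 2.5, and the ordering is unchanged.
print(f"amazon={amazon:.2f} amazonia={amazonia:.2f} ratio={amazon / amazonia:.2f}")
```

Because the logarithm is monotonic, the more frequent term still ranks first; only the size of its lead shrinks.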

Signal 2: Recency and Time Decay

A term trending in the last hour matters more than one that was popular last month. Apply exponential time decay: weight = e^(-lambda times age). With lambda equal to 0.01 and age in hours, a signal that is 24 hours old retains 79 percent of its weight; one that is 7 days old retains only 19 percent. This surfaces trending terms quickly while maintaining a baseline for stable popular terms.
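The two percentages can be checked with a short sketch of the decay function. The name `recency_weight` and its parameter names are hypothetical; the decay rate of 0.01 per hour is the value given in the text.

```python
import math


def recency_weight(age_hours: float, decay_rate: float = 0.01) -> float:
    """Exponential time decay: e^(-decay_rate * age_hours)."""
    return math.exp(-decay_rate * age_hours)


one_day = recency_weight(24)        # e^(-0.24)
one_week = recency_weight(7 * 24)   # e^(-1.68)

print(f"24 hours: {one_day:.0%}, 7 days: {one_week:.0%}")
```

A larger decay rate forgets old activity faster, so the right value depends on how quickly trends turn over in the product.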

Signal 3: Context

What was the previous query? If the user just searched "python", the prefix "pa" should suggest "pandas" over "pancakes". Context windows of 1 to 5 previous queries improve relevance. Maintain co-occurrence statistics per query pair, and boost terms that frequently follow the recent query.
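One way to hold the query-pair statistics is a nested counter keyed by the previous query. The sketch below is illustrative only: the class `CooccurrenceStats`, its methods, and the choice to express the boost as a conditional share in the range 0 to 1 are all assumptions, and it looks at a single previous query rather than a window of up to 5.

```python
from collections import Counter, defaultdict


class CooccurrenceStats:
    """Counts how often one query directly follows another (hypothetical helper)."""

    def __init__(self) -> None:
        # previous query -> Counter of the queries that followed it
        self._following: defaultdict[str, Counter] = defaultdict(Counter)

    def record(self, previous_query: str, next_query: str) -> None:
        """Register one observed (previous, next) pair from the query log."""
        self._following[previous_query][next_query] += 1

    def context_boost(self, previous_query: str, candidate: str) -> float:
        """Share of follow-up queries after previous_query that were candidate."""
        counts = self._following.get(previous_query)
        if not counts:
            return 0.0  # no history for this context, so no boost
        return counts[candidate] / sum(counts.values())


stats = CooccurrenceStats()
for follow_up in ["pandas", "pandas", "pandas", "pancakes"]:
    stats.record("python", follow_up)

print(stats.context_boost("python", "pandas"))    # share for "pandas"
print(stats.context_boost("python", "pancakes"))  # share for "pancakes"
```

In production the counts would come from aggregated logs rather than in-memory updates, but the lookup at ranking time has the same shape.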

Signal 4: Personalization

A developer typing "py" wants "python"; a chef wants "pyrex". User history (the past 30 to 90 days) creates a personal prior. Blend: score = alpha times personal + (1 - alpha) times global, with alpha typically 0.2 to 0.4. A higher alpha risks filter bubbles. Keeping personal signals at roughly 20 to 30 percent and global signals at 70 to 80 percent preserves room for discovery.
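The blend is a plain linear interpolation, sketched below. The function `blended_score` is hypothetical, and the input scores are made-up numbers chosen for illustration, assumed to be normalized to the range 0 to 1.

```python
def blended_score(personal: float, global_score: float, alpha: float = 0.3) -> float:
    """Linear blend: alpha * personal + (1 - alpha) * global_score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * personal + (1 - alpha) * global_score


# Illustrative scores for a developer typing "py" (not real data):
# "pyrex" is slightly more popular globally, but this user's history
# contains many "python" searches and no "pyrex" searches.
python_score = blended_score(personal=0.9, global_score=0.6)
pyrex_score = blended_score(personal=0.0, global_score=0.7)

print(f"python={python_score:.2f} pyrex={pyrex_score:.2f}")
```

With alpha at 0.3 the personal history is enough to flip the order here, while the global score still contributes most of each term's value.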

Signal Combination

Combine multiplicatively: score = freq^0.5 times recency_decay times (1 + context_boost) times (1 + personal_boost). The square root on frequency compresses raw counts so that popular terms do not dominate. Tune the weights via A/B tests that measure CTR; good autocomplete achieves 30 to 50 percent CTR on the first suggestion.
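As a rough sketch, the multiplicative formula translates directly into a scoring function. The name `combined_score`, the parameter names, and the sample frequencies and boosts are assumptions made for illustration; the decay rate of 0.01 per hour follows the recency section.

```python
import math


def combined_score(
    frequency: float,
    age_hours: float,
    context_boost: float = 0.0,
    personal_boost: float = 0.0,
    decay_rate: float = 0.01,
) -> float:
    """freq^0.5 * recency_decay * (1 + context_boost) * (1 + personal_boost)."""
    recency_decay = math.exp(-decay_rate * age_hours)
    return (
        math.sqrt(frequency)
        * recency_decay
        * (1 + context_boost)
        * (1 + personal_boost)
    )


# Illustrative numbers: "pancakes" is searched more often overall,
# but "pandas" carries a context boost after a "python" query.
pandas_score = combined_score(frequency=2_500, age_hours=0, context_boost=0.75)
pancakes_score = combined_score(frequency=6_400, age_hours=0)

print(f"pandas={pandas_score:.1f} pancakes={pancakes_score:.1f}")
```

Writing the boosts as (1 + boost) means a missing signal contributes a neutral factor of 1 instead of zeroing out the whole score.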

💡 Key Takeaways
Composite ranking combines frequency, recency, CTR, personalization, and context for relevant suggestions
Log scale frequency with log(frequency + 1) to prevent viral trends from dominating
Time decay e^(-lambda times age); 24 hours retains 79 percent, 7 days retains 19 percent with lambda 0.01
Personalization: alpha times personal plus (1-alpha) times global; alpha 0.2 to 0.4 balances relevance and discovery
Good autocomplete achieves 30 to 50 percent CTR on first suggestion; tune weights via A/B tests
📌 Interview Tips
1. Walk through the formula: freq^0.5 times recency_decay times (1 + context_boost) times (1 + personal_boost). The square root keeps popular terms from dominating.
2. Discuss filter bubbles: a high alpha keeps surfacing suggestions drawn from the user's own history. Balance personal signals at 20 to 30 percent with global signals at 70 to 80 percent.