
Embedding Based Similarity Features: EmbClickSim and EmbSkipSim

Definition
Embedding similarity features measure how similar a candidate item is to items the user has interacted with. Embeddings are numerical vectors (lists of numbers like [0.2, -0.5, 0.8...]) that capture item meaning. Similar items have similar vectors. Comparing vectors tells you if items are related.

EmbClickSim: Similarity to Clicked Items

For each candidate item, compute the similarity between its vector and the vectors of items the user clicked. Similarity is measured with cosine similarity: how closely two vectors point in the same direction (1.0 = identical direction, 0 = unrelated, negative values = opposing directions). Formula: EmbClickSim = max(similarity to each clicked item). A high EmbClickSim means the candidate resembles items the user liked. For example, if the user clicked hiking boots, then trail shoes and hiking poles score high.
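The max-over-clicks formula can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names `cosine` and `emb_click_sim` are hypothetical, and returning 0.0 for a zero-length vector or an empty click history is an assumption made here for convenience.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as "unrelated"
    return dot / (norm_a * norm_b)

def emb_click_sim(candidate, clicked):
    """EmbClickSim = max cosine similarity to any clicked item."""
    if not clicked:
        return 0.0  # assumption: no click history means no signal
    return max(cosine(candidate, c) for c in clicked)

# Toy 2-dimensional vectors, invented for illustration only.
candidate = [1.0, 0.0]
clicked = [[0.0, 1.0], [1.0, 0.0]]
score = emb_click_sim(candidate, clicked)
```

Taking the max means a single close match among the clicks is enough to produce a high score, even if the other clicked items are unrelated to the candidate.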

EmbSkipSim: Similarity to Skipped Items

Skipped items (shown but not clicked) indicate negative preference. EmbSkipSim measures how similar a candidate is to the items the user passed over. A high EmbSkipSim is a negative signal: the candidate resembles things the user rejected. If the user saw sandals and didn't click, similar sandals should score lower. This helps avoid showing more of what the user has already skipped.

Combining the Signals

The ranker combines both: boost = w1 × EmbClickSim - w2 × EmbSkipSim. Typical weights: w1 = 0.6-0.8 (clicks are a strong positive signal), w2 = 0.2-0.4 (skips are a weaker negative signal, since users skip for many reasons). Items that are similar to clicks but dissimilar from skips get the strongest boost.
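As a rough sketch, the combination might look like the following. The names `max_sim` and `personalization_boost` are hypothetical, the default weights 0.7 and 0.3 are simply picked from inside the ranges quoted above, and aggregating skips with a max (mirroring EmbClickSim) is an assumption, since the text does not specify how EmbSkipSim is aggregated.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as "unrelated"
    return dot / (norm_a * norm_b)

def max_sim(candidate, items):
    """Highest cosine similarity to any vector in items (0.0 if empty)."""
    return max((cosine(candidate, v) for v in items), default=0.0)

def personalization_boost(candidate, clicked, skipped, w1=0.7, w2=0.3):
    """boost = w1 * EmbClickSim - w2 * EmbSkipSim."""
    emb_click_sim = max_sim(candidate, clicked)
    emb_skip_sim = max_sim(candidate, skipped)  # assumption: max, as for clicks
    return w1 * emb_click_sim - w2 * emb_skip_sim

# Toy vectors, invented for illustration only.
clicked = [[1.0, 0.0]]   # e.g. hiking boots
skipped = [[0.0, 1.0]]   # e.g. sandals
like_click = personalization_boost([1.0, 0.0], clicked, skipped)
like_skip = personalization_boost([0.0, 1.0], clicked, skipped)
```

With these toy vectors, the candidate aligned with the click receives a positive boost and the candidate aligned with the skip receives a negative one, which is the ordering the formula is meant to produce.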

Implementation

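Optimization: instead of comparing the candidate against each click, use a session embedding (the average of the click vectors). This takes one comparison instead of N, at the cost of approximating the max: similarity to the average is not the same as similarity to the closest click. For skips, sample the last 10 rather than all of them. Pre-compute item vectors offline; the real-time path only does vector lookups and similarity math.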
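The session-embedding shortcut could be sketched as below. The helper names (`mean_vector`, `session_click_sim`, `recent_skips`) are hypothetical, and the sketch assumes the skip history is stored oldest-first so that the last 10 entries are the most recent.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as "unrelated"
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def session_click_sim(candidate, clicked):
    """One comparison against the averaged click vector instead of N."""
    if not clicked:
        return 0.0  # assumption: no click history means no signal
    return cosine(candidate, mean_vector(clicked))

def recent_skips(skipped, limit=10):
    """Keep only the most recent skips (assumes oldest-first ordering)."""
    return skipped[-limit:]

# Toy vectors, invented for illustration only.
clicks = [[1.0, 0.0], [0.0, 1.0]]
approx = session_click_sim([1.0, 0.0], clicks)
```

In this toy case the averaged comparison gives roughly 0.71, whereas the per-click max would give 1.0, which illustrates the accuracy traded away for doing a single comparison per candidate.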

💡 Key Takeaways
EmbClickSim: similarity between candidate and clicked items; high value = candidate resembles liked items
EmbSkipSim: similarity to skipped items; high value = candidate resembles rejected items (negative signal)
Combined formula: personalization_boost = w1 × EmbClickSim - w2 × EmbSkipSim with typical weights 0.6-0.8 and 0.2-0.4
Optimization: use session embedding (average of clicks) instead of comparing against each click individually
Pre-compute item embeddings offline; real-time only does vector lookups and dot products (a dot product equals cosine similarity when the vectors are normalized to unit length)
📌 Interview Tips
1. Define both features: EmbClickSim = similarity to clicked items (positive), EmbSkipSim = similarity to skipped items (negative)
2. Give the combination formula with typical weights: w1 around 0.6-0.8, w2 around 0.2-0.4
3. Mention the optimization: session embedding (average) vs comparing against each click individually