ML-Powered Search & Ranking • Real-time Search Personalization • Medium • ⏱️ ~3 min
Embedding Based Similarity Features: EmbClickSim and EmbSkipSim
Embedding-based similarity features are a core technique for real-time personalization, capturing both attraction and aversion signals by comparing candidate item embeddings to centroids of the user's recent interactions. Airbnb trained 32-dimensional listing embeddings from 800 million click sessions across 4.5 million listings using a skip-gram model with market-specific negative sampling. For sessions that end in a booking, the booked listing serves as a global context, producing embeddings that cluster similar properties in vector space.
At query time, the system computes two key features for each candidate. EmbClickSim is the dot product between the candidate embedding and the centroid of the user's last 5 to 10 clicked listings: a high score means the candidate resembles what the user recently engaged with, signaling strong relevance. EmbSkipSim is the dot product with the centroid of skipped or explicitly rejected listings: a high skip similarity suggests the user is avoiding this type of item, and the ranker can downweight it. Keeping the positive and negative signals separate lets the ranker distinguish items the user actively rejected from items merely ignored, so it does not promote candidates that resemble explicit dislikes.
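The two features reduce to centroid averaging plus dot products. A minimal sketch with numpy (function and variable names are illustrative, not Airbnb's internal API):

```python
import numpy as np

def similarity_features(cand_emb, clicked_embs, skipped_embs):
    """Compute EmbClickSim and EmbSkipSim for one candidate.

    cand_emb: (32,) candidate listing embedding
    clicked_embs: (n, 32) embeddings of the user's recent clicks
    skipped_embs: (m, 32) embeddings of skipped/rejected listings
    """
    click_centroid = clicked_embs.mean(axis=0)   # attraction signal
    skip_centroid = skipped_embs.mean(axis=0)    # aversion signal
    emb_click_sim = float(cand_emb @ click_centroid)
    emb_skip_sim = float(cand_emb @ skip_centroid)
    return emb_click_sim, emb_skip_sim
```

Both values are then passed to the ranking model as ordinary numeric features, so the model itself learns how strongly to reward attraction and penalize aversion.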
Computing these features is fast. With 32-dimensional embeddings stored as float arrays in memory, the dot products for 1,000 candidates complete in under 1 millisecond on modern CPUs using vectorized instructions. Airbnb reports sub-millisecond similarity computation, allowing them to use these features for every candidate in the ranking stage without exceeding the latency budget. For similar-listing carousels, they also run k = 12 Approximate Nearest Neighbor (ANN) search directly on the embeddings to retrieve visually and semantically similar properties.
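The batch scoring step can be sketched as a single matrix-vector product over a contiguous float32 array, which is what makes the sub-millisecond figure plausible; the array shapes below are taken from the text, the names are ours:

```python
import numpy as np

def batch_similarity(cand_matrix: np.ndarray, centroid: np.ndarray) -> np.ndarray:
    """Score every candidate against a centroid in one vectorized
    (BLAS-backed) matrix-vector product."""
    return cand_matrix @ centroid

# Illustrative sizes from the text: 1,000 candidates, 32-dim embeddings.
rng = np.random.default_rng(7)
candidates = rng.normal(size=(1000, 32)).astype(np.float32)
click_centroid = rng.normal(size=32).astype(np.float32)
scores = batch_similarity(candidates, click_centroid)  # shape (1000,)
```

Scoring all candidates in one call, rather than looping per candidate, is what lets the CPU use vectorized instructions end to end.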
The tradeoff is feedback loops and filter bubbles. Overweighting EmbClickSim can create echo chambers where users only see more of what they clicked, reinforcing narrow preferences and reducing discovery. Airbnb mitigates this by blending similarity features with diversity constraints, limiting the number of results from the same neighborhood or host, and using exploration to inject items with lower similarity scores. They also decay short-term click centroids with a half-life of hours, so a single browsing session does not dominate future rankings for days.
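The half-life decay can be sketched as an exponentially weighted centroid; the 6-hour default below is an illustrative assumption, since the text only says the half-life is on the order of hours:

```python
import numpy as np

def decayed_centroid(embeddings, ages_hours, half_life_hours=6.0):
    """Time-decayed click centroid: each click's weight halves every
    half_life_hours, so fresh clicks dominate and stale sessions fade.
    The 6-hour half-life is illustrative, not a published value."""
    embeddings = np.asarray(embeddings, dtype=float)
    ages = np.asarray(ages_hours, dtype=float)
    weights = 0.5 ** (ages / half_life_hours)
    return (embeddings * weights[:, None]).sum(axis=0) / weights.sum()
```

A click exactly one half-life old contributes half the weight of a click made just now, so yesterday's browsing session quickly stops steering today's ranking.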
Embeddings need retraining to stay fresh. Airbnb retrains them offline weekly on billions of interactions. Drift between offline-trained embeddings and online user behavior can cause feature mismatch. To detect it, they monitor the distribution of similarity scores in production and compare it to offline validation sets, alerting if the distributions diverge beyond set thresholds.
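One common way to quantify that divergence is the Population Stability Index (PSI) between the offline and production score distributions; the text does not name Airbnb's exact statistic or threshold, so both below are our assumptions:

```python
import numpy as np

def psi(expected_scores, actual_scores, bins=10, eps=1e-6):
    """Population Stability Index between an offline (expected) and a
    production (actual) similarity-score distribution. A common rule of
    thumb treats PSI > 0.2 as meaningful drift; the threshold and the
    use of PSI itself are assumptions, not Airbnb's published setup."""
    edges = np.histogram_bin_edges(expected_scores, bins=bins)
    e_frac = np.histogram(expected_scores, bins=edges)[0] / len(expected_scores)
    a_frac = np.histogram(actual_scores, bins=edges)[0] / len(actual_scores)
    e_frac = np.clip(e_frac, eps, None)  # avoid log(0) in empty bins
    a_frac = np.clip(a_frac, eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))
```

Running this daily against the offline validation distribution turns "the scores look different" into a single number that can drive an alert.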
💡 Key Takeaways
•Airbnb trained 32-dimensional embeddings from 800 million click sessions using skip-gram with the booked listing as global context and market-specific negative sampling
•EmbClickSim is the dot product between the candidate and the centroid of the user's last 5 to 10 clicked items, capturing attraction signals; high scores indicate relevance
•EmbSkipSim is the dot product with skipped item centroid, capturing aversion signals to prevent promoting items similar to what the user explicitly rejected
•Computing similarity for 1,000 candidates completes in under 1 millisecond using vectorized CPU instructions on 32-dimensional float arrays stored in memory
•Overweighting EmbClickSim creates filter bubbles, mitigated by diversity constraints, exploration injection, and exponential decay of short-term centroids with a half-life of hours
📌 Examples
User clicks three beach hotels in Miami: EmbClickSim boosts other beach properties in South Florida. User skips downtown high-rises: EmbSkipSim downweights similar urban listings
Airbnb uses k = 12 ANN on embeddings for similar-listing carousels, achieving a 21 percent CTR increase by surfacing visually and semantically related properties
Google Shopping computes product embedding similarity to user's cart and browsing history, improving conversion by 8 percent while adding less than 2 milliseconds to ranking latency