
Two-Stage Retrieval: Candidate Generation and Re-ranking at Scale

WHY TWO STAGES

Embedding search is fast but approximate—it captures semantic similarity but misses personalization, freshness, and business rules. Rich ML models are accurate but slow—they can incorporate hundreds of features but cannot score millions of items. Two-stage retrieval combines both strengths.

Stage 1 (retrieval): ANN search returns 100-1000 candidates in 5-20ms. The metric is recall—did we retrieve items that are truly relevant? This stage filters millions down to hundreds.
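Stage 1 can be sketched as a top-k similarity search over normalized embeddings. The version below is exact brute force in NumPy for clarity; a production system would swap in an ANN index (e.g. HNSW or IVF in a library like FAISS) to hit the 5-20ms budget over millions of items. All names here are illustrative.

```python
import numpy as np

def top_k_candidates(query_vec, item_matrix, k=100):
    """Exact cosine top-k; production replaces this with an ANN index."""
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    items = item_matrix / np.linalg.norm(item_matrix, axis=1, keepdims=True)
    scores = items @ q
    # argpartition finds the top k without fully sorting millions of scores.
    top = np.argpartition(-scores, min(k, len(scores) - 1))[:k]
    return top[np.argsort(-scores[top])]  # candidate indices, best first

rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(10_000, 64))   # toy catalog
query = rng.normal(size=64)                        # toy query embedding
candidates = top_k_candidates(query, item_embeddings, k=100)
```

Note the design choice: Stage 1 returns ids only, not final scores, because the re-ranker will rescore everything with richer features.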

Stage 2 (re-ranking): An ML model scores each candidate using rich features (user history, item popularity, price, freshness). Returns the final top 10-50. The metrics are precision and NDCG—did we rank the relevant items correctly?
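Stage 2 rescores the small candidate set with a richer model. The sketch below uses a hand-weighted linear model purely for illustration; real re-rankers are typically gradient-boosted trees or neural networks trained on click/purchase data, and the feature names and weights here are assumptions.

```python
def rerank(candidates, features, weights, final_k=10):
    """Score Stage-1 candidates with a linear model; return the top final_k ids."""
    scored = []
    for item_id in candidates:
        f = features[item_id]
        score = sum(weights[name] * value for name, value in f.items())
        scored.append((score, item_id))
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:final_k]]

# Illustrative weights: embedding similarity alone does not decide the ranking.
weights = {"embedding_sim": 1.0, "popularity": 0.5,
           "freshness": 0.3, "user_affinity": 0.8}
features = {
    1: {"embedding_sim": 0.9, "popularity": 0.2, "freshness": 0.1, "user_affinity": 0.0},
    2: {"embedding_sim": 0.7, "popularity": 0.9, "freshness": 0.8, "user_affinity": 0.6},
    3: {"embedding_sim": 0.8, "popularity": 0.1, "freshness": 0.9, "user_affinity": 0.1},
}
print(rerank([1, 2, 3], features, weights, final_k=2))  # → [2, 3]
```

Item 1 has the highest embedding similarity but loses to items 2 and 3 once popularity, freshness, and user affinity are weighed in—exactly the signal the embedding-only stage cannot provide.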

CANDIDATE SET SIZING

Retrieve enough candidates that true positives are included. If final output is top-10 recommendations, retrieve 100-500 candidates. The exact ratio depends on embedding quality and re-ranker power.

Under-retrieval risk: true positives missing from candidate set. The re-ranker cannot surface items it never sees—recall is capped at Stage 1. Over-retrieval cost: re-ranker latency increases linearly with candidate count. Find the minimum candidate set that maintains target recall.
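One way to find that minimum is an offline sweep: compute Stage-1 recall at increasing candidate counts against labeled relevance data and take the smallest k that clears the target. A minimal sketch, with toy retrieval results and ground-truth sets standing in for real evaluation data:

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of truly relevant items present in the top-k candidate set."""
    hits, total = 0, 0
    for query, relevant in ground_truth.items():
        top_k = set(retrieved[query][:k])
        hits += len(relevant & top_k)
        total += len(relevant)
    return hits / total

# Toy data: ranked Stage-1 results per query, and the truly relevant ids.
retrieved = {"q1": [5, 3, 9, 1, 7], "q2": [2, 8, 4, 6, 0]}
ground_truth = {"q1": {3, 7}, "q2": {8}}

# Smallest candidate count that meets the recall target.
target = 0.9
best_k = next(k for k in (1, 2, 3, 4, 5)
              if recall_at_k(retrieved, ground_truth, k) >= target)
```

In this toy example recall plateaus at 2/3 from k=2 through k=4 and only reaches 1.0 at k=5, so best_k is 5—the same plateau-then-jump shape you look for on real data when choosing the candidate count.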

RE-RANKER FEATURES

User features: click history, purchase history, demographic segments, session context, device type.

Item features: popularity score, days since creation, price tier, category, inventory status.

Cross features: user-item affinity scores, collaborative filtering signals, price sensitivity match—personalization signals that embeddings cannot capture.
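Assembling these three feature groups into one row per (user, item) pair might look like the sketch below. Every field name is a placeholder; in production these values come from feature stores rather than inline dicts.

```python
def build_features(user, item, cross):
    """Combine user, item, and cross features into one re-ranker input row.
    All field names are illustrative placeholders."""
    return {
        # User features
        "clicks_7d": float(user["clicks_7d"]),
        "is_mobile": 1.0 if user["device"] == "mobile" else 0.0,
        # Item features
        "popularity": item["popularity"],
        "days_old": float(item["days_old"]),
        "in_stock": 1.0 if item["in_stock"] else 0.0,
        # Cross features: personalization signals embeddings cannot capture
        "user_item_affinity": cross["affinity"],
        "price_sensitivity_match": cross["price_match"],
    }

row = build_features(
    {"clicks_7d": 12, "device": "mobile"},
    {"popularity": 0.8, "days_old": 3, "in_stock": True},
    {"affinity": 0.6, "price_match": 0.9},
)
```

This row is computed once per candidate, which is why re-ranker latency scales linearly with candidate count.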

💡 Key Insight: Embeddings capture semantic similarity. Re-rankers add business logic, personalization, freshness, and inventory awareness. Both stages are necessary for high-quality production recommendations.
💡 Key Takeaways
Stage 1 (ANN): 100-1000 candidates in 5-20ms, optimizes for recall
Stage 2 (re-ranker): scores with rich features, optimizes for precision/NDCG
Candidate set sizing: enough to include true positives, minimum for latency
📌 Interview Tips
1. Interview Tip: Explain why two stages—embeddings are fast but miss personalization; re-rankers add business logic.
2. Interview Tip: Describe the candidate set sizing trade-off—under-retrieval caps recall, over-retrieval hurts latency.