Two Stage Retrieval: ANN Candidate Generation + Re-Ranking
WHY TWO STAGES
Real-world ranking needs signals that vectors cannot capture: user history, item freshness, business rules, real-time context. A pure vector search finds semantically similar items but misses these signals. Two-stage retrieval separates fast candidate generation (ANN) from expensive full ranking (ML model with rich features).
The first stage uses ANN to retrieve 200-1000 candidates at high recall (95-99%) within 5-20ms. The second stage applies a full ranking model with 50-200 features, including non-vector signals like click history, inventory status, and user segments. The final top-K (10-50) items are returned to the user.
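A minimal sketch of the two-stage flow, assuming NumPy is available. Exact dot-product search stands in for the ANN index, and a simple callable stands in for the full ranking model; in production these would be an ANN library (e.g. HNSW or IVF) and a trained ML model.

```python
import numpy as np

def ann_candidates(query_vec, item_vecs, n_candidates=500):
    # Stage 1 stand-in: exact dot-product search. A real system would
    # use an ANN index here, trading a few points of recall for speed.
    scores = item_vecs @ query_vec
    return np.argsort(-scores)[:n_candidates]

def rerank(candidate_ids, rank_score, top_k=10):
    # Stage 2: score every candidate with the full ranking model
    # (a stand-in callable here), sort, and keep the top-K.
    return sorted(candidate_ids, key=rank_score, reverse=True)[:top_k]

rng = np.random.default_rng(0)
items = rng.normal(size=(10_000, 64))   # item embeddings
query = rng.normal(size=64)             # query embedding

cands = ann_candidates(query, items, n_candidates=500)
top = rerank(cands, rank_score=lambda i: items[i] @ query, top_k=10)
```

Over-retrieving 500 candidates for a top-10 result gives the re-ranker room to reorder aggressively without losing relevant items.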
CANDIDATE GENERATION (STAGE 1)
ANN retrieves candidates likely to be relevant based on embedding similarity. The key metric is recall: what fraction of the true top-K items appear in the candidate set? If the true #1 item is not in the candidate set, no amount of re-ranking can surface it.
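Candidate-set recall can be measured directly by comparing against an exact (brute-force) top-K computed offline; a small sketch:

```python
def candidate_recall(true_top_k, candidates):
    # Fraction of the true top-K items that the candidate set covers.
    true_set, cand_set = set(true_top_k), set(candidates)
    return len(true_set & cand_set) / len(true_set)

# 9 of the true top-10 items were retrieved:
r = candidate_recall(range(10), [0, 1, 2, 3, 4, 5, 6, 7, 8, 42])
print(r)  # -> 0.9
```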
Over-retrieve to ensure coverage. If final output is top-10, retrieve 100-500 candidates. The ratio depends on how much re-ranking might reorder results. Highly personalized re-rankers need larger candidate sets.
Multiple retrieval paths: query embedding matches item embeddings, but also retrieve by user history similarity, trending items in category, and business-promoted items. Merge candidates from all paths.
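Merging multiple retrieval paths needs deduplication, since the same item can surface from more than one path. One simple policy, sketched below, keeps the order of first appearance (path names are illustrative):

```python
def merge_paths(*paths):
    # Merge candidate lists from several retrieval paths, preserving
    # order of first appearance and dropping duplicate items.
    seen, merged = set(), []
    for path in paths:
        for item in path:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

query_path = ["a", "b", "c"]      # query-embedding matches
history_path = ["b", "d"]         # user-history similarity
trending_path = ["e", "a"]        # trending in category
print(merge_paths(query_path, history_path, trending_path))
# -> ['a', 'b', 'c', 'd', 'e']
```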
RE-RANKING (STAGE 2)
The re-ranker is a full ML model (gradient-boosted trees, neural network) with access to rich features. Feature categories:
User features: click history, purchase history, demographic segments, session context, real-time behavior.
Item features: popularity, recency, inventory, category, price, margin.
Cross features: user-item affinity scores, collaborative filtering signals, price sensitivity match.
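The three feature categories above combine into one feature vector per (user, item) pair. A hypothetical assembly step, with illustrative field names rather than a real schema:

```python
def build_features(user, item):
    # Assemble user, item, and cross features for one candidate.
    # All field names here are illustrative assumptions.
    return {
        "user_clicks_7d": user["clicks_7d"],        # user feature
        "item_popularity": item["popularity"],      # item feature
        "item_price": item["price"],                # item feature
        # cross feature: user's affinity for this item's category
        "affinity": user["category_affinity"].get(item["category"], 0.0),
    }

user = {"clicks_7d": 12, "category_affinity": {"shoes": 0.8}}
item = {"popularity": 0.7, "price": 59.9, "category": "shoes"}
features = build_features(user, item)
```

In production this lookup hits a feature store; precomputing user features once per request, rather than once per candidate, keeps the 20ms feature-lookup budget realistic.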
The re-ranker computes a score for each candidate, sorts, and returns the top-K. Latency budget: 20-50ms for 500 candidates. This requires efficient feature lookup and model inference.
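Since only the top-K of ~500 candidates survive, a partial selection avoids sorting the full list. One standard-library option is `heapq.nlargest`, which runs in O(n log k) rather than O(n log n):

```python
import heapq

def rerank_top_k(candidates, score_fn, k=10):
    # Select the k highest-scoring candidates without a full sort.
    # score_fn stands in for full ranking-model inference.
    return heapq.nlargest(k, candidates, key=score_fn)

top3 = rerank_top_k(range(100), score_fn=lambda x: x, k=3)
print(top3)  # -> [99, 98, 97]
```

At n=500 the savings are modest; the bigger win in practice is batching model inference over all candidates at once instead of scoring one at a time.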
LATENCY BUDGET ALLOCATION
Total budget: 100ms. Typical split: 10ms ANN retrieval, 20ms feature lookup, 30ms model inference, 40ms buffer for network and processing. If any stage exceeds its budget, the overall latency target is missed.
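The per-stage split can be enforced with simple wall-clock instrumentation. A sketch, assuming the budget figures above (stage names and the `StageTimer` helper are illustrative):

```python
import time

# Per-stage budgets in ms; the remaining 40ms of the 100ms total
# is buffer for network and processing.
BUDGET_MS = {"ann": 10, "features": 20, "inference": 30}

class StageTimer:
    # Record wall-clock time per pipeline stage and flag overruns.
    def __init__(self):
        self.elapsed_ms = {}

    def run(self, name, fn, *args):
        t0 = time.perf_counter()
        result = fn(*args)
        self.elapsed_ms[name] = (time.perf_counter() - t0) * 1000
        return result

    def over_budget(self):
        return {stage: ms for stage, ms in self.elapsed_ms.items()
                if ms > BUDGET_MS.get(stage, float("inf"))}

timer = StageTimer()
_ = timer.run("ann", lambda: sum(range(1000)))  # stand-in for ANN retrieval
overruns = timer.over_budget()
```

Logging `elapsed_ms` per request makes it easy to see which stage is eating the buffer when p99 latency drifts.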