
Two Stage Retrieval: ANN Candidate Generation + Re-Ranking

WHY TWO STAGES

Real-world ranking needs signals that vectors cannot capture: user history, item freshness, business rules, real-time context. A pure vector search finds semantically similar items but misses these signals. Two-stage retrieval separates fast candidate generation (ANN) from expensive full ranking (ML model with rich features).

The first stage uses ANN to retrieve 200-1000 candidates at high recall (95-99%) within 5-20ms. The second stage applies a full ranking model with 50-200 features, including non-vector signals like click history, inventory status, and user segments. The final top-K (10-50) items are returned to the user.
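A minimal sketch of the two-stage pipeline. The data, feature names, and scoring weights are hypothetical, and brute-force cosine similarity stands in for a real ANN index (FAISS/ScaNN/HNSW) to keep the example self-contained:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def generate_candidates(query_vec, item_vecs, n_candidates):
    # Stage 1: brute-force similarity stands in for an ANN index here.
    scored = [(cosine(query_vec, v), item_id) for item_id, v in item_vecs.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:n_candidates]]

def rerank(candidates, features, top_k):
    # Stage 2: toy linear model over non-vector features (hypothetical weights).
    def score(item_id):
        f = features[item_id]
        return 0.6 * f["affinity"] + 0.3 * f["freshness"] - 0.1 * f["price"]
    return sorted(candidates, key=score, reverse=True)[:top_k]
```

In production, stage 2 would be a trained model rather than hand-set weights; the structure (wide retrieval, then narrow scoring) is the point.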

CANDIDATE GENERATION (STAGE 1)

ANN retrieves candidates likely to be relevant based on embedding similarity. The key metric is recall: what fraction of the true top-K items appear in the candidate set? If the true #1 item is not in the candidate set, no amount of re-ranking can surface it.
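The recall metric described above is straightforward to compute offline against exact (brute-force) ground truth; a small sketch:

```python
def candidate_recall(true_top_k, candidates):
    """Fraction of the exact top-K items present in the ANN candidate set."""
    candidate_set = set(candidates)
    hits = sum(1 for item in true_top_k if item in candidate_set)
    return hits / len(true_top_k)
```

Running this over a sample of queries against exhaustive-search results is the standard way to tune ANN parameters (e.g. probe counts) toward the 95-99% recall target.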

Over-retrieve to ensure coverage. If final output is top-10, retrieve 100-500 candidates. The ratio depends on how much re-ranking might reorder results. Highly personalized re-rankers need larger candidate sets.

Use multiple retrieval paths: the query embedding matches item embeddings, but also retrieve by user-history similarity, trending items in the category, and business-promoted items. Merge the candidates from all paths.
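One common way to merge paths (an assumption here, not something the article specifies) is a round-robin interleave with deduplication, so no single path dominates the candidate set:

```python
from itertools import zip_longest

def merge_candidate_paths(*paths):
    # Interleave one item from each retrieval path at a time, skipping duplicates.
    seen, merged = set(), []
    for group in zip_longest(*paths):
        for item in group:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

Other merge policies (score normalization across paths, fixed quotas per path) are also used; round-robin is just the simplest to reason about.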

RE-RANKING (STAGE 2)

The re-ranker is a full ML model (gradient-boosted trees or a neural network) with access to rich features. Feature categories:

User features: click history, purchase history, demographic segments, session context, real-time behavior.

Item features: popularity, recency, inventory, category, price, margin.

Cross features: user-item affinity scores, collaborative filtering signals, price sensitivity match.

The re-ranker computes a score for each candidate, sorts, and returns the top-K. Latency budget: 20-50ms for 500 candidates. This requires efficient feature lookup and model inference.
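The scoring loop above can be sketched as follows. The feature schema and the `feature_store` / `model_score` callables are hypothetical stand-ins for a real feature store and a trained model:

```python
def assemble_features(user, item, cross):
    # Flatten user, item, and cross features into one vector (hypothetical schema).
    return [user["clicks_7d"], user["purchases_30d"],
            item["popularity"], item["days_since_listing"],
            cross["affinity"]]

def rerank_candidates(candidates, feature_store, model_score, top_k):
    # feature_store(item_id) -> (user_feats, item_feats, cross_feats)
    # model_score(feature_vector) -> float; a trained GBT/NN in production.
    scored = [(model_score(assemble_features(*feature_store(c))), c)
              for c in candidates]
    scored.sort(reverse=True)
    return [c for _, c in scored[:top_k]]
```

Batching the feature lookups and model calls (rather than per-candidate calls as shown) is what makes the 20-50ms budget achievable for 500 candidates.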

LATENCY BUDGET ALLOCATION

Total budget: 100ms. Typical split: 10ms ANN retrieval, 20ms feature lookup, 30ms model inference, and 40ms buffer for network and processing. If any stage overruns its slice, the buffer shrinks and the end-to-end latency target is at risk.
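A simple per-stage budget check, useful in monitoring. The stage names and millisecond values mirror the split above; treating them as a dict is an implementation choice, not something the article prescribes:

```python
BUDGET_MS = {"ann": 10, "features": 20, "model": 30, "buffer": 40}

def check_latency(measured_ms, budget=BUDGET_MS):
    """Return (stages over their slice, whether the total stays within budget)."""
    over = {s: t for s, t in measured_ms.items() if t > budget.get(s, 0)}
    total_ok = sum(measured_ms.values()) <= sum(budget.values())
    return over, total_ok
```

In practice these checks run over latency percentiles (p95/p99), not single measurements, since tail latency is what breaks user-facing SLAs.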

⚠️ Key Trade-off: Larger candidate sets improve final quality but increase re-ranking cost. Find the minimum candidate set size where further increases no longer improve final metrics.
💡 Key Takeaways
Two stages: ANN generates 200-1000 candidates fast, re-ranker scores with rich features
Stage 1 metric is recall—if true best item is not in candidates, it cannot be surfaced
Re-ranker uses features ANN cannot capture: user history, freshness, business rules
Latency budget split: ~10ms ANN, ~20ms features, ~30ms model, ~40ms buffer
📌 Interview Tips
1. Explain why pure vector search is insufficient—embedding similarity misses user history, freshness, business rules.
2. Describe candidate set sizing logic—retrieve 10-50x final output size, tune based on how much re-ranking changes order.