Multi Source Retrieval: Combining Multiple Candidate Generators

MULTI SOURCE RETRIEVAL
Production systems run multiple retrievers in parallel and merge results. A typical setup includes: embedding retriever (items similar to user vector), collaborative filtering retriever (items liked by similar users), content retriever (matching user interests to item attributes), and trending retriever (currently popular items). Each returns 500 to 2,000 candidates, yielding 3,000 to 5,000 unique items after deduplication.
EMBEDDING RETRIEVAL MECHANICS
The system maintains vectors for users and items, typically 64 to 256 dimensions. Finding candidates means computing similarity between user vector and all item vectors. Approximate Nearest Neighbor algorithms (HNSW, FAISS) build graph structures that find top 1,000 items in 5 to 20ms, sacrificing about 5% recall versus exact search.
CANDIDATE MERGING
After retrievers return candidates, merge them for ranking. Simple: take the union. Smarter: weighted combination like 0.4 × embedding + 0.3 × cf + 0.2 × content + 0.1 × trending. Some systems enforce diversity quotas, ensuring 100+ candidates from each retriever reach ranking.
⚠️ Trade-off: More retrievers increase recall but add 10 to 30ms latency each. Most systems use 3 to 5 retrievers.
PRE-RANKING FILTERING
Before ranking, hard filters remove ineligible items: already interacted, wrong region, out of stock, age restricted, below quality threshold. Filtering removes 10 to 30% of candidates, saving ranking compute.

💡 Key Takeaways

✓Production systems use 3-5 parallel retrievers: embedding, collaborative filtering, content, trending

✓ANN algorithms (HNSW, FAISS) find top 1,000 similar items in 5-20ms with ~95% recall

✓Candidate merging uses weighted scores or diversity quotas to balance retriever contributions

✓Hard filters remove 10-30% of candidates before ranking (already seen, out of stock, wrong region)

✓Each additional retriever adds 10-30ms latency - balance coverage against response time

📌 Interview Tips

1Describe a concrete multi-retriever setup: embedding (2000 items) + CF (1000) + content (500) + trending (200) = 3700 candidates

2Explain why ANN is necessary: exact search over 100M items takes seconds, ANN takes milliseconds

3Mention filtering as a latency optimization - ranking 3000 items vs 4000 items saves 20-30% compute

← Back to Retrieval & Ranking Pipeline Overview