
What is a Retrieval and Ranking Pipeline?

Definition
A retrieval and ranking pipeline is a two-stage architecture. It first narrows a catalog of millions to billions of items down to thousands of candidates (retrieval), then scores and orders those candidates (ranking) to produce the final recommendations.
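The two stages can be sketched as two functions. This is a minimal illustration, not a production design. It assumes users and items are represented as embedding vectors. The names `retrieve`, `rank`, and `heavy_score` and the `popularity` feature are hypothetical, and the exact scan inside `retrieve` stands in for a real approximate nearest neighbor index.

```python
import heapq
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec, item_vecs, n):
    """Stage 1 (retrieval): cheap dot-product score per item, keep the top n ids.

    This exact scan is a stand-in for an approximate nearest neighbor (ANN)
    index, which would avoid scoring every item.
    """
    return heapq.nlargest(n, range(len(item_vecs)),
                          key=lambda i: dot(user_vec, item_vecs[i]))


def rank(candidates, score_fn, k):
    """Stage 2 (ranking): run the expensive scorer on candidates only, keep the best k."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]


# Toy data: 5,000 items with 16-dimensional embeddings (hypothetical).
random.seed(0)
DIM, NUM_ITEMS = 16, 5_000
item_vecs = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_ITEMS)]
user_vec = [random.gauss(0, 1) for _ in range(DIM)]
popularity = [random.random() for _ in range(NUM_ITEMS)]  # hypothetical extra feature


def heavy_score(i):
    # Hypothetical stand-in for a ranking model that combines many features.
    return dot(user_vec, item_vecs[i]) + 0.5 * popularity[i]


candidates = retrieve(user_vec, item_vecs, n=200)   # 5,000 -> 200
top10 = rank(candidates, heavy_score, k=10)          # 200 -> 10
```

The point of the split is that `heavy_score` runs on 200 items instead of 5,000. At production scale, the same split keeps an expensive model affordable.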

THE CORE PROBLEM

With 100 million items, scoring each one sequentially with a neural network at 5ms per item would take 500,000 seconds (nearly six days) per request. Users expect results within about 200ms. The pipeline solves this by splitting the work into two phases with very different computational budgets.
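The arithmetic below uses the figures above. It assumes items are scored one at a time on a single machine, with no parallelism.

```python
ITEMS = 100_000_000
MS_PER_ITEM = 5      # neural-network scoring cost per item
BUDGET_MS = 200      # end-to-end latency users expect

total_ms = ITEMS * MS_PER_ITEM          # scoring every item, one after another
total_s = total_ms / 1000
print(f"{total_s:,.0f} s (~{total_s / 86_400:.1f} days) vs a {BUDGET_MS} ms budget")
print(f"{total_ms / BUDGET_MS:,.0f}x over budget")
# prints:
# 500,000 s (~5.8 days) vs a 200 ms budget
# 2,500,000x over budget
```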

WHY TWO STAGES

Retrieval uses lightweight methods such as approximate nearest neighbor (ANN) search or inverted indexes. These methods search millions of items in 10 to 50ms by trading some accuracy for speed, and they return 1,000 to 10,000 candidates. Ranking applies expensive models with hundreds of features, spending 1 to 5ms per item. Even 1,000 candidates at 5ms each adds up to 5 seconds of sequential compute. Ranking therefore batches the work and spreads it across machines. Split 50 ways, the same work takes about 100ms, which fits the latency budget.
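An inverted index is one of the lightweight retrieval methods named above. Here is a minimal sketch of one. It assumes each item carries a small set of terms, such as tags, categories, or tokens. The function names and the match-count scoring are illustrative choices, not a specific library's API.

```python
from collections import Counter, defaultdict


def build_inverted_index(items):
    """Map each term to the set of item ids that carry it (a posting list)."""
    index = defaultdict(set)
    for item_id, terms in items.items():
        for term in terms:
            index[term].add(item_id)
    return index


def retrieve_candidates(index, query_terms, n):
    """Walk only the posting lists of the query terms, counting term matches per item.

    Items that share no term with the query are never touched, which keeps this cheap.
    """
    matches = Counter()
    for term in query_terms:
        for item_id in index.get(term, ()):
            matches[item_id] += 1
    return [item_id for item_id, _ in matches.most_common(n)]


items = {
    "v1": {"cooking", "italian"},
    "v2": {"cooking", "vegan"},
    "v3": {"gaming"},
    "v4": {"italian", "travel"},
}
index = build_inverted_index(items)
candidates = retrieve_candidates(index, {"cooking", "italian"}, n=3)
# "v1" comes first (it matches both terms). "v2" and "v4" follow in either order.
```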

THE FUNDAMENTAL TRADEOFF

Retrieval prioritizes recall (not missing good items) over precision. Missing a great item means it can never be ranked. Ranking prioritizes precision, ordering candidates so the best appear first. This division lets the system balance quality against latency.
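Two small helper functions make the per-stage metrics concrete. This sketch assumes relevance judgments are available as a set of item ids. The helper names and the toy data are hypothetical.

```python
def recall_at_k(ranked, relevant, k):
    """Share of all relevant items that appear in the first k positions of `ranked`."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked, relevant, k):
    """Share of the first k positions of `ranked` that hold relevant items."""
    if k <= 0:
        return 0.0
    return len(set(ranked[:k]) & relevant) / k


relevant = {"a", "b", "c", "d"}                 # hypothetical ground truth
candidates = ["a", "x", "b", "y", "c", "z"]     # retrieval output (thousands in practice)
final = ["a", "b", "x"]                         # ranker's top 3

retrieval_recall = recall_at_k(candidates, relevant, k=len(candidates))  # 3/4: "d" was never retrieved
final_precision = precision_at_k(final, relevant, k=3)                   # 2/3
```

Retrieval is judged on how much of `relevant` its whole candidate list covers. Ranking is judged on how clean its short top-k list is.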

💡 Key Insight: The pipeline is only as good as its weakest stage. A perfect ranker cannot recover items retrieval missed, and perfect retrieval is wasted if ranking orders items poorly.
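A tiny experiment illustrates the first half of this insight. The sketch assumes an oracle ranker that knows exactly which candidates are relevant. Even this perfect ranker cannot push recall above the share of relevant items that retrieval returned. The names and data are hypothetical.

```python
def oracle_rank(candidates, relevant, k):
    """A 'perfect' ranker: every relevant candidate goes ahead of every irrelevant one."""
    return sorted(candidates, key=lambda item: item in relevant, reverse=True)[:k]


def final_recall(top_k, relevant):
    """Share of all relevant items that made it into the final list."""
    return len(set(top_k) & relevant) / len(relevant)


relevant = {"a", "b", "c", "d", "e"}
retrieved = ["a", "q", "b", "r", "s"]          # retrieval missed c, d, and e
top3 = oracle_rank(retrieved, relevant, k=3)   # ["a", "b", "q"]
ceiling = len(set(retrieved) & relevant) / len(relevant)
print(final_recall(top3, relevant), ceiling)   # prints: 0.4 0.4
```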
💡 Key Takeaways
Two-stage architecture: retrieval narrows billions to thousands, ranking orders the final results
Retrieval uses lightweight methods (ANN, inverted indexes) completing in 10-50ms across millions of items
Ranking applies complex models with hundreds of features, spending 1-5ms per candidate
Retrieval optimizes for recall (not missing good items), ranking optimizes for precision (correct ordering)
Pipeline quality is limited by its weakest stage: items missed in retrieval cannot be recovered by ranking
📌 Interview Tips
1. When asked about latency budgets, mention a typical split: 50ms retrieval + 100ms ranking + 50ms network = 200ms total
2. Discuss the recall vs. precision tradeoff by stage: retrieval aims for 95%+ recall, ranking aims for high precision@10
3. Explain why single-stage systems fail: 100M items × 5ms = 500K seconds, impossible for real-time serving
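The budget in tip 1 can be checked with a few lines of arithmetic, and the same arithmetic estimates how far ranking must fan out. This sketch assumes candidates split evenly across identical workers, with no batching gains and no coordination overhead. `workers_needed` is a hypothetical helper.

```python
import math

# Stage budgets from tip 1 (milliseconds).
BUDGET_MS = {"retrieval": 50, "ranking": 100, "network": 50}
TOTAL_MS = sum(BUDGET_MS.values())  # 200 ms end to end


def workers_needed(num_candidates, ms_per_item, ranking_budget_ms):
    """Fewest parallel workers that can score all candidates within the ranking budget.

    Assumes an even split, identical workers, and no batching gains or fan-out overhead.
    """
    per_worker = ranking_budget_ms // ms_per_item  # items one worker can finish in time
    if per_worker == 0:
        raise ValueError("a single item exceeds the ranking budget")
    return math.ceil(num_candidates / per_worker)


# 1,000 candidates at 5 ms each is 5 s of work; 100 ms allows 20 items per worker.
workers = workers_needed(1_000, 5, BUDGET_MS["ranking"])  # 50
```

Rounding per-worker capacity down before dividing guarantees that the busiest worker still finishes inside the budget. Dividing total work by the budget alone can underestimate when the per-item cost does not divide the budget evenly.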