What is a Retrieval and Ranking Pipeline?
THE CORE PROBLEM
With 100 million items, scoring each one with a neural network at 5ms per item would take 500,000 seconds of sequential compute per request, which is almost six days. Users expect results in 200ms. The pipeline closes this gap by splitting the work into two phases with different computational budgets.
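The arithmetic behind these figures can be checked with a short sketch. The constants come straight from the paragraph above; the function and variable names are illustrative only.

```python
# Back-of-the-envelope latency arithmetic using the figures from the text.
NUM_ITEMS = 100_000_000     # catalog size
SCORE_MS_PER_ITEM = 5       # neural-network scoring cost per item, in ms
LATENCY_BUDGET_MS = 200     # response time users expect, in ms


def brute_force_seconds(num_items, ms_per_item):
    """Sequential time, in seconds, to score every item once."""
    return num_items * ms_per_item / 1000


total_s = brute_force_seconds(NUM_ITEMS, SCORE_MS_PER_ITEM)

# How many items could be scored one after another inside the budget?
max_items_in_budget = LATENCY_BUDGET_MS // SCORE_MS_PER_ITEM

print(f"{total_s:,.0f} s (~{total_s / 86_400:.1f} days)")  # 500,000 s (~5.8 days)
print(max_items_in_budget)                                  # 40
```

Scored sequentially, only 40 items fit in the 200ms budget, which is why the expensive model can only ever see a small, pre-filtered slice of the catalog.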
WHY TWO STAGES
Retrieval uses lightweight methods such as approximate nearest neighbor search or inverted indexes. These scan millions of items in 10 to 50ms by sacrificing some accuracy, and return 1,000 to 10,000 candidates. Ranking then applies expensive models with hundreds of features, spending 1 to 5ms per item. Scored one at a time, 1,000 candidates at 5ms each would still take 5 seconds, so the candidates are scored in parallel across machines, which brings ranking within the latency budget.
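The two stages can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not a production design: the catalog is tiny, the embeddings are random, the "freshness" feature and the 0.8/0.2 weights are invented for the example, and retrieval is an exhaustive dot-product scan standing in for a real approximate nearest neighbor index.

```python
import heapq
import random

random.seed(0)

DIM = 8
NUM_ITEMS = 5_000       # toy catalog; the text assumes 100 million
NUM_CANDIDATES = 100    # retrieval output size (1,000 to 10,000 in the text)
TOP_K = 10

# Hypothetical item data: an embedding plus one extra feature per item.
items = [
    {
        "id": i,
        "emb": [random.gauss(0, 1) for _ in range(DIM)],
        "freshness": random.random(),
    }
    for i in range(NUM_ITEMS)
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_emb, n):
    """Stage 1: cheap similarity score over the whole catalog.

    Exhaustive here for simplicity; a real system would query an
    approximate nearest neighbor index or an inverted index instead.
    """
    return heapq.nlargest(n, items, key=lambda it: dot(query_emb, it["emb"]))


def rank_score(query_emb, item):
    """Stand-in for the expensive model: combines several features."""
    return 0.8 * dot(query_emb, item["emb"]) + 0.2 * item["freshness"]


def rank(query_emb, candidates, k):
    """Stage 2: costlier scoring, applied only to the retrieved candidates."""
    ordered = sorted(candidates, key=lambda it: rank_score(query_emb, it), reverse=True)
    return ordered[:k]


query = [random.gauss(0, 1) for _ in range(DIM)]
candidates = retrieve(query, NUM_CANDIDATES)
results = rank(query, candidates, TOP_K)
print([it["id"] for it in results])
```

The key property is that `rank_score` runs 100 times rather than 5,000 times. In a real system the gap is far larger, and the per-candidate scoring calls are what get distributed across machines.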
THE FUNDAMENTAL TRADEOFF
Retrieval prioritizes recall (not missing good items) over precision. An item that retrieval misses never reaches the ranker, so no ranking model can recover it. Ranking prioritizes precision, ordering the candidates so that the best appear first. This division lets the system balance quality against latency.
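The two goals map onto two standard metrics, shown here in a small sketch. The item IDs and relevance labels are made up for illustration; in practice the relevant set would come from logged clicks or human judgments.

```python
def recall(retrieved, relevant):
    """Fraction of all relevant items that retrieval managed to return."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


# Hypothetical data: four relevant items, of which retrieval finds three.
relevant = {"a", "b", "c", "d"}
retrieved = ["a", "x", "b", "y", "c", "z", "w", "v"]

# The ranker can only reorder what retrieval returned.
ranked = ["a", "b", "x", "c", "y", "z", "w", "v"]

print(recall(retrieved, relevant))          # 0.75
print(precision_at_k(ranked, relevant, 2))  # 1.0
print("d" in ranked)                        # False
```

Item `d` was lost at retrieval, so no ordering of `ranked` can surface it. This is why retrieval is tuned for recall while ranking is tuned for the quality of the top positions.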