Ranking Cascades: Trading Off Quality and Latency with Multi-Stage Rankers
MULTI-STAGE RANKING
A single ranker scoring thousands of candidates is too slow. Multi-stage ranking applies progressively more complex models to progressively smaller candidate sets. L1: logistic regression with 50 features scores 5,000 candidates in 10ms and passes the top 500. L2: a gradient-boosted tree with 200 features takes 50ms and passes the top 100. L3: a deep neural network with 500+ features takes 50ms for the final ordering. Total: 110ms, instead of the 500+ms it would take to run the neural network over all 5,000 candidates.
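The cascade above can be sketched as three successive filter-and-keep-top-k passes. This is a minimal illustration: score_l1, score_l2, and score_l3 are toy stand-ins for the real models (logistic regression, gradient-boosted tree, neural network), not actual implementations.

```python
# Toy stand-ins for the three stage models; in practice these would be
# a logistic regression, a GBDT, and a neural network respectively.
def score_l1(item): return item % 7
def score_l2(item): return item % 13
def score_l3(item): return item % 17

def top_k(candidates, score_fn, k):
    # Score every candidate with this stage's model and keep the best k.
    return sorted(candidates, key=score_fn, reverse=True)[:k]

def rank_cascade(candidates):
    l1 = top_k(candidates, score_l1, 500)   # 5,000 -> 500 survivors
    l2 = top_k(l1, score_l2, 100)           # 500 -> 100 survivors
    return top_k(l2, score_l3, 100)         # final ordering of the top 100

final = rank_cascade(list(range(5000)))
```

Each stage only pays its per-item cost on the candidates that survived the previous stage, which is the entire source of the latency savings.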
STAGE DESIGN PRINCIPLES
Each stage must be more accurate than the previous. L1 needs 90%+ recall at its cutoff: 90% of the items L3 would rank in its top 100 should survive the L1 cut. Measure this offline by running all candidates through every stage. If L1 recall drops below 85%, loosen the cutoff (pass more candidates through) or improve the L1 model.
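The offline recall measurement described above can be written directly: score all candidates with both L1 and the final stage, then check what fraction of the final stage's top items survive the L1 cut. The function and its arguments are illustrative names, not from the original.

```python
def recall_at_cutoff(candidates, score_l1, score_l3, l1_cutoff=500, final_k=100):
    # Items that would survive the L1 stage at this cutoff.
    l1_survivors = set(sorted(candidates, key=score_l1, reverse=True)[:l1_cutoff])
    # Items the final stage would rank in its top final_k.
    l3_top = sorted(candidates, key=score_l3, reverse=True)[:final_k]
    # Fraction of the final top-k that L1 lets through.
    return sum(1 for item in l3_top if item in l1_survivors) / final_k
```

If the two scorers agree perfectly, recall is 1.0; if L1 systematically prefers items L3 dislikes, recall collapses, which is the failure mode the 85% threshold is guarding against.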
FEATURE COMPLEXITY BY STAGE
Early stages use precomputed features: item popularity, user segment, static embeddings. Later stages add expensive features: real-time user activity, user-item cross features, sequence models. Feature lookup cost matters: a 5-10ms database call is affordable for 100 candidates, not for 5,000.
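A back-of-envelope budget makes the 100-vs-5,000 point concrete. This sketch assumes lookups can be batched (one round trip per batch), which is an assumption on my part, not stated in the text.

```python
def lookup_cost_ms(num_candidates, ms_per_call, batch_size=1):
    # Assumes one database round trip per batch of candidates (hypothetical
    # batching model, for illustration only). Ceil-divide into batches.
    batches = -(-num_candidates // batch_size)
    return batches * ms_per_call

serial_100 = lookup_cost_ms(100, 5)                      # one call per candidate
batched_100 = lookup_cost_ms(100, 10, batch_size=100)    # single batched call
batched_5000 = lookup_cost_ms(5000, 10, batch_size=100)  # 50 batched calls
```

Even with batching, fetching expensive features for 5,000 candidates costs an order of magnitude more than for 100, which is why those features belong in the later stages.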
SCORE CALIBRATION
Stage scores are not directly comparable: L1 outputs logits roughly from -5 to 5, while L2 outputs probabilities from 0 to 1. Normalize to a common scale, or compare by rank position, before doing any cross-stage analysis.
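Both options mentioned above are a few lines each: a sigmoid maps logits onto the same 0-to-1 scale as probabilities, and rank position is scale-free by construction. A minimal sketch:

```python
import math

def logit_to_prob(logit):
    # Sigmoid maps any real-valued logit onto the (0, 1) probability scale,
    # making L1 outputs comparable to L2 probabilities.
    return 1.0 / (1.0 + math.exp(-logit))

def rank_positions(scores):
    # Rank 0 = best-scored item. Rank position ignores each model's output
    # range entirely, so it is comparable across stages.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks
```

Rank-based comparison is the safer default for cross-stage analysis, since it makes no assumptions about either model's calibration.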