
Ranking Cascades: Trading Off Quality and Latency with Multi-Stage Rankers

After retrieval gives you hundreds or thousands of candidates, you face a new constraint: your best ranking model is too expensive to run on all of them within your latency budget. A cross-encoder that jointly encodes query-item pairs might take 5 milliseconds per item; scoring 2000 candidates would cost 10 seconds, blowing past any interactive latency target. The solution is a ranking cascade: a sequence of progressively more accurate but more expensive models, each pruning the candidate set for the next stage.

Stage one is a lightweight ranker, often a shallow multilayer perceptron (MLP) or a late-interaction model like ColBERT that precomputes item representations. It scores all N candidates (say 2000) in 10 to 30 milliseconds total and prunes to M candidates (say 300). Stage two is a heavier model, perhaps a deeper transformer or cross-encoder, that scores those 300 in another 30 to 100 milliseconds and produces the final top K (say 50 items for display). Each stage spends more compute per item but on fewer items, keeping total latency manageable.

The key insight is that most candidates are clearly irrelevant and do not need expensive evaluation. The lightweight first stage is accurate enough to discard the bottom 85% of candidates, so you only pay for the expensive cross-encoder on the top 15%, where fine-grained distinctions matter. Meta and Google publicly describe similar cascades for feed and ads ranking, where initial models run on thousands of items in tens of milliseconds and final rankers run on hundreds in another 50 to 100 milliseconds. In RAG, a common cascade is: retrieve 25 to 100 chunks with dense or hybrid search, then cross-encode every (query, chunk) pair to re-rank and select the top 3 to 10 for the language model. One benchmark reported retrieving 32 chunks and cross-encoding them in a second stage, improving relevance significantly over retrieval-only ranking. The trade-off is clear: cascades add complexity (multiple models to train and serve) but buy quality improvements that a single-stage budget cannot afford.
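A minimal sketch of the two-stage pattern, assuming hypothetical `light_score` and `heavy_score` callables standing in for the shallow MLP and the cross-encoder (in a real system each would be a batched inference call against its own serving endpoint):

```python
from typing import Callable, List, Tuple

def cascade_rank(
    query: str,
    candidates: List[str],
    light_score: Callable[[str, List[str]], List[float]],  # cheap: e.g. shallow MLP / late interaction
    heavy_score: Callable[[str, List[str]], List[float]],  # expensive: e.g. cross-encoder
    m: int = 300,   # survivors of stage one
    k: int = 50,    # final items for display
) -> List[Tuple[str, float]]:
    """Score all candidates cheaply, prune to m, re-score the survivors expensively, return top k."""
    # Stage one: score every candidate with the lightweight model and keep the top m.
    cheap = light_score(query, candidates)
    survivors = [c for _, c in sorted(zip(cheap, candidates), key=lambda p: p[0], reverse=True)[:m]]

    # Stage two: spend the expensive model only on the m survivors, then keep the top k.
    strong = heavy_score(query, survivors)
    ranked = sorted(zip(strong, survivors), key=lambda p: p[0], reverse=True)[:k]
    return [(item, score) for score, item in ranked]
```

Keeping the two scorers as plain callables keeps the skeleton model-agnostic; the cascade structure stays the same whether stage one is an MLP, BM25, or a late-interaction model.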
💡 Key Takeaways
Stage one uses a lightweight model (shallow MLP, late-interaction ranker) to score all N candidates (500 to 2000) in 10 to 30 milliseconds and prune to M candidates (100 to 300), discarding clearly irrelevant items cheaply
Stage two uses an expensive model (deep transformer, cross-encoder) on the pruned M candidates in another 30 to 100 milliseconds, spending more compute per item where fine-grained ranking matters most
The compute trade-off is asymmetric: a cross-encoder might be 20x slower per item than an MLP, but by pruning to 15% of candidates you cut total cost roughly 5x versus running the cross-encoder on everything, while improving quality over the lightweight model alone (see the cost sketch after this list)
RAG cascade example: retrieve 32 chunks with dense search (15ms), cross-encode all 32 (query, chunk) pairs (85ms at roughly 2.7ms per pair), select the top 5 for the language model context. This improved answer relevance by 28% over dense-retrieval ranking alone in benchmarks.
Failure mode: if the first-stage model is too aggressive and prunes truly relevant items, the second stage cannot recover them. Tune stage one for high recall (keep the top 15 to 20 percent) rather than precision to protect overall quality.
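A back-of-the-envelope check of the roughly-5x figure in the trade-off bullet above, in relative cost units (the 20x ratio and 15% keep rate are the illustrative numbers from this section, not measurements):

```python
# Relative-cost model: cascade vs. running the cross-encoder on every candidate.
N = 2000           # candidates coming out of retrieval
mlp_unit = 1.0     # per-item cost of the lightweight stage (arbitrary unit)
ce_unit = 20.0     # cross-encoder assumed ~20x more expensive per item
keep = 0.15        # stage one keeps the top 15%

cross_encoder_everywhere = N * ce_unit              # 40000 units
cascade = N * mlp_unit + N * keep * ce_unit         # 2000 + 6000 = 8000 units
print(cross_encoder_everywhere / cascade)           # 5.0 -> the ~5x saving claimed above
```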
📌 Examples
Meta feed ranking: First-stage MLP scores 3000 posts with 50 features in 20ms (about 0.007ms per post), prunes to 400. Second-stage deep neural network with 200 features and interaction layers scores those 400 in 60ms (0.15ms per post), selects the top 100 for the feed. End-to-end latency stays under 100ms at p95.
LinkedIn job search: Stage one scores 1500 jobs with a gradient-boosted tree model in 15ms, keeps the top 200. Stage two applies a listwise ranking transformer optimizing Normalized Discounted Cumulative Gain (NDCG) on those 200 in 50ms, producing the final 25 jobs displayed.
RAG pipeline: Retrieve 100 chunks via hybrid search (BM25 plus dense). Stage one: lightweight bi-encoder re-ranks all 100 by cosine similarity in 8ms, prunes to 25. Stage two: cross-encoder scores 25 pairs in 70ms (2.8ms per pair), selects the top 3 chunks for GPT-4 context (sketched in code after these examples). Total latency 93ms including retrieval; answer accuracy improves 31% versus retrieval-only ranking.
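A sketch of that second-stage re-rank for RAG, assuming the candidate chunks already came back from the hybrid retriever. It uses the sentence-transformers CrossEncoder class and the public cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint as one common choice; the cited benchmarks do not specify their exact model.

```python
# Second-stage re-ranking for RAG. `retrieved_chunks` is assumed to come from a
# hybrid (BM25 + dense) retriever upstream; the checkpoint below is illustrative.
from sentence_transformers import CrossEncoder

def rerank_chunks(query: str, retrieved_chunks: list[str], top_k: int = 3) -> list[str]:
    """Jointly score every (query, chunk) pair and keep the top_k chunks for the LLM context."""
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    # The per-pair joint encoding is the expensive step the cascade reserves for few items.
    scores = model.predict([(query, chunk) for chunk in retrieved_chunks])
    ranked = sorted(zip(scores, retrieved_chunks), key=lambda pair: float(pair[0]), reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]
```

In a real service the CrossEncoder would be loaded once at startup and the predict call batched, rather than instantiated per request as in this sketch.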