Query Execution and Ranking: From Postings to Top-K Results
Query execution combines postings lists from multiple terms using Boolean logic and selects the top K results using scoring algorithms. The two main execution strategies are document at a time (DAAT) and term at a time (TAAT). TAAT scores one term's entire postings list at a time, accumulating partial scores per document, which is simple but needs an accumulator for every candidate. DAAT processes documents in ID order, maintaining a postings iterator (often with skip pointers) for each query term plus a heap for the current top K, and computes scores incrementally as the iterators advance. DAAT is memory efficient and enables early termination with dynamic pruning algorithms like Weak AND (WAND) or MaxScore, which skip postings blocks when their score upper bounds cannot beat the current top K.
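Below is a minimal sketch of WAND-style DAAT pruning, assuming postings already carry a precomputed per term score (for example a quantized BM25 impact) and that each term's score upper bound is known; the name wand_top_k and the dict-based cursor representation are illustrative, not taken from any particular engine.

```python
import heapq
from bisect import bisect_left

def wand_top_k(postings, k):
    """Simplified WAND over score-augmented postings.

    postings: dict mapping term -> list of (doc_id, score), sorted by doc_id.
    Returns the top-k (score, doc_id) pairs, best first.
    """
    # One cursor per term: current position plus an upper bound on its score.
    cursors = [
        {"plist": plist, "pos": 0, "ub": max(s for _, s in plist)}
        for plist in postings.values() if plist
    ]

    heap = []          # min-heap holding the current top k (score, doc_id)
    threshold = 0.0    # k-th best score so far; a doc must beat this to enter

    def current_doc(c):
        return c["plist"][c["pos"]][0] if c["pos"] < len(c["plist"]) else float("inf")

    while True:
        live = sorted(cursors, key=current_doc)

        # Pivot: shortest prefix whose summed upper bounds could beat the threshold.
        acc, pivot = 0.0, None
        for i, c in enumerate(live):
            if current_doc(c) == float("inf"):
                break
            acc += c["ub"]
            if acc > threshold:
                pivot = i
                break
        if pivot is None:
            break  # no remaining document can make it into the top k

        pivot_doc = current_doc(live[pivot])
        if current_doc(live[0]) == pivot_doc:
            # Every cursor up to the pivot sits on pivot_doc: score it fully.
            score = 0.0
            for c in live:
                if current_doc(c) == pivot_doc:
                    score += c["plist"][c["pos"]][1]
                    c["pos"] += 1  # advance past the scored document
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
            if len(heap) == k:
                threshold = heap[0][0]
        else:
            # Skip a leading cursor straight to pivot_doc without scoring the
            # documents in between -- this skip is where WAND saves CPU.
            c = live[0]
            doc_ids = [d for d, _ in c["plist"]]
            c["pos"] = bisect_left(doc_ids, pivot_doc, lo=c["pos"])

    return sorted(heap, reverse=True)

# Example: three-term disjunction over a toy index.
postings = {
    "distributed": [(1, 2.1), (4, 1.8), (9, 2.5)],
    "systems":     [(1, 1.2), (4, 1.0), (7, 0.9), (9, 1.1)],
    "design":      [(2, 3.0), (9, 2.2)],
}
print(wand_top_k(postings, k=2))   # doc 9 and doc 1 score highest
```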
Ranking typically starts with BM25 or Term Frequency Inverse Document Frequency (TF-IDF), which balance term frequency in documents against term rarity across the corpus. BM25 saturates term frequency so that repeating a keyword yields diminishing returns (blunting keyword stuffing) and normalizes by document length to prevent bias toward long documents. Production systems add field boosts (title matches score higher than body), recency signals, and business constraints. Google combines inverted index retrieval with massive learned ranking models in later stages, keeping first stage retrieval under 50 milliseconds to leave headroom for deep reranking.
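As a concrete reference, here is a minimal sketch of the BM25 score for a single term, using the smoothed Lucene-style IDF; the defaults k1 = 1.2 and b = 0.75 match the values cited in the takeaways below, and the small loop just demonstrates term frequency saturation.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """Contribution of a single query term to one document's BM25 score.

    tf: term frequency in the document; df: documents containing the term.
    k1 caps how much repeated occurrences help (saturation); b controls how
    strongly longer-than-average documents are penalized.
    """
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# Saturation in action: going from 1 to 20 occurrences barely moves the score.
for tf in (1, 2, 5, 20):
    print(tf, round(bm25_term_score(tf, doc_len=100, avg_doc_len=120,
                                    df=50, num_docs=10_000), 3))
```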
Phrase and proximity queries require positional postings. The engine first filters candidates using document level matches, then performs positional intersection to verify that terms appear at the correct offsets. For the query "distributed systems", documents containing both terms are checked to ensure "distributed" appears immediately before "systems" at matching positions. Positional intersection is slower than term only matching and storing positions enlarges the index, which is why position storage is often optional, yet it is critical for user facing search quality.
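A minimal sketch of that adjacency check, assuming the document level AND has already matched and each term's sorted position list has been read from the positional postings; the helper name phrase_adjacent is illustrative.

```python
def phrase_adjacent(pos_a, pos_b):
    """Return True if term A appears immediately before term B in a document.

    pos_a / pos_b: sorted token offsets of each term within the same document.
    A two-pointer merge keeps the check linear in the number of positions.
    """
    i = j = 0
    while i < len(pos_a) and j < len(pos_b):
        if pos_a[i] + 1 == pos_b[j]:
            return True          # found "A B" as an exact phrase
        if pos_a[i] + 1 < pos_b[j]:
            i += 1               # A's occurrence is too early, advance it
        else:
            j += 1               # B's occurrence is too early, advance it
    return False

# "distributed systems": both documents contain both terms, only the first is a phrase.
print(phrase_adjacent([3, 17], [4, 9]))    # True  (positions 3 and 4 are adjacent)
print(phrase_adjacent([3, 17], [9, 30]))   # False (terms present but never adjacent)
```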
💡 Key Takeaways
•Document at a time (DAAT) execution with WAND or MaxScore pruning skips postings blocks when score upper bounds cannot beat current top K, reducing CPU by 3 to 10 times on large disjunctions with selective top K requirements
•BM25 saturates term frequency to penalize keyword stuffing and normalizes by document length. Typical parameters: k1 = 1.2 to 2.0 controls term frequency saturation; b = 0.75 controls the strength of length normalization
•Per shard versus global statistics trade off: global Inverse Document Frequency (IDF) yields consistent scoring but requires coordination or snapshots. Per shard IDF simplifies operations, but a globally rare term concentrated on one shard gets very different weights across shards, so identical documents score differently depending on placement and merged results become inconsistent (see the IDF sketch after this list)
•Two stage retrieval is standard at scale: inverted index returns a few hundred candidates in under 50 milliseconds, then learned rankers with thousands of features rerank within the overall latency budget of 200 to 500 milliseconds
•Phrase and proximity queries are 2 to 5 times slower than term only queries due to positional intersection. Optimize by filtering with doc level matches first, then verify positions only on reduced candidate sets
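The sketch below illustrates the per shard versus global IDF divergence with made-up corpus and document frequency numbers; the idf helper reuses the same smoothed formula as the BM25 sketch above.

```python
import math

def idf(num_docs, df):
    """Smoothed IDF, as in the BM25 sketch above."""
    return math.log(1 + (num_docs - df + 0.5) / (df + 0.5))

# Hypothetical: 1M docs split evenly across two shards; a globally rare term
# happens to be concentrated on shard A.
global_idf  = idf(num_docs=1_000_000, df=1_000)
shard_a_idf = idf(num_docs=500_000, df=950)   # term is relatively common here
shard_b_idf = idf(num_docs=500_000, df=50)    # term is very rare here

print(f"global={global_idf:.2f} shard_a={shard_a_idf:.2f} shard_b={shard_b_idf:.2f}")
# Shard B weights the term far more heavily than shard A, so identical documents
# rank differently depending on which shard they live on once results are merged.
```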
📌 Examples
Amazon retail search uses BM25 retrieval with field boosts (title weighted 3 times higher than description), then reranks top 200 candidates with business signals like availability, shipping speed, and click through rate (CTR) models within a 50 millisecond total budget
Public talks on Google Web Search indicate that first stage retrieval returns top K candidates in tens of milliseconds per shard using block max indexes and aggressive pruning, leaving 300 to 400 milliseconds for multi stage reranking with neural models
Elasticsearch default scoring uses BM25 with k1 = 1.2 and b = 0.75. The query "machine learning tutorial" retrieves docs containing any term (OR), scores each with BM25, and returns the top 10. The phrase query "machine learning" requires positions and is measurably slower
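For reference, the two request bodies below sketch that comparison in Elasticsearch query DSL, written as Python dicts to be sent to an index's _search endpoint; the field name body and the index layout are assumptions, not part of the example above.

```python
# BM25-scored OR query: matches docs containing any of the three terms.
match_request = {
    "query": {"match": {"body": "machine learning tutorial"}},
    "size": 10,
}

# Phrase query: consults positional postings to verify adjacency, so it costs more.
phrase_request = {
    "query": {"match_phrase": {"body": "machine learning"}},
    "size": 10,
}
```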