Query Execution and Ranking: From Postings to Top-K Results
Query execution combines postings lists from multiple terms using Boolean logic and selects the top K results using scoring algorithms. The two main execution strategies are document at a time (DAAT) and term at a time (TAAT). TAAT scores one term's entire postings list at a time, accumulating partial scores per document, which is simple but needs an accumulator for every candidate. DAAT processes documents in ID order, maintaining a postings iterator (often with skip pointers) for each query term plus a heap for the current top K, and computes scores incrementally as the iterators advance. DAAT is memory efficient and enables early termination with dynamic pruning algorithms like Weak AND (WAND) or MaxScore, which skip postings blocks when their score upper bounds cannot beat the current top K.
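Below is a minimal sketch of WAND-style DAAT pruning, assuming postings already carry a precomputed per term score (for example a quantized BM25 impact) and that each term's score upper bound is known; the name wand_top_k and the dict-based cursor representation are illustrative, not taken from any particular engine.

```python
import heapq
from bisect import bisect_left

def wand_top_k(postings, k):
    """Simplified WAND over score-augmented postings.

    postings: dict mapping term -> list of (doc_id, score), sorted by doc_id.
    Returns the top-k (score, doc_id) pairs, best first.
    """
    # One cursor per term: current position plus an upper bound on its score.
    cursors = [
        {"plist": plist, "pos": 0, "ub": max(s for _, s in plist)}
        for plist in postings.values() if plist
    ]

    heap = []          # min-heap holding the current top k (score, doc_id)
    threshold = 0.0    # k-th best score so far; a doc must beat this to enter

    def current_doc(c):
        return c["plist"][c["pos"]][0] if c["pos"] < len(c["plist"]) else float("inf")

    while True:
        live = sorted(cursors, key=current_doc)

        # Pivot: shortest prefix whose summed upper bounds could beat the threshold.
        acc, pivot = 0.0, None
        for i, c in enumerate(live):
            if current_doc(c) == float("inf"):
                break
            acc += c["ub"]
            if acc > threshold:
                pivot = i
                break
        if pivot is None:
            break  # no remaining document can make it into the top k

        pivot_doc = current_doc(live[pivot])
        if current_doc(live[0]) == pivot_doc:
            # Every cursor up to the pivot sits on pivot_doc: score it fully.
            score = 0.0
            for c in live:
                if current_doc(c) == pivot_doc:
                    score += c["plist"][c["pos"]][1]
                    c["pos"] += 1  # advance past the scored document
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
            if len(heap) == k:
                threshold = heap[0][0]
        else:
            # Skip a leading cursor straight to pivot_doc without scoring the
            # documents in between -- this skip is where WAND saves CPU.
            c = live[0]
            doc_ids = [d for d, _ in c["plist"]]
            c["pos"] = bisect_left(doc_ids, pivot_doc, lo=c["pos"])

    return sorted(heap, reverse=True)

# Example: three-term disjunction over a toy index.
postings = {
    "distributed": [(1, 2.1), (4, 1.8), (9, 2.5)],
    "systems":     [(1, 1.2), (4, 1.0), (7, 0.9), (9, 1.1)],
    "design":      [(2, 3.0), (9, 2.2)],
}
print(wand_top_k(postings, k=2))   # doc 9 and doc 1 score highest
```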
Ranking typically starts with BM25 or Term Frequency Inverse Document Frequency (TF-IDF), which balance term frequency in documents against term rarity across the corpus. BM25 saturates term frequency so that repeating a keyword yields diminishing returns (blunting keyword stuffing) and normalizes by document length to prevent bias toward long documents. Production systems add field boosts (title matches score higher than body), recency signals, and business constraints. Google combines inverted index retrieval with massive learned ranking models in later stages, keeping first stage retrieval under 50 milliseconds to leave headroom for deep reranking.
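As a concrete reference, here is a minimal sketch of the BM25 score for a single term, using the smoothed Lucene-style IDF; the defaults k1 = 1.2 and b = 0.75 match the values cited in the takeaways below, and the small loop just demonstrates term frequency saturation.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """Contribution of a single query term to one document's BM25 score.

    tf: term frequency in the document; df: documents containing the term.
    k1 caps how much repeated occurrences help (saturation); b controls how
    strongly longer-than-average documents are penalized.
    """
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# Saturation in action: going from 1 to 20 occurrences barely moves the score.
for tf in (1, 2, 5, 20):
    print(tf, round(bm25_term_score(tf, doc_len=100, avg_doc_len=120,
                                    df=50, num_docs=10_000), 3))
```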
Phrase and proximity queries require positional postings. The engine first filters candidates using document level matches, then performs positional intersection to verify that terms appear at the correct offsets. For the query "distributed systems", documents containing both terms are checked to ensure "distributed" appears immediately before "systems" at matching positions. Positional intersection is slower than term only matching and storing positions enlarges the index, which is why position storage is often optional, yet it is critical for user facing search quality.
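A minimal sketch of that adjacency check, assuming the document level AND has already matched and each term's sorted position list has been read from the positional postings; the helper name phrase_adjacent is illustrative.

```python
def phrase_adjacent(pos_a, pos_b):
    """Return True if term A appears immediately before term B in a document.

    pos_a / pos_b: sorted token offsets of each term within the same document.
    A two-pointer merge keeps the check linear in the number of positions.
    """
    i = j = 0
    while i < len(pos_a) and j < len(pos_b):
        if pos_a[i] + 1 == pos_b[j]:
            return True          # found "A B" as an exact phrase
        if pos_a[i] + 1 < pos_b[j]:
            i += 1               # A's occurrence is too early, advance it
        else:
            j += 1               # B's occurrence is too early, advance it
    return False

# "distributed systems": both documents contain both terms, only the first is a phrase.
print(phrase_adjacent([3, 17], [4, 9]))    # True  (positions 3 and 4 are adjacent)
print(phrase_adjacent([3, 17], [9, 30]))   # False (terms present but never adjacent)
```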
💡 Key Takeaways
•Document at a time (DAAT) execution with WAND or MaxScore pruning skips postings blocks when score upper bounds cannot beat current top K, reducing CPU by 3 to 10 times on large disjunctions with selective top K requirements
•BM25 saturates term frequency to penalize keyword stuffing and normalizes by document length. Typical parameters: k1 = 1.2 to 2.0 controls term frequency saturation; b = 0.75 controls the strength of length normalization
•Per shard versus global statistics trade off: global Inverse Document Frequency (IDF) yields consistent scoring but requires coordination or snapshots. Per shard IDF simplifies operations, but a globally rare term concentrated on one shard gets very different weights across shards, so identical documents score differently depending on placement and merged results become inconsistent (see the IDF sketch after this list)
•Two stage retrieval is standard at scale: inverted index returns a few hundred candidates in under 50 milliseconds, then learned rankers with thousands of features rerank within the overall latency budget of 200 to 500 milliseconds
•Phrase and proximity queries are 2 to 5 times slower than term only queries due to positional intersection. Optimize by filtering with doc level matches first, then verify positions only on reduced candidate sets
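The sketch below illustrates the per shard versus global IDF divergence with made-up corpus and document frequency numbers; the idf helper reuses the same smoothed formula as the BM25 sketch above.

```python
import math

def idf(num_docs, df):
    """Smoothed IDF, as in the BM25 sketch above."""
    return math.log(1 + (num_docs - df + 0.5) / (df + 0.5))

# Hypothetical: 1M docs split evenly across two shards; a globally rare term
# happens to be concentrated on shard A.
global_idf  = idf(num_docs=1_000_000, df=1_000)
shard_a_idf = idf(num_docs=500_000, df=950)   # term is relatively common here
shard_b_idf = idf(num_docs=500_000, df=50)    # term is very rare here

print(f"global={global_idf:.2f} shard_a={shard_a_idf:.2f} shard_b={shard_b_idf:.2f}")
# Shard B weights the term far more heavily than shard A, so identical documents
# rank differently depending on which shard they live on once results are merged.
```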
📌 Examples
Amazon retail search uses BM25 retrieval with field boosts (title weighted 3 times higher than description), then reranks top 200 candidates with business signals like availability, shipping speed, and click through rate (CTR) models within a 50 millisecond total budget
Public talks on Google Web Search indicate that first stage retrieval returns top K candidates in tens of milliseconds per shard using block max indexes and aggressive pruning, leaving 300 to 400 milliseconds for multi stage reranking with neural models
Elasticsearch default scoring uses BM25 with k1 = 1.2 and b = 0.75. The query "machine learning tutorial" retrieves docs containing any term (OR), scores each with BM25, and returns the top 10. The phrase query "machine learning" requires positions and is measurably slower
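For reference, the two request bodies below sketch that comparison in Elasticsearch query DSL, written as Python dicts to be sent to an index's _search endpoint; the field name body and the index layout are assumptions, not part of the example above.

```python
# BM25-scored OR query: matches docs containing any of the three terms.
match_request = {
    "query": {"match": {"body": "machine learning tutorial"}},
    "size": 10,
}

# Phrase query: consults positional postings to verify adjacency, so it costs more.
phrase_request = {
    "query": {"match_phrase": {"body": "machine learning"}},
    "size": 10,
}
```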