Query Execution and Ranking: From Postings to Top-K Results
The Query Execution Problem
You have an inverted index with 100 million documents. User searches "cheap pizza delivery." How do you turn that into the top 10 results in under 50 ms? The answer involves three steps: find matching documents, score them, and return the top K. Each step has optimizations that make millisecond responses possible at scale.
Finding Matching Documents
Look up each query term in the dictionary. "cheap" returns a postings list with 500,000 document IDs, "pizza" returns 200,000, and "delivery" returns 150,000. For AND queries (documents containing ALL terms), intersect the lists: an efficient merge join on sorted lists, or skip-list traversal, intersects million-entry lists in 5 to 10 ms. Result: 15,000 documents contain all three terms. For OR queries, union the lists: about 700,000 documents contain at least one term.
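The merge join above can be sketched as a two-pointer walk over sorted doc-ID lists. This is an illustrative minimal version (function names `intersect` and `intersect_all` are my own, not from any particular engine); production systems add skip pointers or galloping search to jump over long runs.

```python
def intersect(a, b):
    """Merge join of two sorted postings lists of doc IDs."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1      # advance the pointer behind the other
        else:
            j += 1
    return out

def intersect_all(postings):
    """Intersect several lists, shortest first so intermediates stay small."""
    lists = sorted(postings, key=len)
    result = lists[0]
    for p in lists[1:]:
        result = intersect(result, p)
    return result
```

Processing the shortest list first matters in practice: the running intersection can only shrink, so starting from "delivery" (150,000 IDs) rather than "cheap" (500,000) bounds every later merge.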
Scoring with BM25
Not all matches are equal. A document mentioning "pizza" 10 times is more relevant than one mentioning it once. BM25 (Best Match 25) is the modern standard. It combines term frequency (TF, how often a term appears in the document), inverse document frequency (IDF, rare terms score higher), and document length normalization (so short documents are not unfairly penalized). Key parameters: k1 (typically 1.2 to 2.0) controls TF saturation, giving diminishing returns to repeated terms and blunting keyword stuffing; b (typically 0.75) controls length normalization. Computing BM25 takes about 100 nanoseconds per term per document. For 15,000 candidates with 3 terms, that is 45,000 scoring operations, finishing in roughly 5 ms.
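A minimal sketch of a per-term BM25 contribution, using the common Robertson/Sparck Jones IDF variant (the exact IDF formula differs across implementations, so treat this as one representative choice; the function name `bm25_score` is my own):

```python
import math

def bm25_score(tf, df, N, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 contribution of one term in one document.

    tf: term frequency in this document
    df: number of documents containing the term
    N:  total documents in the collection
    """
    idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
    # Length normalization: b=0 ignores length, b=1 fully normalizes.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # TF saturation: as tf grows, the ratio approaches (k1 + 1).
    return idf * tf * (k1 + 1) / (tf + norm)
```

A document's score for a query is the sum of this value over the query terms it contains. Note the saturation: going from tf=1 to tf=10 raises the score far less than tenfold, which is exactly what limits keyword stuffing.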
Top K with Heap
Users want 10 results, not 15,000. Maintain a min heap of size K (typically 10 to 100). As you score each document, compare against heap minimum. If new score is higher, pop minimum and push new score. Heap operations are O(log K), so finding top 10 among 15,000 takes about 60,000 comparisons in under 1 ms. No need to sort all 15,000 results.
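The heap selection above maps directly onto Python's heapq (a min-heap), roughly as follows (the function name `top_k` is my own):

```python
import heapq

def top_k(scored_docs, k=10):
    """Select the k highest-scoring documents from (doc_id, score) pairs.

    The heap minimum heap[0] is the current cutoff score; each
    push/replace is O(log k), so we never sort the full candidate set.
    """
    heap = []  # min-heap of (score, doc_id)
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Beat the cutoff: evict the minimum, admit the new document.
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)  # best score first
```

Most candidates lose the single comparison against `heap[0][0]` and never touch the heap, which is why the total work stays far below a full sort of 15,000 results.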
Early Termination with WAND
WAND (Weak AND) and MaxScore are clever early-termination optimizations. Each term carries a precomputed upper bound on the score it can contribute to any document. Once your heap holds K good results, any document whose summed upper bounds cannot beat the heap minimum can be skipped without full scoring; WAND uses these bounds to skip entire postings blocks. This turns scanning 500,000 postings into scanning 20,000, reducing query time from 50 ms to 8 to 12 ms. CPU savings of 3x to 10x are typical for large disjunctions.
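The pruning idea can be illustrated with a simplified sketch. This is not the full WAND algorithm (which sorts term cursors by current doc ID and advances a pivot); it only shows the core test, comparing a candidate's best possible score against the heap minimum before paying for exact scoring. All names here (`wand_style_prune`, `score_fn`) are illustrative.

```python
import heapq

def wand_style_prune(term_upper_bounds, candidates, score_fn, k=10):
    """Skip candidates that cannot possibly enter the top-k.

    term_upper_bounds: dict term -> max score that term can contribute
    candidates: iterable of (doc_id, matched_terms)
    score_fn(doc_id, matched_terms) -> exact score (the expensive step)
    """
    heap = []  # min-heap of (score, doc_id); heap[0][0] is the threshold
    for doc_id, terms in candidates:
        upper = sum(term_upper_bounds[t] for t in terms)
        if len(heap) == k and upper <= heap[0][0]:
            continue  # best case cannot beat the cutoff: skip scoring
        score = score_fn(doc_id, terms)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)
```

The real savings come from applying this test to whole blocks of postings at once: block-max indexes store an upper bound per compressed block, so a single comparison can skip thousands of documents.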