
Trade-offs in Inverted Index Design: Freshness, Size, and Query Speed

Inverted index design involves several fundamental trade-offs that affect operational cost, query performance, and user experience.

The freshness versus write amplification trade-off is central: near-real-time indexing with frequent small segments (committed every 1 to 5 seconds) yields fast visibility but creates many segments, triggering frequent merges that consume CPU and I/O. This write amplification can reach 5 to 10 times the raw write rate (a rough cost model appears in the first sketch below). Batch indexing reduces merges and boosts query throughput by maintaining fewer, larger segments, but increases staleness to minutes or hours, which is unacceptable for user-generated content or rapidly changing inventory.

Query speed versus index size is another critical decision. Storing positional postings enables phrase queries, proximity search, and accurate snippet highlighting, but increases index size by 1.5 to 3 times versus document-only postings. If your workload is purely Boolean keyword matching (such as filtering logs or structured data), omit positions to save 60 to 70 percent of postings space and speed up queries. For user-facing search where phrase queries and snippets matter, positions are mandatory despite the cost (the second sketch below contrasts the two posting formats).

Recall versus latency trade-offs arise with synonym expansion, wildcard matching, and fuzzy search. Expanding "TV" to "television, telly, tele" increases recall but multiplies candidate sets, spiking latency. Use aggressive pruning such as WAND, limit expansions to the top N synonyms (e.g., 5 to 10), or employ two-stage retrieval: a tight first-stage BM25 pass with minimal expansion under 50 milliseconds, then a second stage that reranks with richer features and expansions under a larger time budget (the third sketch below outlines this pipeline).
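To make the write-amplification numbers concrete, here is a rough back-of-envelope model, assuming tiered merging in which each document is rewritten roughly once per merge level. The merge fan-out, commit sizes, and document counts are illustrative assumptions, not measurements from any particular engine.

```python
import math

def estimated_write_amplification(total_docs: int,
                                  docs_per_commit: int,
                                  merge_fanout: int = 10) -> float:
    """Rough model: one initial write plus one rewrite per merge level,
    where the number of levels grows with log_fanout(total / commit size).
    Illustrative only; real merge policies are more nuanced."""
    if total_docs <= docs_per_commit:
        return 1.0
    levels = math.ceil(math.log(total_docs / docs_per_commit, merge_fanout))
    return 1.0 + levels

# Near-real-time: tiny per-commit segments -> many merge levels.
print(estimated_write_amplification(100_000_000, docs_per_commit=1_000))      # 6.0
# Batch: large per-commit segments -> few merge levels.
print(estimated_write_amplification(100_000_000, docs_per_commit=1_000_000))  # 3.0
```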
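The second sketch contrasts document-only and positional postings. The toy index contents are invented, and real engines use compressed on-disk encodings, but the structural difference, and why phrase matching needs positions, carries over.

```python
# Document-only postings: term -> sorted doc IDs. Sufficient for
# Boolean keyword matching; cannot answer phrase queries.
doc_only = {
    "usb": [1, 2, 5],
    "cable": [2, 3, 5],
}

# Positional postings: term -> {doc ID -> sorted positions}. Larger,
# because every occurrence is stored, but supports phrases and snippets.
positional = {
    "usb":   {1: [0], 2: [4], 5: [0, 7]},
    "cable": {2: [5], 3: [1], 5: [1]},
}

def phrase_match(index, terms):
    """Doc IDs containing `terms` as an exact phrase, found by
    intersecting positional lists shifted by each term's offset."""
    docs = set(index[terms[0]])
    for t in terms[1:]:
        docs &= set(index[t])
    hits = []
    for d in docs:
        starts = set(index[terms[0]][d])
        for i, t in enumerate(terms[1:], start=1):
            starts &= {p - i for p in index[t][d]}
        if starts:
            hits.append(d)
    return sorted(hits)

print(phrase_match(positional, ["usb", "cable"]))  # [2, 5]
```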
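The third sketch shows two-stage retrieval with capped synonym expansion. The synonym table, the toy corpus, and the stand-in first-stage and rerank functions are all hypothetical placeholders rather than any real library's API; the point is the shape of the pipeline: cheap, tightly budgeted candidate generation, then a richer rerank over a small surviving set.

```python
# Stand-in synonym table and expansion cap (hypothetical values).
SYNONYMS = {"tv": ["television", "telly", "tele", "flat screen", "smart tv"]}
MAX_EXPANSIONS = 5  # bound the candidate-set blowup per query term

def expand(terms):
    """Add at most MAX_EXPANSIONS synonyms per query term."""
    out = list(terms)
    for t in terms:
        out.extend(SYNONYMS.get(t, [])[:MAX_EXPANSIONS])
    return out

def search(query_terms, first_stage, rerank, k_first=1000, k_final=10):
    # Stage 1: minimal expansion under a tight latency budget
    # (the "under 50 milliseconds" BM25 pass in the text).
    candidates = first_stage(expand(query_terms), k_first)
    # Stage 2: richer features and more time budget, applied only
    # to the small candidate set from stage 1.
    return rerank(query_terms, candidates)[:k_final]

# Toy corpus and stand-in stages so the sketch runs end to end.
DOCS = {1: "smart tv 55 inch", 2: "tv wall mount", 3: "television remote"}

def toy_first_stage(terms, top_k):
    return [d for d, text in DOCS.items() if any(t in text for t in terms)][:top_k]

def toy_rerank(terms, candidates):
    return sorted(candidates)  # a real reranker would score richer features

print(search(["tv"], toy_first_stage, toy_rerank))  # [1, 2, 3]
```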
💡 Key Takeaways
Near-real-time indexing with 1 to 5 second commit intervals enables visibility latency under 10 seconds but causes write amplification of 5 to 10 times due to frequent merges. Batch indexing reduces amplification to 2 to 3 times but increases staleness to minutes or hours
Positional postings increase index size by 1.5 to 3 times and reduce query throughput by 20 to 40 percent versus doc-only postings. Only include positions if phrase queries, proximity search, or snippets are required; omit them for Boolean keyword matching to save 60 to 70 percent of postings space
Global term statistics yield stable scoring but require coordination or periodic snapshots, adding operational complexity. Per-shard statistics simplify operations, but rare terms that are common on one shard can dominate ranking there, causing inconsistent results across queries (see the IDF sketch after this list)
Synonym expansion, wildcard, and fuzzy matching increase recall but multiply candidate sets by 2 to 10 times, spiking tail latency. Limit expansions to the top 5 to 10 synonyms and use two-stage retrieval: a tight first stage under 50 milliseconds, then a rerank with expansions
Sharding by document is simpler and balances write load but forces queries to fan out to all shards. Term-based sharding localizes hot terms but complicates multi-term queries and updates; it is rarely used except for specialized workloads with extreme skew
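To see why per-shard statistics can skew ranking, the sketch below computes one common BM25-style IDF variant both globally and per shard for a term that happens to cluster on one shard. The corpus sizes and document frequencies are invented for illustration.

```python
import math

def idf(total_docs, doc_freq):
    # One common BM25-style IDF variant.
    return math.log(1 + (total_docs - doc_freq + 0.5) / (doc_freq + 0.5))

# A term that is clustered on one shard but rare on the other.
shards = [
    {"docs": 1_000_000, "df": 5_000},  # shard A: term is common here
    {"docs": 1_000_000, "df": 10},     # shard B: term is rare here
]
global_docs = sum(s["docs"] for s in shards)
global_df = sum(s["df"] for s in shards)

print("global IDF:", round(idf(global_docs, global_df), 2))   # ~5.99
for i, s in enumerate(shards):
    print(f"shard {i} local IDF:", round(idf(s["docs"], s["df"]), 2))
# shard 0 scores the term at ~5.30, shard 1 at ~11.46: with local
# statistics, the same term weighs roughly twice as much on shard 1,
# so merged results rank inconsistently across queries.
```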
📌 Examples
Amazon retail search omits positions on some backend analytics indexes used for filtering and faceting, saving 70 percent of postings space, but includes positions on user-facing search indexes to support phrase queries like "usb c cable" and generate accurate snippets
Elasticsearch's default refresh interval is 1 second for near-real-time search, but high-write workloads often raise it to 30 seconds or switch to batch mode to reduce merge load and improve indexing throughput from 10,000 to 50,000 documents per second (a settings sketch follows these examples)
Google Web Search uses per-shard IDF approximations calibrated with periodic global statistics snapshots to balance scoring consistency and operational simplicity, avoiding real-time coordination across trillions of documents
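As a concrete version of the Elasticsearch example above, the sketch below raises the refresh interval via the index settings API. The index name, host, and the 30-second value are assumptions matching the scenario; index.refresh_interval itself is a real Elasticsearch setting, and setting it to "-1" disables periodic refresh entirely, which is common during bulk loads.

```python
import requests

# Raise the refresh interval on a hypothetical index named "products",
# trading visibility latency for fewer segments and less merge load.
resp = requests.put(
    "http://localhost:9200/products/_settings",
    json={"index": {"refresh_interval": "30s"}},
)
print(resp.json())  # {"acknowledged": True} on success
```

The same request with "refresh_interval": "1s" restores near-real-time behavior once a bulk load finishes.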