
Failure Modes and Edge Cases in Inverted Index Systems

Hot Terms

Some terms appear in millions of documents. In a 100 million document corpus, "the" might appear in 95 million documents. A query containing "the" means scanning a 95 million entry postings list (roughly 400 MB of data), spiking latency from 20 ms to 500 to 1500 ms. Mitigations include stopword lists (filter common terms before querying), mandatory term intersection (require at least one rare term), and postings list caps (only store top 1 million most relevant documents for extremely common terms). Circuit breakers should reject queries that would scan more than a threshold (e.g., 10 million postings).
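A minimal sketch of these mitigations, assuming a simple `doc_freq` lookup table (the function and constant names here are illustrative, not any specific engine's API):

```python
# Hypothetical query planner combining three mitigations from above:
# stopword filtering, mandatory selective terms, and a circuit breaker.

STOPWORDS = {"the", "a", "an", "of", "and"}
MAX_POSTINGS_SCAN = 10_000_000  # circuit breaker threshold


def plan_query(terms, doc_freq):
    """Filter stopwords, require at least one selective term, and
    reject queries whose cheapest scan is still too expensive."""
    kept = [t for t in terms if t not in STOPWORDS]
    if not kept:
        raise ValueError("query contains only stopwords")
    # Intersection cost is bounded by the rarest term's postings list,
    # so drive the scan from the smallest list first.
    kept.sort(key=lambda t: doc_freq.get(t, 0))
    if doc_freq.get(kept[0], 0) > MAX_POSTINGS_SCAN:
        raise RuntimeError("circuit breaker: query would scan too many postings")
    return kept  # scan order: rarest term first


freqs = {"the": 95_000_000, "zirconium": 12_000, "battery": 2_000_000}
print(plan_query(["the", "battery", "zirconium"], freqs))
# ['zirconium', 'battery'] -- "the" dropped, rarest term drives the scan
```

Sorting by document frequency means the intersection never touches more postings than the rarest term has, which is what makes "require at least one rare term" effective.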

Segment Explosion

Each new batch of documents creates an index segment. With 1 second refresh and 1,000 documents per second, you create 86,400 segments daily. Each query must search all segments and merge results. Beyond 50 to 100 segments, latency degrades noticeably, adding 1 to 2 ms per additional 10 segments. Merging is IO intensive at 100 to 200 MB/s write throughput. Balance: allow 20 to 50 segments during peak traffic, then aggressively merge during off peak (2 to 6 AM) to consolidate down to 10 to 15 segments.
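The peak/off-peak balance above can be sketched as a toy merge scheduler. Segment sizes, the hour-based peak window, and the merge-smallest-first policy are all illustrative assumptions:

```python
# Toy merge scheduler: tolerate up to 50 segments during peak hours,
# merge down to ~12 off peak. Sizes are in MB; numbers are illustrative.

def segments_to_merge(sizes, hour, peak_cap=50, off_peak_target=12):
    """Return the segments (by size) to merge into one.

    Merging k segments into 1 reduces the segment count by k - 1."""
    n = len(sizes)
    cap = peak_cap if 8 <= hour < 22 else off_peak_target
    if n <= cap:
        return []
    # Merge the smallest segments first: least IO for the biggest
    # reduction in per-query overhead.
    k = n - cap + 1
    return sorted(sizes)[:k]


# 60 segments at 2 AM: merge the 49 smallest into one, leaving 12.
print(len(segments_to_merge([100] * 60, hour=2)))   # 49
print(segments_to_merge([100] * 40, hour=14))       # [] -- under the peak cap
```

Merging the smallest segments first keeps the 100 to 200 MB/s write budget focused where it buys the most: each extra segment costs every query, but small segments are cheap to rewrite.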

Skewed Sharding

If documents are not evenly distributed (one category has 60 percent of documents), that shard becomes a bottleneck. With a 200 ms latency SLA, the slow shard dominates p99 because every fan out query waits for its slowest shard. If 19 shards respond in 15 ms but the hot shard takes 180 ms, overall query latency is 180 ms, leaving almost no headroom. Solutions: shard by document ID hash (guarantees a near uniform distribution) rather than category, or use more fine grained shards for hot categories.
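Hash-based shard routing is a one-liner; this sketch (using sha1 purely for illustration, any stable hash works) shows that even a heavily skewed category spreads evenly across shards:

```python
import hashlib

NUM_SHARDS = 20


def shard_for(doc_id: str) -> int:
    """Route a document to a shard by hashing its ID, not its category."""
    digest = hashlib.sha1(doc_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Even if 60 percent of documents belong to one category, their IDs
# still spread uniformly across shards:
counts = [0] * NUM_SHARDS
for i in range(100_000):
    counts[shard_for(f"electronics-{i}")] += 1
print(min(counts), max(counts))  # both close to 100000 / 20 = 5000
```

The tradeoff: hash routing gives uniform load but forces every query to fan out to all shards, since matching documents for any term can live anywhere.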

Tombstone Bloat

Deleted documents are not physically removed immediately. They are marked with a tombstone and filtered at query time. If you delete 40 percent of documents without compacting, queries scan 1.7x more postings than necessary. Index size remains unchanged until compaction runs. Schedule compaction during low traffic periods. Force merge operations consolidate segments and remove deleted documents, typically taking 10 to 60 minutes per shard.
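The tombstone-filter-at-query-time pattern and a deleted-ratio compaction trigger can be sketched with simplified stand-in data structures (real engines store tombstones as bitsets per segment, not Python sets):

```python
# Simplified segment with tombstone filtering and a compaction trigger.

class Segment:
    def __init__(self, postings):
        self.postings = postings   # term -> sorted list of doc ids
        self.tombstones = set()    # deleted doc ids, filtered on every query

    def delete(self, doc_id):
        # Logical delete only: the postings entry stays until compaction.
        self.tombstones.add(doc_id)

    def search(self, term):
        # Every query pays to skip tombstoned entries.
        return [d for d in self.postings.get(term, []) if d not in self.tombstones]

    def deleted_ratio(self):
        total = sum(len(p) for p in self.postings.values())
        return len(self.tombstones) / total if total else 0.0

    def needs_compaction(self, threshold=0.4):
        return self.deleted_ratio() >= threshold


seg = Segment({"battery": [1, 2, 3, 4, 5]})
for d in (2, 4):
    seg.delete(d)
print(seg.search("battery"))    # [1, 3, 5]
print(seg.needs_compaction())   # True -- 40 percent deleted, schedule off peak
```

The `needs_compaction` check is what you would wire to an off peak scheduler, so the 10 to 60 minute force merge runs when it will not compete with query traffic.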

Memory Pressure

Term dictionaries and frequently accessed postings live in RAM via OS page cache. For a 500 GB index, you want at least 300 to 400 GB of RAM for good cache hit rates. If working set exceeds RAM, cache evictions force disk reads at 0.1 to 1 ms each. A query touching 100 cold postings lists adds 10 to 100 ms of IO latency. Monitor cache hit rates: below 90 percent indicates memory pressure. Add shards to distribute memory requirement before queries slow down.
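The back-of-envelope math above can be captured in a small model. The 0.5 ms default is an assumed midpoint of the 0.1 to 1 ms cold-read range, and the function names are illustrative:

```python
# Model of the cold-read penalty and the 90 percent hit-rate alert.

def extra_io_latency_ms(lists_touched, hit_rate, cold_read_ms=0.5):
    """Expected added IO latency for one query.

    cold_read_ms is an assumed midpoint of the 0.1-1 ms disk-read range."""
    misses = lists_touched * (1 - hit_rate)
    return misses * cold_read_ms


def memory_pressure(hit_rate, threshold=0.90):
    """Alert when the page cache hit rate drops below the threshold."""
    return hit_rate < threshold


# A query touching 100 postings lists at an 80 percent hit rate:
print(round(extra_io_latency_ms(100, 0.80), 1))  # 10.0 -- ms of extra IO
print(memory_pressure(0.80))                     # True -- add shards
```

At a 100 percent hit rate the penalty is zero; as the working set outgrows RAM the penalty grows linearly with the miss rate, which is why adding shards (each holding a smaller slice of the index) restores performance.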

💡 Key Takeaways
Hot term queries spike latency from 20 ms to 500 to 1500 ms when scanning 95 million postings; use stopword lists, mandatory rare terms, or postings list caps
Segment explosion beyond 50 to 100 segments adds 1 to 2 ms per 10 segments; merge to 10 to 15 segments during off peak hours
Skewed sharding makes one shard a bottleneck dominating p99; shard by document ID hash for uniform distribution
Deleting 40 percent of documents without compaction causes 1.7x wasted scanning; schedule compaction during low traffic
For 500 GB index, need 300 to 400 GB RAM for 90 percent plus cache hit rate; below 90 percent indicates memory pressure
📌 Interview Tips
1. Explain hot term mitigation: if a query contains "the", require at least one selective term, or use a circuit breaker to reject queries scanning over 10 million postings.
2. Discuss segment management: allow segment growth during peak (20 to 50 segments), aggressively merge during off peak (down to 10 to 15) to balance freshness versus query performance.