Database Design • Search Databases (Elasticsearch, Solr) • Easy • ⏱️ ~2 min
How Search Databases Use Inverted Indexes for Fast Retrieval
Search databases like Elasticsearch and Solr achieve sub-100ms query latency over billions of documents by using an inverted index, which maps each term to a posting list of document IDs. Instead of scanning every document for matches, the system looks up the search term once and immediately retrieves every document that contains it.
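The core structure can be sketched in a few lines. This is a minimal toy, not Elasticsearch's on-disk format; the sample documents are invented for illustration:

```python
# Minimal sketch of an inverted index: term -> posting list of doc IDs.
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick dog tricks",
}

index = defaultdict(list)  # term -> list of doc IDs (posting list)
for doc_id, text in docs.items():
    for term in set(text.split()):  # dedupe terms within one document
        index[term].append(doc_id)

# Lookup is a single dictionary access instead of a scan over all docs.
print(sorted(index["quick"]))  # -> [1, 3]
# AND query: intersect the posting lists of both terms.
print(sorted(set(index["quick"]) & set(index["dog"])))  # -> [3]
```

Real engines store posting lists sorted and compressed so intersections can be computed with efficient skip-based merges rather than Python sets.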
When you index a document, analyzers break the text into tokens (tokenization), apply stemming (reducing "running" to "run"), expand synonyms, and normalize language-specific variations. Each resulting term is stored in the inverted index with a pointer to the document's ID. Numeric, date, and keyword fields are stored in columnar doc-values structures for fast sorting and aggregations without loading the original documents.
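The analysis pipeline can be sketched as below. The suffix-stripping stemmer and the synonym table are illustrative stand-ins, not the Porter/Snowball stemmers or synonym filters real analyzers use:

```python
# Toy analyzer pipeline: tokenize -> lowercase -> stem -> expand synonyms.
import re

SYNONYMS = {"fast": ["quick", "rapid"]}  # hypothetical synonym table

def stem(token: str) -> str:
    # Crude suffix stripping; real engines use Porter/Snowball stemmers.
    for suffix in ("ning", "ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text: str) -> list[str]:
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # tokenization + lowercase
    terms = []
    for tok in tokens:
        terms.append(stem(tok))
        for syn in SYNONYMS.get(tok, []):  # synonym expansion
            terms.append(stem(syn))
    return terms

print(analyze("Running fast"))  # -> ['run', 'fast', 'quick', 'rapid']
```

Each term emitted by `analyze` would then be written into the inverted index pointing at the document's ID.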
On the write path, an in-memory buffer accumulates writes; a refresh operation (typically every 1 second) makes them visible as a new searchable segment, giving near-real-time visibility. Segments are eventually flushed to disk and merged into larger immutable segments to reduce overhead. Query execution is distributed: a coordinator node scatters the query to the relevant shards, each shard computes its local top-k results using a ranked-retrieval function such as BM25 (Best Match 25), and the coordinator merges them into the final result.
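The scatter-gather step can be sketched as follows. Shard contents and scores here are invented stand-ins for BM25 values; the point is that each shard ranks only its own documents and the coordinator only merges small top-k lists:

```python
# Sketch of distributed query execution: scatter to shards, gather top-k.
import heapq

def shard_search(local_hits, k):
    # Each shard ranks its own documents (e.g. with BM25) and
    # returns only its local top-k hits.
    return heapq.nlargest(k, local_hits)

def coordinator_search(shards, k):
    # Scatter the query to every relevant shard, then merge the
    # local top-k lists into a global top-k.
    gathered = []
    for hits in shards:
        gathered.extend(shard_search(hits, k))
    return heapq.nlargest(k, gathered)

shards = [
    [(2.1, "doc-1"), (0.4, "doc-2")],  # shard 0: (score, doc ID) pairs
    [(3.7, "doc-9"), (1.2, "doc-5")],  # shard 1
    [(0.9, "doc-3")],                  # shard 2
]
print(coordinator_search(shards, 2))  # -> [(3.7, 'doc-9'), (2.1, 'doc-1')]
```

Because each shard returns at most k hits, the coordinator merges a bounded amount of data no matter how many documents the cluster holds.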
LinkedIn's Galene system demonstrates this at scale, indexing hundreds of millions of member profiles with index freshness of single-digit seconds and p99 latencies under a few hundred milliseconds while serving thousands of queries per second (QPS). Wikimedia's CirrusSearch serves tens of millions of Wikipedia pages with median latencies of 50 to 100ms.
💡 Key Takeaways
•Inverted indexes map each term to a list of document IDs, enabling instant term lookup instead of full document scans
•Analyzers transform text through tokenization, stemming, synonym expansion, and language normalization before indexing
•Writes accumulate in an in-memory buffer, become searchable after a refresh (typically every 1 second), then flush to disk and merge into larger immutable segments
•Doc values store numeric, date, and keyword fields in columnar format for efficient sorting and aggregations without loading source documents
•Query execution is distributed across shards: each shard computes its local top-k matches, then the coordinator merges the results globally
•Production systems achieve sub-100ms median latency with near-real-time freshness (1 to 5 seconds) while handling thousands of QPS
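The BM25 ranking mentioned above can be made concrete with the per-term scoring formula. This sketch uses the common Okapi parameterization (k1 = 1.2, b = 0.75) and one of several IDF variants; engines differ slightly in the exact formulation:

```python
# Okapi BM25: contribution of a single query term to one document's score.
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq,
                    k1=1.2, b=0.75):
    # Rarer terms get higher weight (inverse document frequency).
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
    # Term frequency saturates (k1) and is normalized by doc length (b).
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm

# A term appearing twice in an average-length doc, in 100 of 1000 docs:
print(bm25_term_score(2, 100, 100, 1000, 100))
```

A document's full score is the sum of `bm25_term_score` over the query's terms; each shard computes this locally during the top-k step.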
📌 Examples
LinkedIn Galene: Indexes hundreds of millions of profiles, achieving single-digit-second freshness with p99 latency under 200ms at thousands of QPS
Wikimedia CirrusSearch: Serves tens of millions of Wikipedia pages with 50 to 100ms median latency across multiple languages using language-specific analyzers