Database Design • Search Databases (Elasticsearch, Solr) · Hard · ⏱️ ~3 min
Hybrid Retrieval: Combining Lexical and Vector Search
Modern search systems increasingly combine lexical search (inverted indexes with BM25 scoring) and vector search (Approximate Nearest Neighbor, or ANN, indexes built with algorithms such as Hierarchical Navigable Small World, or HNSW) to achieve both high-precision filtering and semantic recall. This hybrid approach addresses the weaknesses of each method: lexical search excels at exact matching and structured filters but misses semantic similarity, while vector search captures meaning but struggles with precise keyword constraints.
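One common way to merge the two result lists is Reciprocal Rank Fusion (RRF), which combines rankings without having to reconcile BM25 and cosine-similarity score scales. A minimal sketch (the document IDs and rankings are illustrative):

```python
from collections import defaultdict

def rrf_fuse(lexical_ranking, vector_ranking, k=60):
    """Merge two ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document's fused score is the sum of 1 / (k + rank) over the
    rankings it appears in; k=60 is the commonly used default.
    """
    scores = defaultdict(float)
    for ranking in (lexical_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 top hits and ANN top hits for the same query.
lexical = ["d3", "d1", "d7"]
vector = ["d1", "d9", "d3"]
print(rrf_fuse(lexical, vector))  # ['d1', 'd3', 'd9', 'd7']
```

Documents appearing high in both rankings ("d1", "d3") rise to the top, while single-list hits still survive further down.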
A common two-stage ranking pattern retrieves a few hundred to a few thousand candidates per shard using fast lexical and approximate vector search in stage one, then applies a richer Machine Learning (ML) model to globally re-rank the merged results in stage two. Stage one prioritizes recall and speed (sub-50ms per shard), while stage two focuses on precision at the cost of added latency (an additional 20 to 100ms for model inference).
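The pattern above can be sketched as follows; the per-shard retrieval callables and the re-rank scoring function are placeholders standing in for the real stage-one index queries and stage-two model:

```python
import heapq

def two_stage_search(query, shards, rerank_score, per_shard=200, top_k=10):
    """Two-stage ranking sketch.

    Stage 1: recall-oriented, latency-bounded retrieval from each shard.
    Stage 2: precision-oriented global re-rank of the merged candidate
    pool with a richer (and costlier) scoring model.
    """
    candidates = []
    for shard in shards:
        # Each shard returns its best candidates from lexical + ANN search.
        candidates.extend(shard(query)[:per_shard])
    # Re-rank the merged pool globally; only the top_k survive.
    return heapq.nlargest(top_k, candidates,
                          key=lambda doc: rerank_score(query, doc))

# Toy demo: two shards and a stand-in re-rank model.
shard_a = lambda q: ["a1", "a2"]
shard_b = lambda q: ["b1"]
model = lambda q, d: {"a1": 0.2, "a2": 0.9, "b1": 0.5}[d]
print(two_stage_search("cozy cabin", [shard_a, shard_b], model, top_k=2))
```

Bounding `per_shard` is what keeps stage one inside its latency budget; the expensive model only ever sees the merged candidate pool, never the full corpus.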
Airbnb implements this pattern for listing search: stage one combines lexical filters (location, price, amenities) with vector similarity for semantic intent, retrieving approximately 200 candidates per shard in under 50ms. Stage two re-ranks these candidates using a gradient-boosted model over 100+ features, targeting end-to-end p95 under 200ms. The vector dimension is carefully capped (typically 128 to 512 dimensions) to control memory footprint, as ANN indexes can consume 2 to 5 times more memory than lexical indexes.
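A back-of-envelope estimate shows why dimension caps matter. The sketch below counts only float32 vectors plus level-0 graph links, so real deployments (upper graph layers, allocator overhead, serving headroom) land well above this lower bound:

```python
def hnsw_memory_gb(num_vectors, dim, M=16):
    """Lower-bound HNSW RAM estimate: float32 vectors + level-0 links.

    HNSW stores roughly 2*M links per node at layer 0 (4 bytes each);
    upper layers and implementation overhead are ignored here.
    """
    bytes_per_vector = dim * 4 + 2 * M * 4
    return num_vectors * bytes_per_vector / 1024**3

# Doubling the dimension roughly doubles the footprint.
for dim in (128, 256, 512):
    print(dim, round(hnsw_memory_gb(1_000_000, dim), 2))
```

Even this floor shows the footprint scaling linearly with dimension, which is why capping at 128 to 512 dimensions is a standard production control.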
The critical failure mode is vector index memory pressure and recall degradation under load. HNSW indexes require significant Random Access Memory (RAM) for graph navigation, and approximate search trades recall for speed through parameters like ef (the size of the candidate list explored during search) and M (the maximum number of graph links per node). Under high load, reducing these parameters to meet latency Service Level Agreements (SLAs) can drop recall from 95% to 70%, degrading relevance. Mitigations include lexical backstops (ensuring lexical matches always appear), per-tenant vector dimension caps, and offline evaluation to set ANN parameters for target recall at the expected Queries Per Second (QPS).
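The lexical backstop mitigation can be as simple as reserving result slots for the top keyword matches, so that degraded ANN recall can never evict exact matches. A hypothetical sketch:

```python
def with_lexical_backstop(fused_results, lexical_hits, top_k=10, reserved=3):
    """Guarantee the strongest lexical (exact-match) hits survive fusion.

    Reserves slots for the top lexical matches, then fills the rest from
    the fused (lexical + vector) ranking, de-duplicating in order.
    """
    merged = list(dict.fromkeys(lexical_hits[:reserved] + fused_results))
    return merged[:top_k]

# "d1" is an exact keyword match that vector-heavy fusion dropped.
print(with_lexical_backstop(["v1", "v2", "v3"], ["d1", "v2"], top_k=4))
```

Here "d1" is pinned to the front of the results even though the fused ranking omitted it, which bounds the relevance damage when ef is dialed down under load.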
💡 Key Takeaways
• Hybrid retrieval combines lexical search for precision and filters with vector search for semantic recall, addressing the weaknesses of each approach
• Two-stage ranking retrieves hundreds to thousands of candidates quickly (under 50ms per shard) in stage one, then applies costly ML re-ranking in stage two (an additional 20 to 100ms)
• Vector indexes using HNSW consume 2 to 5 times more memory than lexical indexes due to their graph structures, requiring dimension caps (128 to 512) at production scale
• Approximate Nearest Neighbor (ANN) parameters like ef and M trade recall for speed; under load, reducing them can drop recall from 95% to 70%, degrading relevance
• Airbnb listing search combines lexical location and price filters with vector semantic matching in stage one, then re-ranks with a gradient-boosted model in stage two for p95 under 200ms
• Failure mitigation includes lexical backstops to ensure exact matches appear, per-tenant vector caps to control memory, and offline tuning of ANN parameters for target recall at expected QPS
📌 Examples
Airbnb listings: Stage 1 retrieves 200 candidates per shard (lexical + vector) in under 50ms; stage 2 re-ranks with a 100+ feature model; end-to-end p95 under 200ms
Vector memory scaling: 1 million 256-dimension vectors in HNSW require approximately 4 to 8 GB RAM vs 1 GB for an equivalent lexical index
Recall degradation under load: reducing the HNSW ef parameter from 200 to 50 cuts latency from 80ms to 30ms but drops recall from 95% to 75%, requiring a lexical fallback