Database Design • Search Databases (Elasticsearch, Solr) · Hard · ⏱️ ~3 min
Hybrid Retrieval: Combining Lexical and Vector Search
Modern search systems increasingly combine lexical search (inverted indexes with BM25 scoring) and vector search (Approximate Nearest Neighbor, or ANN, indexes built with algorithms such as Hierarchical Navigable Small World, or HNSW) to achieve both high-precision filtering and semantic recall. This hybrid approach addresses the weaknesses of each method: lexical search excels at exact matching and structured filters but misses semantic similarity, while vector search captures meaning but struggles with precise keyword constraints.
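One common way to merge the two result lists is Reciprocal Rank Fusion (RRF), which combines rankings without having to reconcile BM25 and cosine-similarity score scales. A minimal sketch (the document IDs and rankings are illustrative):

```python
from collections import defaultdict

def rrf_fuse(lexical_ranking, vector_ranking, k=60):
    """Merge two ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document's fused score is the sum of 1 / (k + rank) over the
    rankings it appears in; k=60 is the commonly used default.
    """
    scores = defaultdict(float)
    for ranking in (lexical_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 top hits and ANN top hits for the same query.
lexical = ["d3", "d1", "d7"]
vector = ["d1", "d9", "d3"]
print(rrf_fuse(lexical, vector))  # ['d1', 'd3', 'd9', 'd7']
```

Documents appearing high in both rankings ("d1", "d3") rise to the top, while single-list hits still survive further down.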
A common two-stage ranking pattern retrieves a few hundred to a few thousand candidates per shard using fast lexical and approximate vector search in stage one, then applies a richer Machine Learning (ML) model to globally re-rank the merged results in stage two. Stage one prioritizes recall and speed (sub-50ms per shard), while stage two focuses on precision at the cost of added latency (an additional 20 to 100ms for model inference).
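The pattern above can be sketched as follows; the per-shard retrieval callables and the re-rank scoring function are placeholders standing in for the real stage-one index queries and stage-two model:

```python
import heapq

def two_stage_search(query, shards, rerank_score, per_shard=200, top_k=10):
    """Two-stage ranking sketch.

    Stage 1: recall-oriented, latency-bounded retrieval from each shard.
    Stage 2: precision-oriented global re-rank of the merged candidate
    pool with a richer (and costlier) scoring model.
    """
    candidates = []
    for shard in shards:
        # Each shard returns its best candidates from lexical + ANN search.
        candidates.extend(shard(query)[:per_shard])
    # Re-rank the merged pool globally; only the top_k survive.
    return heapq.nlargest(top_k, candidates,
                          key=lambda doc: rerank_score(query, doc))

# Toy demo: two shards and a stand-in re-rank model.
shard_a = lambda q: ["a1", "a2"]
shard_b = lambda q: ["b1"]
model = lambda q, d: {"a1": 0.2, "a2": 0.9, "b1": 0.5}[d]
print(two_stage_search("cozy cabin", [shard_a, shard_b], model, top_k=2))
```

Bounding `per_shard` is what keeps stage one inside its latency budget; the expensive model only ever sees the merged candidate pool, never the full corpus.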
Airbnb implements this pattern for listing search: stage one combines lexical filters (location, price, amenities) with vector similarity for semantic intent, retrieving approximately 200 candidates per shard in under 50ms. Stage two re-ranks these candidates using a gradient-boosted model over 100+ features, targeting end-to-end p95 under 200ms. The vector dimension is carefully capped (typically 128 to 512 dimensions) to control memory footprint, as ANN indexes can consume 2 to 5 times more memory than lexical indexes.
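A back-of-envelope estimate shows why dimension caps matter. The sketch below counts only float32 vectors plus level-0 graph links, so real deployments (upper graph layers, allocator overhead, serving headroom) land well above this lower bound:

```python
def hnsw_memory_gb(num_vectors, dim, M=16):
    """Lower-bound HNSW RAM estimate: float32 vectors + level-0 links.

    HNSW stores roughly 2*M links per node at layer 0 (4 bytes each);
    upper layers and implementation overhead are ignored here.
    """
    bytes_per_vector = dim * 4 + 2 * M * 4
    return num_vectors * bytes_per_vector / 1024**3

# Doubling the dimension roughly doubles the footprint.
for dim in (128, 256, 512):
    print(dim, round(hnsw_memory_gb(1_000_000, dim), 2))
```

Even this floor shows the footprint scaling linearly with dimension, which is why capping at 128 to 512 dimensions is a standard production control.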
The critical failure mode is vector index memory pressure and recall degradation under load. HNSW indexes require significant Random Access Memory (RAM) for graph navigation, and approximate search trades recall for speed through parameters like ef (the size of the candidate list explored during search) and M (the maximum number of graph links per node). Under high load, reducing these parameters to meet latency Service Level Agreements (SLAs) can drop recall from 95% to 70%, degrading relevance. Mitigations include lexical backstops (ensuring lexical matches always appear), per-tenant vector dimension caps, and offline evaluation to set ANN parameters for target recall at the expected Queries Per Second (QPS).
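The lexical backstop mitigation can be as simple as reserving result slots for the top keyword matches, so that degraded ANN recall can never evict exact matches. A hypothetical sketch:

```python
def with_lexical_backstop(fused_results, lexical_hits, top_k=10, reserved=3):
    """Guarantee the strongest lexical (exact-match) hits survive fusion.

    Reserves slots for the top lexical matches, then fills the rest from
    the fused (lexical + vector) ranking, de-duplicating in order.
    """
    merged = list(dict.fromkeys(lexical_hits[:reserved] + fused_results))
    return merged[:top_k]

# "d1" is an exact keyword match that vector-heavy fusion dropped.
print(with_lexical_backstop(["v1", "v2", "v3"], ["d1", "v2"], top_k=4))
```

Here "d1" is pinned to the front of the results even though the fused ranking omitted it, which bounds the relevance damage when ef is dialed down under load.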
💡 Key Takeaways
• Hybrid retrieval combines lexical search for precision and filters with vector search for semantic recall, addressing the weaknesses of each approach
• Two-stage ranking retrieves hundreds to thousands of candidates quickly (under 50ms per shard) in stage one, then applies costly ML re-ranking in stage two (an additional 20 to 100ms)
• Vector indexes using HNSW consume 2 to 5 times more memory than lexical indexes due to their graph structures, requiring dimension caps (128 to 512) at production scale
• Approximate Nearest Neighbor (ANN) parameters like ef and M trade recall for speed; under load, reducing them can drop recall from 95% to 70%, degrading relevance
• Airbnb listing search combines lexical location and price filters with vector semantic matching in stage one, then re-ranks with a gradient-boosted model in stage two for p95 under 200ms
• Failure mitigation includes lexical backstops to ensure exact matches appear, per-tenant vector caps to control memory, and offline tuning of ANN parameters for target recall at expected QPS
📌 Examples
Airbnb listings: Stage 1 retrieves 200 candidates per shard (lexical + vector) in under 50ms; stage 2 re-ranks with a 100+ feature model; end-to-end p95 under 200ms
Vector memory scaling: 1 million 256-dimension vectors in HNSW require approximately 4 to 8 GB RAM vs 1 GB for an equivalent lexical index
Recall degradation under load: reducing the HNSW ef parameter from 200 to 50 cuts latency from 80ms to 30ms but drops recall from 95% to 75%, requiring a lexical fallback