BM25 vs Dense Retrieval: When to Use Each
Two Different Approaches
BM25 matches exact lexical terms: if a query contains "pizza" and a document contains "pizza", they match. Dense retrieval uses neural embeddings to match meanings: the query "pizza" matches a document about "Italian flatbread with toppings" even without word overlap. These approaches fail in complementary ways, making hybrid systems increasingly common.
Where BM25 Excels
Exact term matching shines for product codes, names, and identifiers. Search "XYZ-1234" and BM25 finds exactly that string, while dense retrieval might return similar codes like "XYZ-1235" because their embeddings are close. Long-tail and rare queries also favor BM25: a search for "Karatsuba multiplication algorithm" works well because the exact terms appear in relevant documents, whereas a dense model trained on general text may not have a meaningful embedding for "Karatsuba". BM25 also supports instant updates: index a document and it becomes searchable immediately. Dense retrieval requires generating embeddings (typically 50 to 200 ms per document) and updating the vector index (often a batch operation).
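To make the exact-match behavior concrete, here is a minimal sketch of Okapi BM25 scoring in pure Python over a toy three-document corpus; the documents, the tokenizer (whitespace lowercase), and the parameter defaults are illustrative assumptions, not a production implementation.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]  # toy whitespace tokenizer
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    # Document frequency: how many documents contain each term.
    df = Counter()
    for d in tokenized:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no lexical overlap -> term contributes nothing
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "part xyz-1234 in stock",
    "part xyz-1235 in stock",
    "italian flatbread with toppings",
]
# Only the document containing the literal token "xyz-1234" scores above zero.
print(bm25_scores("xyz-1234", docs))
```

Note how "xyz-1235" scores exactly zero despite being one character away: BM25 has no notion of similarity between tokens, which is precisely the behavior that makes it reliable for identifiers.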
Where Dense Retrieval Excels
Semantic matching handles vocabulary mismatch. Search "doctor" and find documents about "physician", "medical practitioner", or "healthcare provider"; BM25 returns nothing without word overlap. Short ambiguous queries benefit from learned intent: with BM25, the query "jaguar speed" returns documents about both the animal and the car, while dense retrieval trained on user behavior learns that most users want the car. Multilingual search becomes possible without translation: a dense model trained on parallel corpora can match an English query with a Spanish document.
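The mechanism behind semantic matching is vector similarity: query and document embeddings that are close in the vector space count as matches even with zero word overlap. A minimal sketch, using hand-written toy 3-dimensional vectors as stand-ins for real model embeddings (a real system would use a trained encoder producing hundreds of dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, chosen so that synonyms land close together.
emb = {
    "doctor":    [0.90, 0.10, 0.00],
    "physician": [0.85, 0.15, 0.05],
    "pizza":     [0.00, 0.20, 0.95],
}

print(cosine(emb["doctor"], emb["physician"]))  # high: near-synonyms
print(cosine(emb["doctor"], emb["pizza"]))      # low: unrelated concepts
```

The point is that "doctor" and "physician" share no characters yet score as near-identical, which is exactly the case where BM25 returns nothing.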
Where Each Fails
BM25 fails completely on synonyms: no shared terms means no match, regardless of semantic similarity. Dense retrieval fails on exact matches: the embedding for "XYZ-1234" is essentially noise, since such identifiers rarely appear in training data, making it unhelpful for finding that specific string. These complementary failure modes justify hybrid approaches.
Hybrid Approach
Modern systems use both. Retrieve candidates with BM25 and dense retrieval in parallel, then merge using reciprocal rank fusion (RRF): score = 1/(k + rank_bm25) + 1/(k + rank_dense), where k typically equals 60. Rerank the combined list with a cross-encoder. This catches both exact and semantic matches. Start with BM25 alone (it handles 70 to 80 percent of queries well); add dense retrieval when vocabulary mismatch becomes a measured problem.
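The RRF merge above can be sketched in a few lines; the document IDs and the two ranked lists are made-up examples, and real systems would feed actual BM25 and dense-retriever results in:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: score(d) = sum over lists of 1 / (k + rank_d).

    `rankings` is a list of ranked result lists (best first); a document
    missing from a list simply contributes nothing from that list.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical candidate lists from the two retrievers.
bm25_hits  = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]

print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# d1 ranks first: it appears near the top of both lists.
```

RRF needs only ranks, not raw scores, which is why it works even though BM25 scores and cosine similarities live on incomparable scales; the constant k=60 damps the influence of any single top-ranked hit.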