
BM25 vs Dense Retrieval: When to Use Each

BM25 and dense retrieval (embedding-based semantic search) represent fundamentally different approaches to finding relevant documents, each with distinct strengths that make them complementary rather than competitive. BM25 excels at exact term matching, long-tail queries, and scenarios requiring frequent index updates. Dense retrieval shines with short queries, vocabulary mismatch, and semantic understanding. Understanding when to use each, or when to combine them, is critical for production search systems.

BM25's advantage is speed and simplicity. An inverted index with WAND pruning can scan 100 million documents in 5 to 10 milliseconds on a single shard, using 40 to 100 GB of compressed postings without positions. Updates are incremental: new documents append to the index with minimal overhead. A query like "macbook pro 16 inch 2023" precisely matches products containing those exact terms. The downside: BM25 cannot match "physician" to "doctor" or "NYC" to "New York City", nor understand that "how to fix broken pipe" is semantically similar to "repairing plumbing leaks".

Dense retrieval encodes queries and documents into fixed-dimensional vectors (typically 768 or 1024 dimensions, using models like BERT or Sentence Transformers) and retrieves via Approximate Nearest Neighbor (ANN) search. This captures semantic similarity: "dog" and "puppy" have similar embeddings even without lexical overlap. Latency is higher: ANN search with HNSW indexes typically takes 10 to 50 milliseconds for top-100 retrieval on 10 million vectors, and the index is larger (4 to 8 bytes per dimension × 768 dimensions × 10M docs = 30 to 60 GB for the vectors alone, plus graph structure). Updating requires re-embedding documents and rebuilding the index, which is expensive at scale.

Hybrid systems combine both: retrieve the top 1,000 from BM25 and the top 1,000 from dense retrieval, merge using learned weights or reciprocal rank fusion, then rerank. This captures both exact-match precision and semantic recall. Google's production search, top systems on Microsoft's MS MARCO leaderboard, and modern RAG systems increasingly use this pattern. The tradeoff is complexity and latency: running two retrieval systems doubles infrastructure and adds 20 to 50ms to the pipeline.

Choose pure BM25 when term precision matters more than semantics (product search, legal/medical retrieval). Choose dense retrieval when queries are short and paraphrased (question answering, conversational search). Choose hybrid when quality justifies the cost.
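The merge step in the hybrid pattern is often plain reciprocal rank fusion (RRF). Below is a minimal, self-contained Python sketch; the toy bm25_hits and dense_hits lists stand in for a real inverted-index backend and a real ANN backend, and k=60 is the constant used in the original RRF paper (Cormack et al., 2009).

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of doc IDs with reciprocal rank fusion.

    A document's fused score is the sum of 1 / (k + rank) over every
    list it appears in, so documents ranked highly by several
    retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in result_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy candidate lists standing in for real backends: bm25_hits from an
# inverted index, dense_hits from an ANN index over embeddings.
bm25_hits = ["d42", "d7", "d3", "d18"]
dense_hits = ["d7", "d99", "d42", "d5"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused[:3])  # "d7" and "d42" lead -- both retrievers found them
```

In production the fused head (for example, the top 200 of the combined 2,000 candidates) is then handed to a reranker, as in the pipeline described above.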
💡 Key Takeaways
Latency comparison: BM25 with WAND achieves 5 to 10ms p95 for top 1,000 on 100M documents per shard; dense ANN search takes 10 to 50ms for top 100 on 10M vectors per shard, scaling worse with corpus size due to graph traversal costs
Index size: BM25 postings compress to roughly 2 to 5 bytes per term occurrence (40 to 100 GB for 100M docs with 200 terms each); dense requires 768 floats × 4 bytes × 100M docs = 307 GB uncompressed, or 80 to 150 GB with quantization, plus HNSW graph overhead (see the arithmetic sketch after this list)
Update cost asymmetry: BM25 handles incremental updates via append-only segments with periodic merges (write amplification 2 to 5×); dense requires re-embedding changed documents (GPU inference cost) and index rebuilds that take minutes to hours at scale
Query length sensitivity: BM25 improves with longer queries (more terms = more signal); dense retrieval degrades because averaging embeddings of many terms dilutes semantic focus; crossover point typically 3 to 5 terms
Hybrid retrieval gains: combining BM25 and dense improves NDCG by 10 to 25 percent over either alone on benchmarks like MS MARCO, but increases p95 latency by 30 to 60ms and requires maintaining two indexes with 2× storage and infrastructure
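A quick back-of-envelope check of the index-size bullet above; the corpus size, dimensionality, and float width come straight from the takeaways, while the 1 to 2 bytes per dimension after quantization is an illustrative assumption that lands near the quoted 80 to 150 GB range.

```python
# Storage estimate for a dense index over a 100M-document corpus,
# using the figures quoted in the takeaways above.
docs = 100_000_000   # corpus size
dim = 768            # embedding dimensionality
fp32 = 4             # bytes per float32 component

raw_gb = docs * dim * fp32 / 1e9
print(f"raw float32 vectors: {raw_gb:.0f} GB")  # ~307 GB

# Assumed quantization widths (scalar or product quantization):
for bytes_per_dim in (1, 2):
    gb = docs * dim * bytes_per_dim / 1e9
    print(f"{bytes_per_dim} byte(s)/dim quantized: {gb:.0f} GB")
# ~77 GB and ~154 GB -- consistent with the 80 to 150 GB range
# once HNSW graph links are stacked on top.
```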
📌 Examples
Product search at Amazon: pure BM25F preferred because queries like "samsung 65 inch 4k tv 2023" need exact term matching on specs; dense embeddings would incorrectly retrieve "LG 55 inch" as semantically similar despite wrong brand and size
Question answering for customer support: dense retrieval excels because a user asks "why is my payment failing" and the relevant doc says "transaction declined reasons"; BM25 misses this due to vocabulary mismatch, while embeddings capture the semantic equivalence
MS MARCO document ranking: BM25 baseline achieves MRR@10 of 0.187; pure dense retrieval achieves 0.32; hybrid (BM25 + dense fusion) achieves 0.38; a neural reranker on hybrid retrieval reaches 0.44, showing complementary strengths
Retrieval Augmented Generation at scale: BM25 retrieves top 500 in 8ms, dense retrieves top 500 in 25ms, reciprocal rank fusion merges to top 200 in 2ms, and a cross-encoder reranks to the final 20 in 30ms; total 65ms vs 38ms for BM25 only, justified by 20% better answer quality (see the latency-budget sketch below)
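A sketch of that last latency budget, assuming the stage timings quoted in the example and sequential execution (which is what the 65ms total implies):

```python
# Hybrid RAG pipeline latency budget; stage timings from the example.
hybrid_stages_ms = {
    "BM25 retrieve top 500": 8,
    "dense retrieve top 500": 25,
    "RRF merge to top 200": 2,
    "cross-encoder rerank to top 20": 30,
}
print(sum(hybrid_stages_ms.values()), "ms hybrid")  # 65 ms

# BM25-only baseline: skip the dense retrieval and the merge step.
bm25_only_ms = (hybrid_stages_ms["BM25 retrieve top 500"]
                + hybrid_stages_ms["cross-encoder rerank to top 20"])
print(bm25_only_ms, "ms BM25 only")  # 38 ms
```

Issuing the two retrieval calls concurrently would shrink the retrieval portion from 8 + 25 = 33ms to max(8, 25) = 25ms, bringing the hybrid total closer to 57ms at the cost of fan-out complexity.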