What Is Dense Retrieval with BERT-Based Embeddings?
How It Works
Both queries and documents are encoded into fixed-length vectors (typically 768 dimensions for BERT-based models, 384-512 for efficient variants). Relevance is computed as the dot product or cosine similarity between the two vectors. Documents are encoded and indexed ahead of time; at query time, the system encodes only the query and finds its nearest neighbors in the index. The key insight: semantically similar texts cluster together in vector space even when they share no words.
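The retrieval step above can be sketched with toy vectors. Here, 4-dimensional arrays with made-up values stand in for the 768-dimensional embeddings a real encoder (e.g. a BERT-based model) would produce; the ranking logic is the same:

```python
import numpy as np

# Hypothetical 4-dim "embeddings" standing in for 768-dim BERT vectors.
# In practice these come from an encoder model, not hand-written values.
docs = {
    "cheap flights to Paris":    np.array([0.9, 0.1, 0.0, 0.2]),
    "affordable airfare deals":  np.array([0.8, 0.2, 0.1, 0.3]),
    "Python list comprehension": np.array([0.0, 0.9, 0.8, 0.1]),
}

def cosine(a, b):
    # Cosine similarity: dot product of the two vectors over their norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, index, k=2):
    # Rank pre-indexed documents by cosine similarity to the query vector.
    scored = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _ in scored[:k]]

# Hypothetical embedding for a query like "inexpensive plane tickets".
query = np.array([0.85, 0.15, 0.05, 0.25])
print(retrieve(query, docs))
```

Both travel documents outrank the unrelated one even though the query shares no keywords with them; at production scale the exhaustive sort is replaced by an approximate nearest-neighbor index.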
Dense vs Sparse Retrieval
Sparse (BM25, TF-IDF): Exact keyword matching. Fast, interpretable, and handles rare terms well, but fails on synonyms, paraphrases, and semantic similarity. Dense: Semantic matching through learned embeddings. Handles synonyms and paraphrases, but struggles with rare or highly specific terms and exact matches, and requires training data. Neither dominates; production systems often combine both.
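One common way to combine the two is a weighted sum of normalized scores. This is a minimal sketch with made-up score values (real systems also use alternatives such as reciprocal rank fusion); `alpha` is an assumed tuning weight:

```python
def hybrid_scores(sparse, dense, alpha=0.5):
    # Min-max normalize each score list so BM25 scores (unbounded) and
    # cosine similarities (roughly [-1, 1]) are comparable, then blend.
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    return [alpha * s + (1 - alpha) * d
            for s, d in zip(norm(sparse), norm(dense))]

bm25 = [12.0, 3.0, 0.5]  # hypothetical BM25 scores per document
emb  = [0.2, 0.9, 0.4]   # hypothetical dense cosine similarities
print(hybrid_scores(bm25, emb))
```

Here document 1 wins on keywords and document 2 on semantics; the blended score lets either signal surface a result the other would miss.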
When Dense Retrieval Shines
Best for: semantic search where vocabulary varies ("inexpensive" vs "cheap"), question answering (question and answer rarely share words), and multilingual search (embeddings bridge languages). Dense retrieval can improve recall by 10-30% over sparse methods on semantic queries. Not ideal for: exact entity matching, code search, or domains with specialized vocabulary and little training data.
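The question-answering case is easy to quantify: a question and its answer often have zero token overlap, so any purely lexical scorer gets no signal at all. A small check (the texts are illustrative):

```python
def token_overlap(a, b):
    # Jaccard overlap of lowercase token sets between two texts.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

question = "who wrote hamlet"
answer = "Shakespeare authored the famous tragedy"
print(token_overlap(question, answer))
```

The overlap is 0.0: BM25 would score this pair as irrelevant, while a dense encoder trained on QA pairs can still place the two texts close together in vector space.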