Natural Language Processing Systems · Semantic Search (Dense Embeddings, ANN) · Medium · ~3 min

Why Approximate Nearest Neighbor (ANN) and Core Algorithm Families

The Scaling Problem

Finding exact nearest neighbors requires comparing the query against every vector. With 1 million documents at 768 dimensions, that is 768 million floating point operations per query. At 10 queries per second, you need 7.68 billion ops/sec just for similarity. This does not scale. Approximate Nearest Neighbor (ANN) algorithms trade small accuracy loss for massive speed gains - finding 95%+ of true nearest neighbors in milliseconds instead of seconds.
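To make the cost concrete, here is a minimal pure-Python sketch of exact (brute-force) search. Every name in it is illustrative; a real system would use a vectorized library, but the O(N x d) work per query is the same:

```python
import math
import random

def exact_search(query, vectors, k=5):
    """Brute-force exact nearest neighbors by cosine similarity.
    Cost is O(N * d): every query touches every stored vector."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    scored = [(cos(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)          # highest similarity first
    return [i for _, i in scored[:k]]

random.seed(0)
dim = 8                                # tiny dims for the demo; think 768 in practice
corpus = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(1000)]
query = corpus[42]                     # querying with a stored vector returns itself first
print(exact_search(query, corpus, k=3)[0])  # -> 42
```

Scale the loop counts back up to 1M vectors at 768 dimensions and the per-query work is exactly the 768M operations quoted above, which is why ANN indexes exist.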

HNSW (Hierarchical Navigable Small World)

HNSW builds a multi-layer graph where similar vectors connect as neighbors. The top layer has few, widely-spaced nodes for fast global navigation. Lower layers add more nodes for fine-grained local search. Query processing starts at the top layer, greedily moves toward high-similarity regions, then descends through the layers for precision. HNSW offers excellent recall (98%+) with sub-millisecond queries, but uses 2-3x memory over raw vectors.
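The core move in HNSW is a greedy walk over a proximity graph. The sketch below shows that walk on a single toy layer, with an exact k-NN graph standing in for the incrementally-built HNSW graph; all names here are illustrative, and real HNSW repeats this search across layers, using each layer's result as the entry point for the next:

```python
import random

def l2(a, b):
    # Squared Euclidean distance (sufficient for ranking; no sqrt needed).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(graph, vectors, query, entry):
    """Greedy walk on a proximity graph: hop to whichever neighbor is
    closer to the query; stop at a local minimum. Each hop can only
    improve the distance, so the result is never worse than the entry."""
    current = entry
    current_d = l2(vectors[current], query)
    improved = True
    while improved:
        improved = False
        for nb in graph[current]:
            d = l2(vectors[nb], query)
            if d < current_d:
                current, current_d, improved = nb, d, True
    return current

random.seed(1)
pts = [[random.random(), random.random()] for _ in range(200)]
# Toy single-layer graph: link each point to its 8 exact nearest neighbors.
# Real HNSW builds this incrementally with neighbor-selection heuristics.
graph = {i: sorted(range(len(pts)), key=lambda j: l2(pts[i], pts[j]))[1:9]
         for i in range(len(pts))}
query = [0.5, 0.5]
found = greedy_search(graph, pts, query, entry=0)
```

The greedy walk can stop at a local minimum rather than the true nearest neighbor; production HNSW mitigates this with a best-first search over a candidate beam (the `ef` parameter) instead of a single-node walk.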

IVF (Inverted File Index)

IVF clusters vectors into buckets using k-means. At query time, find the closest cluster centroids, then search only the vectors in those clusters. With 1000 clusters and searching the top 10, you examine roughly 1% of vectors. Memory efficient but requires tuning: too few clusters means each probe scans too many vectors; too many clusters means each centroid represents its bucket poorly and the true neighbors land in unprobed buckets.
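A minimal sketch of the IVF idea, with hypothetical names throughout: vectors are assigned to buckets by nearest centroid, and a query scans only the `nprobe` closest buckets. For brevity this uses sampled data points as centroids; a real index would run k-means to convergence first:

```python
import random

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

random.seed(2)
dim, n, n_clusters, nprobe = 4, 500, 10, 3
data = [[random.random() for _ in range(dim)] for _ in range(n)]

# Toy "training": sample data points as centroids instead of running
# k-means, which a real IVF index would do at build time.
centroids = random.sample(data, n_clusters)
buckets = {c: [] for c in range(n_clusters)}
for i, v in enumerate(data):
    nearest = min(range(n_clusters), key=lambda c: l2(v, centroids[c]))
    buckets[nearest].append(i)

def ivf_search(query, k=5):
    """Probe only the nprobe closest clusters, then scan their members."""
    probe = sorted(range(n_clusters), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in buckets[c]]
    return sorted(candidates, key=lambda i: l2(query, data[i]))[:k]
```

With `nprobe = 3` of 10 clusters, each query scans about 30% of this toy corpus; at production scale (thousands of clusters, small `nprobe`) the scanned fraction drops to the ~1% figure above.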

💡 Key Insight: HNSW is the default for under 100M vectors - best recall-speed trade-off. IVF shines at massive scale where HNSW memory becomes prohibitive. Product Quantization compresses vectors for billion-scale but sacrifices accuracy.

Choosing an Algorithm

Under 10M vectors: HNSW with default parameters. 10-100M vectors: HNSW with memory tuning or IVF-HNSW hybrid. 100M+ vectors: IVF with Product Quantization. Always benchmark on your data - published numbers rarely match real workloads.
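The thresholds above can be captured in a trivial rule-of-thumb helper; the function name and return strings are illustrative, and the numbers are guidelines rather than hard boundaries:

```python
def choose_index(num_vectors):
    """Rule-of-thumb ANN index selection by corpus size.
    Always benchmark on your own data before committing."""
    if num_vectors < 10_000_000:
        return "HNSW (default parameters)"
    if num_vectors < 100_000_000:
        return "HNSW with memory tuning, or IVF-HNSW hybrid"
    return "IVF with Product Quantization"

print(choose_index(1_000_000))  # -> HNSW (default parameters)
```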

💡 Key Takeaways
Exact nearest neighbor at 1M docs needs 768M ops per query - ANN trades 5% accuracy for 100-1000x speed
HNSW builds multi-layer graph for fast navigation: 98%+ recall, sub-ms queries, but 2-3x memory overhead
IVF clusters vectors into buckets, searches only relevant clusters - memory efficient but needs tuning
Default choice: HNSW under 100M vectors, IVF with quantization beyond that
📌 Interview Tips
1. Lead with the math: 1M docs at 768 dims = 768M ops per query. That frames why ANN is necessary.
2. Explain HNSW as a multi-layer graph: top layers for global navigation, bottom layers for precision.
3. Give concrete thresholds: HNSW under 100M, IVF beyond. Shows you know when to use what.