
How ANN Algorithms Work: HNSW, IVF, and Scaling Strategies

The Scaling Problem

Finding the exact closest vectors requires comparing the query against every vector in your database. With 1 million documents at 768 dimensions, that is 768 million floating point operations per query. At 10 queries per second, you need 7.68 billion operations per second just for similarity computation. This does not scale.
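To make the cost concrete, here is a minimal sketch of exact (brute-force) search with NumPy. The sizes are scaled down from the 1M-document scenario and the data is random, purely for illustration:

```python
import numpy as np

# Scaled-down stand-in for the 1,000,000 x 768 scenario above.
n_docs, dim = 100_000, 768
db = np.random.rand(n_docs, dim).astype("float32")
db /= np.linalg.norm(db, axis=1, keepdims=True)   # unit-normalize so dot product = cosine

query = np.random.rand(dim).astype("float32")
query /= np.linalg.norm(query)

# Exact search: one dot product per stored vector,
# i.e. roughly n_docs * dim multiply-adds for every query.
scores = db @ query                    # shape (n_docs,)
top10 = np.argsort(-scores)[:10]       # indices of the 10 most similar documents
```

Every query touches every row of `db`; that full scan is exactly the work ANN indices are built to avoid.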

Approximate Nearest Neighbor (ANN) algorithms solve this by trading a small amount of accuracy for massive speed improvements. Instead of finding the true top 10 closest vectors, ANN finds vectors that are very likely in the top 10 - typically 95%+ of the true nearest neighbors. The speed improvement is dramatic: milliseconds instead of seconds for million-scale datasets.

How ANN Indices Work

ANN algorithms pre-process your vectors into index structures that enable fast approximate search. HNSW (Hierarchical Navigable Small World) builds a graph where similar vectors are connected. Searching navigates this graph, jumping between connected nodes until reaching high-similarity regions. IVF (Inverted File) clusters vectors into buckets; at query time, only the most relevant buckets are searched.
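As a rough illustration of both index types, here is a sketch using the FAISS library (one common implementation). The parameter values, such as 32 links per node and 1,024 buckets, are illustrative rather than tuned recommendations:

```python
import numpy as np
import faiss

dim = 768
db = np.random.rand(100_000, dim).astype("float32")

# HNSW: a layered graph where each vector links to its near neighbors.
hnsw = faiss.IndexHNSWFlat(dim, 32)        # 32 links per node (illustrative)
hnsw.add(db)                               # the graph is built as vectors are added

# IVF: vectors are clustered into buckets around trained centroids.
quantizer = faiss.IndexFlatL2(dim)
ivf = faiss.IndexIVFFlat(quantizer, dim, 1024)   # 1,024 buckets (illustrative)
ivf.train(db)                              # learn the bucket centroids
ivf.add(db)                                # assign each vector to its nearest bucket

# Both answer the same question: an approximate top-10 for a query.
query = np.random.rand(1, dim).astype("float32")
D_hnsw, I_hnsw = hnsw.search(query, 10)
D_ivf, I_ivf = ivf.search(query, 10)
```

Note the structural difference: IVF needs an explicit training step to learn its centroids, while HNSW grows its graph incrementally as vectors are added.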

Index building takes time - typically minutes to hours for millions of vectors, depending on the algorithm and hardware. But once built, queries are fast. The index is essentially a precomputed map of vector neighborhoods that lets you skip most comparisons.

💡 Key Insight: ANN is a recall-latency trade-off. Configure indices for higher recall (search more candidates) and you get better accuracy but slower queries. Configure for speed and you might miss some true matches. Tune based on your acceptable recall threshold.
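In FAISS terms, the knobs behind this trade-off are efSearch for HNSW and nprobe for IVF. The sketch below, again with illustrative sizes and random data, measures recall@10 against exact search at a few efSearch settings:

```python
import numpy as np
import faiss

dim, n_docs, n_queries = 768, 50_000, 100
db = np.random.rand(n_docs, dim).astype("float32")
queries = np.random.rand(n_queries, dim).astype("float32")

# Ground truth from exact search, used only to measure recall.
flat = faiss.IndexFlatL2(dim)
flat.add(db)
_, exact = flat.search(queries, 10)

hnsw = faiss.IndexHNSWFlat(dim, 32)
hnsw.add(db)

# efSearch = candidates explored per query: more candidates, higher recall, slower queries.
for ef in (16, 64, 256):
    hnsw.hnsw.efSearch = ef
    _, approx = hnsw.search(queries, 10)
    recall = np.mean([len(set(a) & set(e)) / 10 for a, e in zip(approx, exact)])
    print(f"efSearch={ef}: recall@10 = {recall:.3f}")

# The equivalent IVF knob is index.nprobe: how many buckets to scan per query.
```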

Choosing an ANN Algorithm

HNSW offers the best recall at high speed but uses significant memory (often 2-3x the raw vector storage). IVF uses less memory but requires more tuning. Product Quantization compresses vectors for massive scale (billions of vectors) but reduces accuracy. For most applications under 100 million vectors, HNSW is the default choice. Beyond that, hybrid approaches combining IVF and quantization become necessary.
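Here is a sketch of the compressed approach, again in FAISS with illustrative parameters: IVF for bucketing plus Product Quantization to shrink each 768-dimensional float32 vector (3,072 bytes) down to 96 one-byte codes:

```python
import numpy as np
import faiss

dim = 768
db = np.random.rand(200_000, dim).astype("float32")

# IVF for bucketing + Product Quantization for compression:
# each vector is split into 96 sub-vectors, each stored as a 1-byte code,
# so 96 bytes per vector instead of 768 * 4 = 3,072 bytes of raw float32.
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFPQ(quantizer, dim, 1024, 96, 8)  # 1,024 buckets, 96 sub-vectors, 8 bits each
index.train(db)        # learns the coarse clusters and the PQ codebooks
index.add(db)

index.nprobe = 16      # buckets scanned per query
query = np.random.rand(1, dim).astype("float32")
D, I = index.search(query, 10)   # distances are estimated from compressed codes, hence less accurate
```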

💡 Key Takeaways
Exact nearest neighbor at 1M docs requires 768M ops per query - ANN trades small accuracy loss (95%+ recall) for millisecond queries
HNSW builds a graph connecting similar vectors; IVF clusters vectors into buckets - both enable searching a fraction of the data
ANN is a recall-latency trade-off: search more candidates for better accuracy, fewer for faster queries
HNSW is default for under 100M vectors; beyond that, IVF with quantization becomes necessary for memory constraints
📌 Interview Tips
1. Lead with the math: 1M docs at 768 dims = 768M operations per query. At 10 QPS, that is 7.68B ops/sec - clearly not scalable.
2. Explain HNSW intuitively: it is a graph where similar vectors are neighbors. Search hops between neighbors toward high-similarity regions.
3. Frame the recall trade-off: 95% recall means 5% of true top-10 results might be missed. Ask if that is acceptable for the use case.