Why Approximate Nearest Neighbor (ANN) and Core Algorithm Families
The Scaling Problem
Finding exact nearest neighbors requires comparing the query against every vector. With 1 million documents at 768 dimensions, that is roughly 768 million multiply-add operations per query. At 10 queries per second, you need about 7.68 billion ops/sec for similarity computation alone. This does not scale. Approximate Nearest Neighbor (ANN) algorithms trade a small accuracy loss for massive speed gains, finding 95%+ of the true nearest neighbors in milliseconds instead of seconds.
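The brute-force baseline is just one dot product per document. A minimal NumPy sketch (sizes scaled down from 1M so it runs quickly; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_docs = 768, 100_000  # scaled down from 1M for a quick run
docs = rng.standard_normal((n_docs, dim)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # unit-normalize for cosine

query = rng.standard_normal(dim).astype(np.float32)
query /= np.linalg.norm(query)

# Exact search: one dot product per document -> n_docs * dim multiply-adds.
scores = docs @ query
top5 = np.argsort(-scores)[:5]  # indices of the 5 most similar documents
```

Even vectorized, this work grows linearly with the corpus, which is exactly the cost ANN indexes avoid.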
HNSW (Hierarchical Navigable Small World)
HNSW builds a multi-layer graph in which similar vectors are connected as neighbors. The top layer has a few widely spaced nodes for fast global navigation; lower layers add more nodes for fine-grained local search. A query starts at the top layer, greedily moves toward high-similarity regions, then descends through the layers for precision. HNSW offers excellent recall (98%+) with sub-millisecond queries, but uses 2-3x the memory of the raw vectors because the graph links are stored alongside them.
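The core move within each layer is a greedy hop toward the query. A toy single-layer sketch (the graph here is built by brute force and the beam width is 1, whereas real HNSW builds the graph incrementally across layers and keeps an ef-sized candidate beam):

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n = 64, 2000
data = rng.standard_normal((n, dim)).astype(np.float32)
data /= np.linalg.norm(data, axis=1, keepdims=True)

# Toy graph: connect each vector to its 16 nearest neighbors.
# Column 0 of the argsort is the vector itself, so skip it.
sims = data @ data.T
neighbors = np.argsort(-sims, axis=1)[:, 1:17]

def greedy_search(query, entry=0):
    """Hop to whichever neighbor is most similar to the query;
    stop at a local optimum (real HNSW tracks many candidates)."""
    current = entry
    while True:
        cand = neighbors[current]
        best = cand[np.argmax(data[cand] @ query)]
        if data[best] @ query <= data[current] @ query:
            return current
        current = best

q = rng.standard_normal(dim).astype(np.float32)
q /= np.linalg.norm(q)
found = greedy_search(q)
```

Each hop strictly increases similarity, so the walk terminates; the multi-layer structure and wider beams in real HNSW are what lift this local optimum to 98%+ recall.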
IVF (Inverted File Index)
IVF clusters vectors into buckets using k-means. At query time, the query is compared against the cluster centroids, and only the vectors in the closest clusters are searched. With 1,000 clusters and the top 10 probed, you examine roughly 1% of the vectors. IVF is memory efficient but requires tuning: too few clusters means each probe scans too many vectors; too many means clusters are small and true neighbors near cluster boundaries are easily missed.
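A minimal IVF sketch in NumPy, assuming a tiny hand-rolled k-means (production systems train k-means on a sample and store inverted lists per cluster; sizes and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
dim, n, n_clusters, n_probe = 64, 5000, 50, 5
data = rng.standard_normal((n, dim)).astype(np.float32)

# Tiny k-means: learn centroids, then assign each vector to its bucket.
centroids = data[rng.choice(n, n_clusters, replace=False)].copy()
for _ in range(10):
    assign = np.argmin(((data[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(n_clusters):
        members = data[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)
assign = np.argmin(((data[:, None] - centroids[None]) ** 2).sum(-1), axis=1)

query = rng.standard_normal(dim).astype(np.float32)
# Probe only the n_probe closest clusters, then scan just their members.
probe = np.argsort(((centroids - query) ** 2).sum(-1))[:n_probe]
cand_ids = np.where(np.isin(assign, probe))[0]
best = cand_ids[np.argmin(((data[cand_ids] - query) ** 2).sum(-1))]
```

With 50 clusters and 5 probed, only about a tenth of the vectors are scanned; raising n_probe trades speed back for recall.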
Choosing an Algorithm
Under 10M vectors: HNSW with default parameters. 10-100M vectors: HNSW with memory tuning, or an IVF-HNSW hybrid. 100M+ vectors: IVF with Product Quantization to compress the stored vectors. Always benchmark on your own data; published numbers rarely match real workloads.
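The rule of thumb above can be encoded as a small helper; the thresholds are heuristics from this guidance, not hard limits, and the function name is illustrative:

```python
def pick_index(n_vectors: int) -> str:
    """Rule-of-thumb index choice by corpus size.
    Always benchmark on your own data before committing."""
    if n_vectors < 10_000_000:
        return "HNSW (default parameters)"
    if n_vectors < 100_000_000:
        return "HNSW with memory tuning, or IVF-HNSW hybrid"
    return "IVF with Product Quantization"
```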