Embeddings & Similarity Search › Approximate Nearest Neighbors (FAISS, ScaNN, HNSW) · Easy · ⏱️ ~2 min

What is Approximate Nearest Neighbor (ANN) Search?

Definition
Approximate Nearest Neighbor (ANN) search finds vectors "close enough" to a query vector in high-dimensional space, trading small accuracy losses (finding 95% of true nearest neighbors instead of 100%) for massive speedups (milliseconds instead of hours).

THE EXACT SEARCH PROBLEM

Exact nearest neighbor search compares the query against every vector in the dataset. For 10 million 768-dimensional vectors, that is 10M × 768 = 7.68 billion floating-point operations per query. At 100 GFLOPS, that takes 77 milliseconds per query. For real-time applications needing <10ms latency, exact search is infeasible at scale.

The math gets worse as datasets grow: a dataset of 1 billion vectors takes ~7.7 seconds per query with exact search. Batch processing overnight might work, but real-time recommendations need sub-10ms latency. You need algorithms that examine only a tiny fraction of vectors while still finding most true neighbors.
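The cost above is easy to see in code. This is a minimal NumPy sketch of exact (brute-force) inner-product search at a toy scale; the array sizes are illustrative stand-ins, not the real 10M-vector workload:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 768)).astype(np.float32)  # toy stand-in for 10M vectors
query = rng.standard_normal(768).astype(np.float32)

# Exact search: one dot product per database vector -> O(N * d) multiply-adds.
scores = db @ query                # N x d work, every vector is touched
top10 = np.argsort(-scores)[:10]   # the true top-10 by inner product
```

Every query touches all N vectors, which is exactly the linear scan that ANN indexes are built to avoid.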

HOW ANN ALGORITHMS WORK

ANN algorithms use index structures to skip most comparisons. Instead of O(N) comparisons, they achieve O(log N) or O(√N) by organizing vectors into hierarchies, clusters, or graphs. The index guides search toward promising regions, pruning most of the search space.

Clustering-based (IVF): Partition vectors into clusters. At query time, find nearest cluster centroids, search only within those clusters. Checking 20 clusters out of 1000 reduces work by 50x.
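A minimal NumPy sketch of the IVF idea (this is not the FAISS API; the crude k-means, cluster count, and `nprobe` value are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((5000, 32)).astype(np.float32)
query = rng.standard_normal(32).astype(np.float32)

# Train: a few Lloyd iterations of k-means give the cluster centroids.
k = 50
centroids = db[rng.choice(len(db), k, replace=False)]
for _ in range(5):
    assign = np.argmin(((db[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(k):
        members = db[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)

# Query: probe only the nprobe nearest clusters instead of scanning all vectors.
nprobe = 5
near = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
cand = np.flatnonzero(np.isin(assign, near))        # candidates in probed clusters
d = ((db[cand] - query) ** 2).sum(-1)
result = cand[np.argsort(d)[:10]]                   # top-10 among candidates only
```

Only the vectors in the probed clusters are compared against the query; raising `nprobe` improves recall at the cost of more comparisons.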

Graph-based (HNSW): Build a navigable graph where nearby vectors are connected. Walk the graph from entry point toward query, following edges to closer neighbors. Typically examines 100-300 nodes to find top-10 in a million-vector index.
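The graph walk can be sketched with a flat k-NN graph and a greedy search; real HNSW adds a layer hierarchy and neighbor-selection heuristics on top of this idea, so treat the graph construction and sizes here as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.standard_normal((500, 16)).astype(np.float32)
query = rng.standard_normal(16).astype(np.float32)

# Build a simple k-NN graph: connect each vector to its 8 nearest neighbors.
norms = (db ** 2).sum(1)
d2 = norms[:, None] + norms[None, :] - 2 * db @ db.T
np.fill_diagonal(d2, np.inf)
neighbors = np.argsort(d2, axis=1)[:, :8]

def greedy_search(entry, q):
    """Walk the graph toward q, moving to the closest neighbor each step;
    stop at a local minimum. Returns (node id, distance evaluations made)."""
    cur = entry
    cur_d = ((db[cur] - q) ** 2).sum()
    visited = 1
    while True:
        nbrs = neighbors[cur]
        dists = ((db[nbrs] - q) ** 2).sum(axis=1)
        visited += len(nbrs)
        step = np.argmin(dists)
        if dists[step] >= cur_d:
            return cur, visited
        cur, cur_d = nbrs[step], dists[step]

best_id, visited = greedy_search(0, query)
```

The point of the sketch is the cost profile: the walk evaluates distances for only a small fraction of the index rather than all of it.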

Quantization-based (PQ): Compress vectors to reduce memory and speed up distance computation. 768-dim float32 (3KB) becomes 64-byte code. Distance computed via lookup tables instead of full dot product.
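A toy NumPy sketch of product quantization with asymmetric distance lookup tables; the dimensions, subspace count, and codebook "training" (random sampling instead of k-means) are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, ks = 64, 8, 16          # 64 dims split into 8 subspaces, 16 centroids each
sub = d // m
db = rng.standard_normal((1000, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

# "Train" one toy codebook per subspace (random sample stands in for k-means).
codebooks = [db[rng.choice(len(db), ks, replace=False), j*sub:(j+1)*sub]
             for j in range(m)]

# Encode: each vector becomes m small centroid ids (here 8 x 4 bits).
codes = np.stack([
    np.argmin(((db[:, j*sub:(j+1)*sub][:, None] - codebooks[j][None]) ** 2).sum(-1),
              axis=1)
    for j in range(m)], axis=1)

# Query: precompute per-subspace query-to-centroid distance tables...
tables = np.stack([((query[j*sub:(j+1)*sub] - codebooks[j]) ** 2).sum(-1)
                   for j in range(m)])
# ...then approximate each distance as a sum of m table lookups,
# with no full dot product against the original vectors.
approx = tables[np.arange(m), codes].sum(axis=1)
top10 = np.argsort(approx)[:10]
```

This is the mechanism behind the 3KB-to-64-byte compression in the text: storage shrinks to the codes, and each distance costs m table lookups instead of d multiply-adds.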

RECALL VS LATENCY TRADEOFF

Recall@K: What fraction of the true K nearest neighbors does ANN return? Recall@10 of 0.95 means 9.5 of the true top-10 are found on average.
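The metric itself is a one-liner; this sketch (with a made-up result list for illustration) shows the computation:

```python
def recall_at_k(true_ids, ann_ids, k=10):
    """Fraction of the true top-k that the ANN result list recovered."""
    return len(set(true_ids[:k]) & set(ann_ids[:k])) / k

# e.g. the ANN index found 9 of the true top-10:
true_top = list(range(10))
ann_top = [0, 1, 2, 3, 4, 5, 6, 7, 8, 99]
print(recall_at_k(true_top, ann_top))   # 0.9
```

Averaging this over a held-out query set gives the recall@K figure you tune index parameters against.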

Tighter search (more clusters probed, more graph nodes visited) improves recall but increases latency. Typical production targets: recall@10 ≥ 0.95 with latency <10ms. You tune index parameters to hit this balance for your data distribution.

⚠️ Key Trade-off: Recall and latency are directly linked. Increasing recall from 90% to 99% might double latency. Choose the minimum recall your application can tolerate.
💡 Key Takeaways
- ANN trades accuracy for speed: find 95% of true neighbors in milliseconds vs hours for exact search
- Exact search is O(N) comparisons; ANN achieves O(log N) or O(√N) via index structures
- Three main approaches: clustering (IVF), graphs (HNSW), and compression (PQ)
- Recall@K measures quality: 0.95 means 9.5 of the true top-10 found on average
📌 Interview Tips
1. Explain why exact search fails at scale: 10M vectors × 768 dims ≈ 7.7 GFLOP per query, roughly 77 ms at 100 GFLOPS, far above real-time budgets.
2. Describe the recall vs latency tradeoff and how you would tune for a specific application requirement.
3. Compare the three ANN approaches and when each is preferred (memory-constrained, latency-critical, etc.).