
Approximate Nearest Neighbor Search: Trading Exactness for Scale

Approximate Nearest Neighbor (ANN) search is the foundation of modern retrieval systems, trading perfect accuracy for dramatic gains in speed and memory efficiency. Instead of exhaustively comparing a query vector against every vector in the database (which becomes impossibly slow at millions of items), ANN algorithms narrow the search to a small candidate set, then rank only those candidates. The core insight is that for most applications, finding the top 10 truly nearest neighbors versus the top 10 approximately nearest neighbors makes little practical difference to the end-user experience: a recommendation system showing you 10 highly relevant videos instead of the 10 absolute most relevant videos is imperceptible to users, but the system cost difference is enormous.

At 100 million vectors with 768 dimensions each (common for text embeddings), storing raw float32 vectors requires approximately 300 GB of memory alone, and exhaustive search would require 100 million distance computations per query. ANN methods like Hierarchical Navigable Small World (HNSW) graphs or Inverted File with Product Quantization (IVF+PQ) reduce this to thousands or tens of thousands of comparisons, delivering results in milliseconds instead of seconds.

The key parameters you control are the recall target (what fraction of true nearest neighbors you retrieve) and the latency budget. Production systems typically target 0.90 to 0.95 recall, meaning they find 90 to 95 of the true top 100 neighbors, which provides excellent relevance while keeping latency under 10 to 50 milliseconds at scale.
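As a concrete sketch of the quantization-based approach, the snippet below builds an IVF+PQ index with FAISS and queries it. The parameter choices here (nlist=1024 partitions, 64 sub-quantizers, nprobe=16) and the random vectors are illustrative assumptions, not values prescribed by the text.

```python
import numpy as np
import faiss  # Meta's ANN library (pip install faiss-cpu)

d = 768             # embedding dimensionality, matching the example above
nlist = 1024        # number of IVF partitions (assumed, tune per corpus)
m, nbits = 64, 8    # PQ: 64 sub-quantizers x 8 bits = 64 bytes per vector,
                    # versus 768 * 4 = 3,072 bytes for a raw float32 vector

# The coarse quantizer assigns vectors to partitions; PQ compresses them.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

# Toy vectors standing in for real embeddings; train on a representative sample.
xb = np.random.rand(100_000, d).astype("float32")
index.train(xb)
index.add(xb)

# Query time: nprobe sets how many of the nlist partitions get scanned.
# Larger nprobe -> higher recall, higher latency.
index.nprobe = 16
xq = np.random.rand(5, d).astype("float32")
distances, ids = index.search(xq, 10)   # top-10 approximate neighbors per query
print(ids.shape)                        # (5, 10)
```

With these assumed settings, each vector compresses to 64 bytes, so 100 million vectors occupy roughly 6.4 GB of codes (plus index overhead) versus the ~300 GB cited above for raw float32 storage, which is what makes single-machine serving feasible.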
💡 Key Takeaways
ANN narrows search from millions of vectors to thousands of candidates before ranking, reducing computation by 100 to 1,000 times compared to exhaustive search
Production systems target 0.90 to 0.95 recall at top-k results, which provides excellent user experience while enabling sub-10-millisecond latencies at scale
Memory scaling is critical: 100 million vectors at 768 dimensions require approximately 300 GB for raw float32 storage before any indexing overhead
Two dominant paradigms exist: graph-based indices like HNSW that route queries through multi-layer networks, and quantization-based methods like IVF+PQ that partition the space and compress vectors
The recall versus latency trade-off is tunable at query time through parameters like nprobe (how many partitions to search) or efSearch (how many graph nodes to explore); a sketch of this query-time tuning follows this list
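The sketch below illustrates the graph-based side of that trade-off: an HNSW index in FAISS whose efSearch knob is swept at query time and compared against an exact brute-force baseline. The graph parameters (32 neighbors per node, efConstruction=200) and the toy data sizes are assumptions for illustration.

```python
import numpy as np
import faiss

d = 768
xb = np.random.rand(50_000, d).astype("float32")   # toy corpus
xq = np.random.rand(100, d).astype("float32")      # toy queries

# Exact brute-force baseline, used only to measure recall of the ANN index.
exact = faiss.IndexFlatL2(d)
exact.add(xb)
_, true_ids = exact.search(xq, 10)

# HNSW graph index: 32 = neighbors kept per node in the graph (assumed value).
index = faiss.IndexHNSWFlat(d, 32)
index.hnsw.efConstruction = 200   # build-time search effort (assumed value)
index.add(xb)                     # HNSW needs no separate training step

# efSearch controls how many graph nodes are explored per query:
# raising it trades latency for recall, at query time, with no rebuild.
for ef in (16, 64, 256):
    index.hnsw.efSearch = ef
    _, ids = index.search(xq, 10)
    recall = np.mean([len(set(ids[i]) & set(true_ids[i])) / 10.0
                      for i in range(len(xq))])
    print(f"efSearch={ef}: recall@10 ≈ {recall:.3f}")
```

The same pattern applies to IVF indices, where nprobe plays the role of efSearch: the index is built once, and the operator picks the recall/latency operating point per query or per traffic tier.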
📌 Examples
Meta's FAISS powers billion-scale image and text retrieval across Instagram, Facebook Search, and content moderation systems, processing millions of queries per second
Spotify's Annoy enables real-time music recommendations by memory-mapping read-only indices that deliver sub-5-millisecond lookups for playlist generation (see the sketch after these examples)
Pinterest uses ANN to find visually similar pins from a corpus of billions of images, returning results in under 20 milliseconds to power its visual search feature
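For the Annoy example, a minimal sketch of the build-once, memory-map-for-serving pattern is shown below. The dimensionality, metric, tree count, item count, and file name are assumptions for illustration, not details from the text.

```python
import numpy as np
from annoy import AnnoyIndex  # Spotify's ANN library (pip install annoy)

d = 128  # assumed track-embedding dimensionality

# Offline build step: add vectors, build the trees, write the index to disk.
builder = AnnoyIndex(d, "angular")
for i in range(10_000):
    builder.add_item(i, np.random.rand(d).tolist())  # toy vectors
builder.build(50)            # 50 trees (assumed); more trees -> higher recall
builder.save("tracks.ann")

# Serving processes load the file, which Annoy memory-maps read-only, so many
# worker processes can share a single copy of the index via the page cache.
reader = AnnoyIndex(d, "angular")
reader.load("tracks.ann")
print(reader.get_nns_by_item(0, 10))   # 10 nearest tracks to item 0
```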