Embeddings & Similarity Search › Approximate Nearest Neighbors (FAISS, ScaNN, HNSW) · Easy · ⏱️ ~2 min

What is Approximate Nearest Neighbor (ANN) Search?

Approximate Nearest Neighbor (ANN) search retrieves vectors that are close enough to a query in a high-dimensional space, trading small accuracy losses for large gains in speed and memory. Modern machine learning systems convert items such as products, songs, or documents into dense embedding vectors, typically with 128 to 768 dimensions. Finding the nearest neighbors requires measuring similarity with cosine similarity, L2 distance, or inner product. Exact search scales linearly with dataset size and dimensionality: at 100 million vectors with 768 dimensions, brute force requires billions of floating point operations per query, which cannot meet production latency budgets such as 20 milliseconds at p99.

ANN algorithms solve this by shrinking the search space with indexing structures and quantization, turning linear scans into work proportional to log N or to a small number of probed partitions. The core tradeoff is recall versus speed: ANN systems typically target 95 to 99 percent recall, meaning they find 95 to 99 of the true 100 nearest neighbors.

Meta uses FAISS for billion-scale duplicate detection. Google uses ScaNN (Scalable Nearest Neighbors) for ads and recommendation retrieval. Spotify uses Hierarchical Navigable Small World (HNSW) graphs for music similarity. Pinterest processes tens of billions of vectors with GPU-accelerated FAISS to meet tight latency budgets.
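A minimal sketch of this recall-versus-speed tradeoff using the FAISS Python library (installable as faiss-cpu): an exact flat index is compared against an HNSW graph index on small random vectors, and recall@10 is measured against the exact results. The dataset size and the M and efSearch values are illustrative assumptions, not figures from this article.

```python
import numpy as np
import faiss

d = 128                                               # embedding dimensionality (illustrative)
xb = np.random.rand(100_000, d).astype("float32")     # database vectors
xq = np.random.rand(10, d).astype("float32")          # query vectors

# Exact (brute force) baseline: scans every vector, O(N * d) work per query.
flat = faiss.IndexFlatL2(d)
flat.add(xb)
_, true_ids = flat.search(xq, 10)

# Approximate graph index (HNSW): navigates a small-world graph instead of
# scanning everything, trading a little recall for much lower query cost.
hnsw = faiss.IndexHNSWFlat(d, 32)     # 32 = M, links per node (assumed value)
hnsw.hnsw.efSearch = 64               # larger efSearch -> higher recall, slower queries
hnsw.add(xb)
_, ann_ids = hnsw.search(xq, 10)

# Recall@10: fraction of the true top-10 neighbors the ANN index recovered.
recall = np.mean([len(set(t) & set(a)) / 10 for t, a in zip(true_ids, ann_ids)])
print(f"recall@10 = {recall:.3f}")
```

Raising efSearch (or M at build time) pushes recall toward the exact result at the cost of more graph hops per query, which is exactly the knob production systems tune against their latency budget.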
💡 Key Takeaways
ANN trades 1 to 5 percent recall for a 100 to 10,000 times speedup over exact search on large datasets
Production systems target 95 to 99 percent recall at p99 latency under 20 milliseconds for billion scale retrieval
Three dominant algorithm families: graph-based navigation such as HNSW, inverted file indexes with quantization such as FAISS IVF with Product Quantization (PQ), and learned quantizers such as ScaNN
Real deployments: Meta FAISS for content deduplication, Google ScaNN for ads, Spotify HNSW for music similarity, Pinterest GPU FAISS for visual search
Distance metrics matter: cosine similarity for normalized embeddings, L2 for Euclidean distance, inner product for maximum inner product search (MIPS) tasks
At 200 million vectors with 768 float32 dimensions, raw storage is about 614 GB; quantization can compress this to 3.2 GB with acceptable recall loss (the arithmetic is sketched after this list)
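The storage figures in that last takeaway can be checked with simple arithmetic. The sketch below reproduces them, assuming the compressed representation uses 16 bytes per vector, a typical PQ code size that is consistent with the quoted 3.2 GB but not a value stated above.

```python
# Back-of-the-envelope check of the storage numbers in the takeaway above.
n_vectors = 200_000_000
dims = 768
bytes_per_float32 = 4

raw_gb = n_vectors * dims * bytes_per_float32 / 1e9
print(f"raw float32 storage: {raw_gb:.0f} GB")      # ~614 GB

pq_code_bytes = 16   # assumption: e.g. 16 PQ subquantizers at 8 bits each
pq_gb = n_vectors * pq_code_bytes / 1e9
print(f"PQ-compressed storage: {pq_gb:.1f} GB")      # 3.2 GB
```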
📌 Examples
Spotify music similarity: 100 million song embeddings, HNSW with M=16, efSearch=200, returns top 500 candidates in 8 milliseconds median, 18 milliseconds p99 on CPU
Pinterest visual search: 30 billion image embeddings, FAISS IVF PQ on GPU, probes 32 lists, achieves 10,000 queries per second at 96 percent recall with 5 milliseconds per batch (an index configured along these lines is sketched after this list)
Google ads retrieval: ScaNN on CPU for inner product search, 1.5 milliseconds median at 95 percent recall on 100 million ad embeddings, 2 times faster than HNSW at same recall
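As a rough illustration of how parameters like those above map onto an index, here is a sketch of a FAISS IVF-PQ index configured with nprobe = 32, echoing the Pinterest example. The dimensionality, dataset size, nlist, and PQ code size are placeholder assumptions, not production values.

```python
import numpy as np
import faiss

d = 128
xb = np.random.rand(100_000, d).astype("float32")   # stand-in for image embeddings
xq = np.random.rand(8, d).astype("float32")

nlist = 1024          # number of coarse clusters (inverted lists), assumed
m, nbits = 16, 8      # 16 PQ subquantizers x 8 bits = 16 bytes per vector, assumed

quantizer = faiss.IndexFlatL2(d)                        # coarse assignment index
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(xb)       # learns cluster centroids and PQ codebooks from the data
index.add(xb)

index.nprobe = 32     # scan 32 of the 1024 lists per query, as in the example
distances, ids = index.search(xq, 100)                  # approximate top-100
```

Increasing nprobe scans more inverted lists per query, raising recall at the cost of latency; the nlist and PQ code size are fixed at build time and set the memory footprint.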