
ScaNN: Learning-Based Quantization for Maximum Inner Product Search

ScaNN (Scalable Nearest Neighbors) from Google Research optimizes both partitioning and quantization specifically for Maximum Inner Product Search (MIPS). Unlike FAISS IVF-PQ, which uses k-means clustering and a generic reconstruction-error quantizer, ScaNN combines a partitioning tree with anisotropic (score-aware) vector quantization trained to preserve inner products. The score-aware loss penalizes the component of quantization error parallel to the database vector more heavily than the orthogonal component, because the parallel error dominates the inner-product error for the high-scoring pairs that matter in top-k retrieval. ScaNN also supports asymmetric distance computation: queries remain in full precision and only database vectors are quantized, which removes quantization error on the query side while keeping database memory low. Finally, score-aware pruning computes upper bounds on inner products per partition and skips partitions that cannot contain top-k results.

These optimizations yield roughly a 2 times speedup over HNSW at the same recall for inner-product-heavy workloads. Google Research benchmarks show 1.5 milliseconds median latency at 95 percent recall on 100 million vectors, compared to 3 milliseconds for HNSW.

The tradeoff is training complexity and update cost. ScaNN requires a representative sample for learning partitions and codebooks, and training can take hours to days depending on dataset size. Incremental updates are harder than with HNSW because the learned structures assume a fixed data distribution; in practice, operators retrain indexes weekly or monthly and use blue-green swaps.

ScaNN is ideal for MIPS workloads such as ads retrieval, where query and item embeddings have different semantics and the inner product directly models relevance. Google uses ScaNN in production for ads and recommendation systems where inner product scoring is critical.
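The open-source scann Python package exposes these pieces through a builder API. The sketch below is illustrative only: the tree, score_ah, and reorder stages correspond to the learned partitioning, asymmetric-hashing quantization, and exact rescoring described above, but the file names and parameter values are placeholders, and the exact call signatures should be checked against the installed version.

```python
# Minimal sketch of building a ScaNN MIPS index with the open-source
# scann package (pip install scann). Parameter values are illustrative,
# not tuned.
import numpy as np
import scann

db = np.load("item_embeddings.npy").astype(np.float32)      # hypothetical (n, d) matrix

searcher = (
    scann.scann_ops_pybind.builder(db, 10, "dot_product")   # return top-10 by inner product
    .tree(num_leaves=2000,                                   # learned partitions
          num_leaves_to_search=100,                          # partitions probed per query
          training_sample_size=250_000)                      # representative training sample
    .score_ah(2, anisotropic_quantization_threshold=0.2)     # asymmetric hashing, anisotropic loss
    .reorder(100)                                            # exact rescoring of top candidates
    .build()
)

query = np.load("query_embedding.npy").astype(np.float32)    # hypothetical (d,) vector
neighbors, scores = searcher.search(query, final_num_neighbors=10)
print(neighbors, scores)
```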
💡 Key Takeaways
ScaNN learns partitioning and anisotropic (score-aware) quantization optimized for MIPS, achieving roughly a 2 times speedup over HNSW at fixed recall on inner product tasks
Asymmetric distance keeps queries in full precision and quantizes only database vectors, reducing quantization error and improving recall by 1 to 3 percent (a toy sketch of this trick follows this list)
Benchmark results: 1.5 milliseconds median at 95 percent recall on 100 million vectors on CPU, versus 3 milliseconds for HNSW and 4 milliseconds for FAISS IVF-PQ
Training cost is substantial: learning partitions and codebooks on 100 million vectors takes 4 to 12 hours on multi-core machines and requires a representative sample
Incremental updates are difficult because the learned structures assume a fixed distribution; production systems retrain weekly or monthly with blue-green index swaps
Score-aware pruning computes upper bounds on inner products to skip partitions, further reducing query time by 20 to 40 percent on skewed distributions
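The asymmetric-distance idea from the second bullet can be illustrated with a toy single-codebook quantizer in NumPy. This is a minimal sketch of the general ADC trick, not ScaNN's anisotropic training: the database side is reduced to codebook ids, the query stays in float32, and per-vector scores become a table lookup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 64, 10_000, 256                   # dimension, database size, codebook size

db = rng.standard_normal((n, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

# Toy codebook: a few Lloyd-style refinement passes over random seed centroids.
codebook = db[rng.choice(n, k, replace=False)].copy()
for _ in range(5):
    codes = np.argmin(-2.0 * db @ codebook.T + (codebook ** 2).sum(axis=1), axis=1)
    for c in range(k):
        members = db[codes == c]
        if len(members):
            codebook[c] = members.mean(axis=0)
codes = np.argmin(-2.0 * db @ codebook.T + (codebook ** 2).sum(axis=1), axis=1)

# Asymmetric scoring: one full-precision pass over the k centroids, then a
# lookup per database vector. Only the database side carries quantization error.
lut = codebook @ query                      # (k,) inner products, query kept in float32
approx_scores = lut[codes]                  # (n,) approximate inner products
exact_scores = db @ query

top = np.argsort(-approx_scores)[:10]
recall = len(set(top) & set(np.argsort(-exact_scores)[:10])) / 10
print("approximate top-10:", top, "recall@10 vs exact:", recall)
```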
📌 Examples
Google Ads retrieval: ScaNN on CPU for MIPS between user query embeddings and ad embeddings, 100 million ads, 1.5 milliseconds median latency at 96 percent recall, serving 10,000 QPS per machine
Training pipeline: sample 10 million vectors from a 500 million vector corpus, train the ScaNN partitioner and quantizer for 8 hours on a 64-core machine, then deploy to 20 replica shards with a blue-green cutover (a minimal serving-side sketch follows these examples)
MIPS use case: user embedding dimension 256, ad embedding dimension 256, inner product measures relevance, ScaNN quantizes ads to 12 bytes per vector and keeps the user query in float32
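Because ScaNN's learned partitions and codebooks assume a fixed distribution, the retraining pipeline above typically pairs an offline index build with an in-process pointer swap on the serving side. The sketch below is a hedged illustration of that blue-green cutover; IndexServer and its methods are hypothetical, not part of the scann package.

```python
import threading

class IndexServer:
    """Serves queries from whichever index object is currently live."""

    def __init__(self, index):
        self._lock = threading.Lock()
        self._live = index

    def search(self, query, k=10):
        with self._lock:
            index = self._live            # grab the current index reference
        return index.search(query, final_num_neighbors=k)

    def swap(self, new_index):
        # Blue-green cutover: new_index is fully built and validated before
        # this call, so readers never see a partially constructed index.
        with self._lock:
            old, self._live = self._live, new_index
        return old                        # caller can free or archive the old index
```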