
HNSW: Graph-Based Proximity Search

Hierarchical Navigable Small World (HNSW) builds a multi-layer graph in which each node is a vector and edges connect nearby neighbors, forming a hierarchy similar to skip lists. The top layer has sparse long-range connections for fast navigation; lower layers have denser local connections for precision. Search starts at an entry point in the highest layer, greedily walks to the closest neighbor at each step, and descends through the layers until it reaches the base layer, where the final candidates are collected.

Three parameters dominate tuning. The construction parameter M controls the number of neighbors per node; typical values are 16 to 32, and higher M improves recall at the cost of memory and build time. efConstruction controls search effort during index building; values of 200 to 400 produce high-quality graphs. At query time, efSearch controls candidate exploration: efSearch of 100 might give 95 percent recall, while 300 gives 99 percent recall at roughly 3 times the latency.

Memory overhead is substantial: each vector is stored as full-precision floats plus roughly 100 to 300 bytes for graph pointers and metadata. HNSW shines for datasets up to about 50 million vectors on CPU with moderate update rates. It supports online insertions with good recall and requires no training phase; Spotify and other streaming services use HNSW for real-time music and podcast recommendations.

The main limitation is memory cost. At 200 million vectors with 768 float32 dimensions, the vectors alone take about 614 GB and graph links add another 20 to 60 GB, so with allocator overhead and operational headroom a deployment needs close to 1 TB of RAM. That requires memory-optimized instances and becomes expensive at billion scale compared to quantized methods.
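To make these knobs concrete, here is a minimal sketch of building and querying an HNSW index with the hnswlib library; the random data, dimensionality, and parameter values are illustrative assumptions rather than a recommended configuration.

    import numpy as np
    import hnswlib

    # Illustrative assumption: 100k random 768-d vectors stand in for real embeddings.
    dim, num_elements = 768, 100_000
    data = np.random.rand(num_elements, dim).astype(np.float32)
    ids = np.arange(num_elements)

    # Build: M = neighbors per node, ef_construction = search effort while building.
    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=num_elements, M=24, ef_construction=200)
    index.add_items(data, ids)

    # Query: ef (efSearch) trades latency for recall and must be at least k.
    index.set_ef(100)                                  # lower recall, lower latency
    labels, distances = index.knn_query(data[:5], k=10)

    index.set_ef(300)                                  # higher recall, roughly 3x the latency
    labels_hi, distances_hi = index.knn_query(data[:5], k=10)

The same trade-off applies with other implementations; in FAISS, for example, the equivalent settings live on the IndexHNSWFlat object as hnsw.efConstruction and hnsw.efSearch.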
💡 Key Takeaways
HNSW uses M neighbors per node, typically 16 to 32; higher M improves recall but grows graph-link memory roughly in proportion to M and slows insertions
Query parameter efSearch trades latency for recall: efSearch=100 gives 95 percent recall in 3 milliseconds, efSearch=300 gives 99 percent recall in 10 milliseconds on 50 million vectors
Memory footprint: full precision vectors plus roughly 100 to 300 bytes per vector for graph links; at 200 million 768-dimensional float32 vectors that is about 614 GB of vectors plus 20 to 60 GB of graph, so plan for close to 1 TB of RAM with headroom (see the sketch after this list)
Build time is substantial: 50 million vectors with efConstruction=200 takes multiple hours on a 32 core machine; use blue green deployments to avoid downtime
Supports online insertions with maintained recall, but deletions are lazy tombstones; periodic compaction needed to remove dead links and maintain query speed
Works best up to 50 million vectors on CPU; beyond that, memory cost and instance size make quantized methods like FAISS IVF PQ more economical
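As a quick check of the memory figures above, a back-of-envelope sketch using the 100 to 300 bytes of per-vector graph overhead quoted in this section:

    # Back-of-envelope memory estimate for the takeaway figures above.
    num_vectors = 200_000_000
    dim = 768
    bytes_per_float = 4

    vector_bytes = num_vectors * dim * bytes_per_float   # raw float32 vectors, ~614 GB
    graph_low = num_vectors * 100                        # graph links at 100 B/vector, ~20 GB
    graph_high = num_vectors * 300                       # graph links at 300 B/vector, ~60 GB

    gb = 1_000_000_000
    print(f"vectors: {vector_bytes / gb:.0f} GB")
    print(f"graph:   {graph_low / gb:.0f} to {graph_high / gb:.0f} GB")
    print(f"total:   {(vector_bytes + graph_low) / gb:.0f} to {(vector_bytes + graph_high) / gb:.0f} GB")

With allocator overhead and room for inserts, this is why the takeaway above recommends provisioning close to 1 TB of RAM.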
📌 Examples
Spotify playlist continuation: 100 million track embeddings, HNSW with M=24, efSearch=150, 6 milliseconds median latency, 97 percent recall, runs on r5.12xlarge instances with 384 GB RAM
Online insertion pattern: new music releases added in real time, HNSW maintains 96 percent recall without full rebuilds, insertions take 5 to 15 milliseconds per vector
Deletion handling: tombstone deleted tracks, rebuild index monthly to compact the graph and remove roughly 20 percent overhead from dead nodes, reducing p99 latency from 25 ms back to 18 ms (a sketch of the insertion and tombstone pattern follows these examples)
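The insertion and tombstone pattern might look like the following sketch using hnswlib's lazy-deletion API; the library choice, track IDs, and batch sizes are illustrative assumptions, and the monthly compaction described above is a full rebuild rather than an in-place operation.

    import numpy as np
    import hnswlib

    dim = 768
    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=10_000, M=24, ef_construction=200)
    index.set_ef(150)

    # Online insertion: grow capacity if needed, then add new embeddings as they arrive.
    new_tracks = np.random.rand(500, dim).astype(np.float32)   # stand-ins for new release embeddings
    new_ids = np.arange(500)                                    # hypothetical track IDs
    if index.get_current_count() + len(new_ids) > index.get_max_elements():
        index.resize_index(index.get_current_count() + len(new_ids))
    index.add_items(new_tracks, new_ids)

    # Lazy deletion: tombstoned tracks are skipped at query time but still occupy
    # memory and graph links until the index is rebuilt during compaction.
    for track_id in (3, 7, 42):
        index.mark_deleted(track_id)

    labels, distances = index.knn_query(new_tracks[:1], k=10)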