
HNSW: Graph-Based Search with Hierarchical Navigation

Hierarchical Navigable Small World (HNSW) builds a multi-layer graph in which each vector is a node and edges connect similar vectors. The key innovation is the layer hierarchy: top layers are sparse, with long-range connections for coarse navigation, while bottom layers are dense, with fine-grained local connections. A search starts at the top layer, greedily follows the best edge toward the query at each step, and descends through the layers, refining the route until it reaches the dense bottom layer. In practice this delivers near-logarithmic search complexity and excellent latency when the entire graph fits in memory.

A typical configuration uses M = 16 to 32 bidirectional edges per node, efConstruction = 200 during the build (controlling graph quality), and efSearch = 50 to 200 at query time (controlling the recall-versus-latency trade-off); see the FAISS sketch below. With these settings, in-memory workloads see very stable sub-10-millisecond latencies.

The critical limitation is memory overhead. Beyond the raw embedding storage, HNSW adds 2 to 5 times that size again in graph metadata (edge pointers, layer assignments) at moderate connectivity. At 100 million 768-dimensional vectors, that means roughly 600 GB to 1.5 TB of graph overhead on top of about 300 GB of embeddings. In the Deep1B benchmark on a 64 GB RAM host, HNSW implementations (nmslib, N2) ran out of memory during index construction on approximately 54 million vectors, failing after hours of preprocessing.

HNSW supports dynamic insertions, which is valuable for applications with frequent updates such as real-time content indexing. However, sustained heavy writes can fragment memory and degrade graph connectivity over time. Production systems typically schedule periodic rebuilds (weekly or monthly) or use shadow indexing: build a fresh index in the background, replay recent writes, then atomically swap. This keeps search quality high while maintaining update freshness.
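To make the configuration above concrete, here is a minimal sketch using FAISS's HNSW index. The parameter values mirror the typical ranges mentioned earlier, and the random vectors are a stand-in for real embeddings:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d, M = 768, 32                       # embedding dimension, edges per node

# Build phase: efConstruction controls how many candidates are explored
# when inserting each vector (graph quality vs. build time).
index = faiss.IndexHNSWFlat(d, M)
index.hnsw.efConstruction = 200

xb = np.random.rand(100_000, d).astype("float32")  # stand-in corpus
index.add(xb)

# Query phase: efSearch controls the candidate queue size (recall vs. latency).
index.hnsw.efSearch = 100
xq = np.random.rand(5, d).astype("float32")
distances, ids = index.search(xq, 10)              # top-10 neighbors per query
```

Because efSearch is a query-time knob, it can be raised toward 200 to buy recall (or lowered to cut latency) without rebuilding the index.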
💡 Key Takeaways
Memory overhead is 2 to 5 times raw embedding size at moderate M values (16 to 32 edges per node), making HNSW challenging beyond tens of millions of vectors on constrained hardware
Search latency is very stable (under 10 milliseconds) when the index is fully in memory, but pointer chasing maps poorly to disk, where random access patterns cause latency to balloon
HNSW failed to complete index construction under 64 GB RAM on approximately 54 million vectors in the Deep1B benchmark, demonstrating the memory scaling challenge at billion scale
Dynamic insertions are supported natively, but sustained heavy writes degrade graph quality over time, requiring periodic rebuilds or background consolidation to restore optimal connectivity (see the shadow-indexing sketch after this list)
Best fit for applications needing low-latency, high-recall retrieval on datasets that fit in available memory, or when integrated with search engines like OpenSearch for hybrid keyword-plus-vector search
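The rebuild strategy in the last two takeaways is often implemented as a shadow swap, as described earlier. A minimal single-process sketch, assuming a hypothetical build_index(corpus) factory and an index object with add/search methods standing in for whatever ANN library is in use:

```python
import threading

class ShadowIndexer:
    """Sketch of shadow indexing: serve from a live index, rebuild a fresh
    one in the background, replay writes that arrived meanwhile, then swap.
    build_index(corpus) and the index's add/search methods are hypothetical
    stand-ins for a real ANN library."""

    def __init__(self, build_index, corpus):
        self._build = build_index
        self._lock = threading.Lock()
        self._live = build_index(corpus)
        self._log = []                          # writes since the last rebuild

    def add(self, vec_id, vec):
        with self._lock:
            self._live.add(vec_id, vec)         # keep the live index fresh
            self._log.append((vec_id, vec))     # remember for the next swap

    def search(self, query, k):
        with self._lock:
            return self._live.search(query, k)

    def rebuild(self, corpus_snapshot):
        # The slow build runs off the serving path; simplified here by
        # assuming corpus_snapshot excludes the writes held in the log.
        fresh = self._build(corpus_snapshot)
        with self._lock:
            for vec_id, vec in self._log:       # replay recent writes
                fresh.add(vec_id, vec)
            self._live, self._log = fresh, []   # atomic swap
```

A production version would add deduplication against the snapshot and handle deletes, but the core idea is the same: readers never see a partially built graph.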
📌 Examples
OpenSearch and Elasticsearch integrate HNSW for hybrid text-plus-vector search, leveraging existing inverted-index infrastructure while adding dense retrieval at latencies under 20 milliseconds
Typical HNSW configuration: M = 16, efConstruction = 200, efSearch = 100, delivering 0.95 recall at approximately 5 milliseconds per query on 10 million vectors in memory
A 100-million-vector HNSW index with 768-dimensional embeddings at M = 32 requires approximately 900 GB of total memory (300 GB of embeddings plus 600 GB of graph overhead), exceeding most single-machine RAM budgets; the arithmetic is sketched below
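A quick back-of-envelope check of that last example, assuming float32 embeddings; the 2x graph-overhead multiplier is the example's own assumption, and real overhead varies by implementation and layer distribution:

```python
n, d = 100_000_000, 768            # vectors, dimensions

raw_gb = n * d * 4 / 1e9           # float32 embeddings: ~307 GB
graph_gb = 2 * raw_gb              # assumed ~2x overhead at M = 32: ~614 GB
total_gb = raw_gb + graph_gb       # ~921 GB, matching the ~900 GB figure
print(f"raw ~ {raw_gb:.0f} GB, graph ~ {graph_gb:.0f} GB, total ~ {total_gb:.0f} GB")
```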