
HNSW: Graph-Based Search with Hierarchical Navigation

Hierarchical Navigable Small World (HNSW) builds a multi-layer graph in which each vector is a node and edges connect similar vectors. The key innovation is the layer hierarchy: top layers are sparse, with long-range connections for coarse navigation, while bottom layers are dense, with fine-grained local connections. A search starts at the top layer, greedily follows the best edge toward the query at each step, and descends through the layers, refining the route until it reaches the dense bottom layer. In practice this delivers near-logarithmic search complexity and excellent latency when the entire graph fits in memory.

A typical configuration uses M = 16 to 32 bidirectional edges per node, efConstruction = 200 during the build (controlling graph quality), and efSearch = 50 to 200 at query time (controlling the recall-versus-latency trade-off); see the FAISS sketch below. With these settings, in-memory workloads see very stable sub-10-millisecond latencies.

The critical limitation is memory overhead. Beyond the raw embedding storage, HNSW adds 2 to 5 times that size again in graph metadata (edge pointers, layer assignments) at moderate connectivity. At 100 million 768-dimensional vectors, that means roughly 600 GB to 1.5 TB of graph overhead on top of about 300 GB of embeddings. In the Deep1B benchmark on a 64 GB RAM host, HNSW implementations (nmslib, N2) ran out of memory during index construction on approximately 54 million vectors, failing after hours of preprocessing.

HNSW supports dynamic insertions, which is valuable for applications with frequent updates such as real-time content indexing. However, sustained heavy writes can fragment memory and degrade graph connectivity over time. Production systems typically schedule periodic rebuilds (weekly or monthly) or use shadow indexing: build a fresh index in the background, replay recent writes, then atomically swap. This keeps search quality high while maintaining update freshness.
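To make the configuration above concrete, here is a minimal sketch using FAISS's HNSW index. The parameter values mirror the typical ranges mentioned earlier, and the random vectors are a stand-in for real embeddings:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d, M = 768, 32                       # embedding dimension, edges per node

# Build phase: efConstruction controls how many candidates are explored
# when inserting each vector (graph quality vs. build time).
index = faiss.IndexHNSWFlat(d, M)
index.hnsw.efConstruction = 200

xb = np.random.rand(100_000, d).astype("float32")  # stand-in corpus
index.add(xb)

# Query phase: efSearch controls the candidate queue size (recall vs. latency).
index.hnsw.efSearch = 100
xq = np.random.rand(5, d).astype("float32")
distances, ids = index.search(xq, 10)              # top-10 neighbors per query
```

Because efSearch is a query-time knob, it can be raised toward 200 to buy recall (or lowered to cut latency) without rebuilding the index.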
💡 Key Takeaways
Memory overhead is 2 to 5 times raw embedding size at moderate M values (16 to 32 edges per node), making HNSW challenging beyond tens of millions of vectors on constrained hardware
Search latency is very stable (under 10 milliseconds) when the index is fully in memory, but pointer chasing maps poorly to disk, where random access patterns cause latency to balloon
HNSW failed to complete index construction under 64 GB RAM on approximately 54 million vectors in the Deep1B benchmark, demonstrating the memory scaling challenge at billion scale
Dynamic insertions are supported natively, but sustained heavy writes degrade graph quality over time, requiring periodic rebuilds or background consolidation to restore optimal connectivity (see the shadow-indexing sketch after this list)
Best fit for applications needing low-latency, high-recall retrieval on datasets that fit in available memory, or when integrated with search engines like OpenSearch for hybrid keyword-plus-vector search
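The rebuild strategy in the last two takeaways is often implemented as a shadow swap, as described earlier. A minimal single-process sketch, assuming a hypothetical build_index(corpus) factory and an index object with add/search methods standing in for whatever ANN library is in use:

```python
import threading

class ShadowIndexer:
    """Sketch of shadow indexing: serve from a live index, rebuild a fresh
    one in the background, replay writes that arrived meanwhile, then swap.
    build_index(corpus) and the index's add/search methods are hypothetical
    stand-ins for a real ANN library."""

    def __init__(self, build_index, corpus):
        self._build = build_index
        self._lock = threading.Lock()
        self._live = build_index(corpus)
        self._log = []                          # writes since the last rebuild

    def add(self, vec_id, vec):
        with self._lock:
            self._live.add(vec_id, vec)         # keep the live index fresh
            self._log.append((vec_id, vec))     # remember for the next swap

    def search(self, query, k):
        with self._lock:
            return self._live.search(query, k)

    def rebuild(self, corpus_snapshot):
        # The slow build runs off the serving path; simplified here by
        # assuming corpus_snapshot excludes the writes held in the log.
        fresh = self._build(corpus_snapshot)
        with self._lock:
            for vec_id, vec in self._log:       # replay recent writes
                fresh.add(vec_id, vec)
            self._live, self._log = fresh, []   # atomic swap
```

A production version would add deduplication against the snapshot and handle deletes, but the core idea is the same: readers never see a partially built graph.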
📌 Examples
OpenSearch and Elasticsearch integrate HNSW for hybrid text-plus-vector search, leveraging existing inverted-index infrastructure while adding dense retrieval at latencies under 20 milliseconds
Typical HNSW configuration: M = 16, efConstruction = 200, efSearch = 100, delivering 0.95 recall at approximately 5 milliseconds per query on 10 million vectors in memory
A 100-million-vector HNSW index with 768-dimensional embeddings at M = 32 requires approximately 900 GB of total memory (300 GB of embeddings plus 600 GB of graph overhead), exceeding most single-machine RAM budgets; the arithmetic is sketched below
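A quick back-of-envelope check of that last example, assuming float32 embeddings; the 2x graph-overhead multiplier is the example's own assumption, and real overhead varies by implementation and layer distribution:

```python
n, d = 100_000_000, 768            # vectors, dimensions

raw_gb = n * d * 4 / 1e9           # float32 embeddings: ~307 GB
graph_gb = 2 * raw_gb              # assumed ~2x overhead at M = 32: ~614 GB
total_gb = raw_gb + graph_gb       # ~921 GB, matching the ~900 GB figure
print(f"raw ~ {raw_gb:.0f} GB, graph ~ {graph_gb:.0f} GB, total ~ {total_gb:.0f} GB")
```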