
Approximate Nearest Neighbor: HNSW vs IVF-PQ

Definition
Approximate Nearest Neighbor (ANN) search finds similar vectors without comparing the query against every item, trading some accuracy (e.g., ~95% recall) for speed (~1ms vs 100ms+ for exact search).

WHY APPROXIMATE SEARCH

Exact search compares the query against every vector. At 1B vectors and 1μs per comparison, that is 1000 seconds per query. ANN algorithms build index structures that eliminate most comparisons. HNSW and IVF-PQ are the two dominant approaches, with different trade-offs.
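To see where that cost comes from, here is a minimal brute-force sketch in Python/NumPy. Sizes are illustrative and scaled down to 1M vectors so it runs quickly; at 1B vectors the same scan is ~1000x more work per query.

import numpy as np

# Exact (brute-force) nearest neighbor: compare the query to every stored vector.
d, n = 256, 1_000_000                        # illustrative sizes (1M vectors, 256 dims)
xb = np.random.rand(n, d).astype('float32')  # database vectors
q = np.random.rand(d).astype('float32')      # query vector

dists = np.linalg.norm(xb - q, axis=1)       # O(n * d) work for a single query
top10 = np.argpartition(dists, 10)[:10]      # indices of the 10 nearest vectors (unordered)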

HNSW: GRAPH-BASED SEARCH

HNSW (Hierarchical Navigable Small World) builds a multi-layer graph. Top layers have few nodes with long-range connections; the bottom layer has all nodes with local connections. Search navigates from the top layer down to the bottom. Query: 1-10ms at billion scale. Memory: full vectors (1-4KB each). Best for: low latency, high recall.
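A minimal HNSW sketch using the Faiss library, one common implementation; the parameters M, efConstruction, and efSearch are illustrative values you would tune to trade recall against latency and build time.

import numpy as np
import faiss

d = 256                                               # 256 float32 dims ≈ 1KB per raw vector
xb = np.random.rand(100_000, d).astype('float32')     # database vectors
xq = np.random.rand(10, d).astype('float32')          # query batch

index = faiss.IndexHNSWFlat(d, 32)    # M=32 graph neighbors per node; stores full vectors in RAM
index.hnsw.efConstruction = 200       # build-time search width (higher = better graph, slower build)
index.add(xb)                         # HNSW needs no separate training step

index.hnsw.efSearch = 64              # query-time search width (higher = better recall, more latency)
D, I = index.search(xq, 10)           # distances and ids of the 10 nearest neighbors per query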

💡 Key Insight: HNSW has the best latency-recall trade-off but needs full vectors in memory. At 1B vectors × 1KB each, that is ~1TB of RAM. For memory-constrained systems, use IVF-PQ.

IVF-PQ: QUANTIZED SEARCH

IVF (inverted file index) clusters the vectors and searches only the relevant clusters. PQ (product quantization) compresses each vector from ~1KB to 64 bytes. Combined: search a subset of compressed vectors. Query: 5-50ms. Memory: 16-64x less than HNSW. Best for: memory efficiency when a latency trade-off is acceptable.
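A matching IVF-PQ sketch in Faiss; nlist, nprobe, and the PQ code size are illustrative. Here 64 subquantizers × 8 bits compress each ~1KB vector to a 64-byte code, in line with the numbers above.

import numpy as np
import faiss

d = 256                                               # 256 float32 dims ≈ 1KB per raw vector
xb = np.random.rand(100_000, d).astype('float32')     # database vectors (also used for training here)
xq = np.random.rand(10, d).astype('float32')          # query batch

nlist = 1024                                # number of IVF clusters
m, nbits = 64, 8                            # 64 subquantizers × 1 byte each = 64-byte codes
quantizer = faiss.IndexFlatL2(d)            # coarse quantizer that assigns vectors to clusters
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)                             # learn cluster centroids and PQ codebooks
index.add(xb)                               # store compressed codes, not full vectors

index.nprobe = 16                           # clusters scanned per query (higher = better recall, slower)
D, I = index.search(xq, 10)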

CHOOSING BETWEEN THEM

HNSW: latency critical (<5ms), memory available, recall >98%. IVF-PQ: memory constrained, 20-50ms acceptable, 90-95% recall OK. Hybrid: HNSW for hot data, IVF-PQ for cold.
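A back-of-envelope sizing helper to make the choice concrete. This is a hypothetical function using the article's numbers (~1KB raw vectors vs ~64-byte PQ codes); graph links and cluster centroids add some overhead on top of what it counts.

def ann_memory_gb(num_vectors, dim, bytes_per_dim=4, pq_code_bytes=64):
    # Vector storage only; HNSW graph links and IVF centroids are not included.
    hnsw_gb = num_vectors * dim * bytes_per_dim / 1e9   # full float vectors kept in RAM
    ivfpq_gb = num_vectors * pq_code_bytes / 1e9        # compressed PQ codes
    return hnsw_gb, ivfpq_gb

# 1B vectors of 256 float32 dims (~1KB each):
# HNSW needs ~1024 GB of raw vectors; IVF-PQ needs ~64 GB of codes (a 16x reduction).
print(ann_memory_gb(1_000_000_000, 256))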

⚠️ Key Trade-off: HNSW: fast + accurate but expensive (memory). IVF-PQ: cheap but slower + less accurate. Choose based on constraints.
💡 Key Takeaways
HNSW: graph-based, 1-10ms, needs full vectors in memory (~1TB for 1B vectors)
IVF-PQ: quantized, 5-50ms, 16-64x less memory than HNSW
HNSW for latency-critical; IVF-PQ for memory-constrained
📌 Interview Tips
1. Compare HNSW vs IVF-PQ with memory and latency numbers
2. Suggest a hybrid: HNSW for hot data, IVF-PQ for cold