
Approximate Nearest Neighbor: HNSW vs IVF-PQ

Definition
Approximate Nearest Neighbor (ANN) search finds similar vectors without comparing the query against every item, trading some accuracy (e.g., ~95% recall) for speed (~1ms vs 100ms+ for exact search).

WHY APPROXIMATE SEARCH

Exact search compares the query against every vector. At 1B vectors and 1μs per comparison, that is 1000 seconds per query. ANN algorithms build index structures that eliminate most comparisons. HNSW and IVF-PQ are the two dominant approaches, with different trade-offs.
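To see where that cost comes from, here is a minimal brute-force sketch in Python/NumPy. Sizes are illustrative and scaled down to 1M vectors so it runs quickly; at 1B vectors the same scan is ~1000x more work per query.

import numpy as np

# Exact (brute-force) nearest neighbor: compare the query to every stored vector.
d, n = 256, 1_000_000                        # illustrative sizes (1M vectors, 256 dims)
xb = np.random.rand(n, d).astype('float32')  # database vectors
q = np.random.rand(d).astype('float32')      # query vector

dists = np.linalg.norm(xb - q, axis=1)       # O(n * d) work for a single query
top10 = np.argpartition(dists, 10)[:10]      # indices of the 10 nearest vectors (unordered)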

HNSW: GRAPH-BASED SEARCH

HNSW (Hierarchical Navigable Small World) builds a multi-layer graph. Top layers have few nodes with long-range connections; the bottom layer has all nodes with local connections. Search navigates from the top layer down to the bottom. Query: 1-10ms at billion scale. Memory: full vectors (1-4KB each). Best for: low latency, high recall.
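A minimal HNSW sketch using the Faiss library, one common implementation; the parameters M, efConstruction, and efSearch are illustrative values you would tune to trade recall against latency and build time.

import numpy as np
import faiss

d = 256                                               # 256 float32 dims ≈ 1KB per raw vector
xb = np.random.rand(100_000, d).astype('float32')     # database vectors
xq = np.random.rand(10, d).astype('float32')          # query batch

index = faiss.IndexHNSWFlat(d, 32)    # M=32 graph neighbors per node; stores full vectors in RAM
index.hnsw.efConstruction = 200       # build-time search width (higher = better graph, slower build)
index.add(xb)                         # HNSW needs no separate training step

index.hnsw.efSearch = 64              # query-time search width (higher = better recall, more latency)
D, I = index.search(xq, 10)           # distances and ids of the 10 nearest neighbors per query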

💡 Key Insight: HNSW has the best latency-recall trade-off but needs full vectors in memory. At 1B vectors × 1KB each, that is ~1TB of RAM. For memory-constrained systems, use IVF-PQ.

IVF-PQ: QUANTIZED SEARCH

IVF (inverted file index) clusters the vectors and searches only the relevant clusters. PQ (product quantization) compresses each vector from ~1KB to 64 bytes. Combined: search a subset of compressed vectors. Query: 5-50ms. Memory: 16-64x less than HNSW. Best for: memory efficiency when a latency trade-off is acceptable.
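A matching IVF-PQ sketch in Faiss; nlist, nprobe, and the PQ code size are illustrative. Here 64 subquantizers × 8 bits compress each ~1KB vector to a 64-byte code, in line with the numbers above.

import numpy as np
import faiss

d = 256                                               # 256 float32 dims ≈ 1KB per raw vector
xb = np.random.rand(100_000, d).astype('float32')     # database vectors (also used for training here)
xq = np.random.rand(10, d).astype('float32')          # query batch

nlist = 1024                                # number of IVF clusters
m, nbits = 64, 8                            # 64 subquantizers × 1 byte each = 64-byte codes
quantizer = faiss.IndexFlatL2(d)            # coarse quantizer that assigns vectors to clusters
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)                             # learn cluster centroids and PQ codebooks
index.add(xb)                               # store compressed codes, not full vectors

index.nprobe = 16                           # clusters scanned per query (higher = better recall, slower)
D, I = index.search(xq, 10)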

CHOOSING BETWEEN THEM

HNSW: latency critical (<5ms), memory available, recall >98%. IVF-PQ: memory constrained, 20-50ms acceptable, 90-95% recall OK. Hybrid: HNSW for hot data, IVF-PQ for cold.
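A back-of-envelope sizing helper to make the choice concrete. This is a hypothetical function using the article's numbers (~1KB raw vectors vs ~64-byte PQ codes); graph links and cluster centroids add some overhead on top of what it counts.

def ann_memory_gb(num_vectors, dim, bytes_per_dim=4, pq_code_bytes=64):
    # Vector storage only; HNSW graph links and IVF centroids are not included.
    hnsw_gb = num_vectors * dim * bytes_per_dim / 1e9   # full float vectors kept in RAM
    ivfpq_gb = num_vectors * pq_code_bytes / 1e9        # compressed PQ codes
    return hnsw_gb, ivfpq_gb

# 1B vectors of 256 float32 dims (~1KB each):
# HNSW needs ~1024 GB of raw vectors; IVF-PQ needs ~64 GB of codes (a 16x reduction).
print(ann_memory_gb(1_000_000_000, 256))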

⚠️ Key Trade-off: HNSW: fast + accurate but expensive (memory). IVF-PQ: cheap but slower + less accurate. Choose based on constraints.
💡 Key Takeaways
HNSW: graph-based, 1-10ms, needs full vectors in memory (~1TB for 1B vectors)
IVF-PQ: quantized, 5-50ms, 16-64x less memory than HNSW
HNSW for latency-critical; IVF-PQ for memory-constrained
📌 Interview Tips
1. Compare HNSW vs IVF-PQ with memory and latency numbers
2. Suggest a hybrid: HNSW for hot data, IVF-PQ for cold