
Memory vs Disk Trade-offs: When Data Exceeds RAM

MEMORY VS LATENCY

HNSW achieves sub-millisecond latency but requires all vectors in RAM. IVF-PQ compresses vectors 50 to 100x, but queries take 10 to 50ms. For real-time recommendations that must respond in under 10ms, HNSW is the practical choice, provided the vectors fit in RAM. For batch processing or less latency-sensitive applications, IVF-PQ saves significant infrastructure cost.
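The memory side of this trade-off can be estimated with back-of-the-envelope arithmetic. The sketch below is a rough estimate, not a measurement: the catalog size (100M vectors, 768 dimensions), the HNSW link count `M=32`, and the 48-byte PQ code are all assumed values chosen for illustration, and the HNSW figure uses the common approximation of about `2 * M` four-byte neighbor ids per vector on top of the raw floats.

```python
BYTES_PER_FLOAT32 = 4


def flat_bytes(n_vectors: int, dim: int) -> int:
    """Raw float32 storage, i.e. what an uncompressed index must hold."""
    return n_vectors * dim * BYTES_PER_FLOAT32


def hnsw_bytes(n_vectors: int, dim: int, m_links: int = 32) -> int:
    """Raw vectors plus roughly 2*M four-byte neighbor ids per vector.

    This is an approximation; real implementations add some overhead
    for upper graph layers and bookkeeping.
    """
    return n_vectors * (dim * BYTES_PER_FLOAT32 + 2 * m_links * 4)


def pq_bytes(n_vectors: int, n_subquantizers: int, bits_per_code: int = 8) -> int:
    """PQ code storage only; ignores codebooks and inverted-list overhead."""
    return n_vectors * n_subquantizers * bits_per_code // 8


# Hypothetical catalog: 100M items, 768-dimensional embeddings.
n, d = 100_000_000, 768
raw = flat_bytes(n, d)
graph = hnsw_bytes(n, d, m_links=32)
codes = pq_bytes(n, n_subquantizers=48)

print(f"raw float32 : {raw / 1e9:.1f} GB")
print(f"HNSW (M=32) : {graph / 1e9:.1f} GB")
print(f"PQ, 48 bytes: {codes / 1e9:.1f} GB ({raw / codes:.0f}x smaller than raw)")
```

Under these assumptions the PQ codes are 64x smaller than the raw vectors, which sits inside the 50 to 100x range quoted above. Shorter codes compress further but lose more accuracy.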

ACCURACY VS SPEED

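Every ANN algorithm has tunable parameters that trade accuracy for speed. The figures here are illustrative; actual recall and latency depend on the dataset and hardware. HNSW's ef parameter (the search-time candidate list size, called efSearch in FAISS): ef=50 gives 95% recall at 1ms, ef=500 gives 99.5% recall at 5ms. IVF's nprobe parameter (the number of clusters scanned per query): checking 10 clusters gives 90% recall, checking 100 clusters gives 98% recall at roughly 10x the latency. Choose based on your accuracy requirements and latency budget.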
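One way to make that choice concrete is to sweep the parameter offline, record recall and latency for each value, and pick the cheapest setting that satisfies both constraints. The sketch below is a minimal illustration: `Measurement`, `recall_at_k`, and `cheapest_setting` are hypothetical helper names, and the sweep table simply reuses the illustrative HNSW numbers from the paragraph above rather than real benchmark results.

```python
from typing import NamedTuple, Optional, Sequence


class Measurement(NamedTuple):
    param: int          # ef for HNSW, or nprobe for IVF
    recall: float       # fraction of true neighbors found
    latency_ms: float   # measured query latency


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the exact top-k neighbors that the ANN search returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)


def cheapest_setting(
    sweep: Sequence[Measurement], min_recall: float, max_latency_ms: float
) -> Optional[Measurement]:
    """Return the lowest-latency setting meeting both constraints, or None."""
    feasible = [
        m for m in sweep
        if m.recall >= min_recall and m.latency_ms <= max_latency_ms
    ]
    return min(feasible, key=lambda m: m.latency_ms) if feasible else None


# Illustrative numbers taken from the text, not from a benchmark run.
HNSW_SWEEP = [
    Measurement(param=50, recall=0.95, latency_ms=1.0),
    Measurement(param=500, recall=0.995, latency_ms=5.0),
]

print(cheapest_setting(HNSW_SWEEP, min_recall=0.99, max_latency_ms=10.0))
```

When no setting satisfies both limits the function returns `None`, which is the signal to relax a constraint or change index type.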

🎯 Decision Framework: Under 10ms + fits in RAM → HNSW. Under 50ms + needs compression → IVF-PQ. Billions of vectors → sharded IVF-PQ or ScaNN.
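The framework above can be written down as a small function, which is a handy way to check that the rules do not contradict each other. This is only a sketch: `choose_index` is a hypothetical helper, the one-billion threshold is a literal reading of "billions of vectors", and the treatment of cases the framework does not mention (returned as "no direct match") is an assumption rather than part of the original rules.

```python
BILLION = 1_000_000_000


def choose_index(latency_budget_ms: float, fits_in_ram: bool, n_vectors: int) -> str:
    """Map the three rules of the decision framework to a recommendation."""
    # Rule 3: at billion scale, a single-node index is not the starting point.
    if n_vectors >= BILLION:
        return "sharded IVF-PQ or ScaNN"
    # Rule 1: tight latency budget and the raw vectors fit in memory.
    if fits_in_ram and latency_budget_ms < 10:
        return "HNSW"
    # Rule 2: compression is needed and the budget can absorb the
    # 10 to 50ms that the text quotes for IVF-PQ queries (assumed cutoff).
    if not fits_in_ram and latency_budget_ms >= 10:
        return "IVF-PQ"
    # Assumption: anything else is outside the framework as stated.
    return "no direct match"


print(choose_index(latency_budget_ms=5, fits_in_ram=True, n_vectors=50_000_000))
print(choose_index(latency_budget_ms=40, fits_in_ram=False, n_vectors=400_000_000))
print(choose_index(latency_budget_ms=40, fits_in_ram=False, n_vectors=3 * BILLION))
```

Cases that fall through to "no direct match", such as a sub-10ms budget with data that does not fit in RAM, are where sharding or a relaxed latency target has to be discussed explicitly.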

BUILD TIME VS QUERY TIME

Some indexes are fast to build but slow to query; others are slow to build but fast to query. HNSW takes hours to build for 100M vectors but answers queries in under 1ms. A flat index (no structure) builds instantly, but every query scans all vectors and can take seconds at that scale. If your index changes frequently (new products hourly), build time matters. If it changes rarely, optimize for query speed.
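To see why a flat index sits at the "instant build, slow query" end of the spectrum, here is a minimal pure-Python sketch. `FlatIndex` is a hypothetical toy class, not a library API: adding a vector is just an append, while every search computes the distance to every stored vector.

```python
import heapq
import math
from typing import List, Sequence, Tuple


class FlatIndex:
    """Toy exact-search index: no structure is built, queries scan everything."""

    def __init__(self) -> None:
        self._vectors: List[Tuple[float, ...]] = []

    def add(self, vector: Sequence[float]) -> None:
        # "Building" is a constant-time append per vector.
        self._vectors.append(tuple(vector))

    def search(self, query: Sequence[float], k: int) -> List[int]:
        # Query cost grows linearly with the number of stored vectors.
        scored = (
            (math.dist(query, v), i) for i, v in enumerate(self._vectors)
        )
        return [i for _, i in heapq.nsmallest(k, scored)]


index = FlatIndex()
for vec in [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.1, 0.1)]:
    index.add(vec)

print(index.search((0.0, 0.0), k=2))
```

The results are exact, which is why a flat index is also the usual source of ground truth when measuring the recall of an approximate index.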

DISK VS RAM

When vectors exceed RAM, the options are disk-based indexes (IVF with memory-mapped files), smaller compressed representations (PQ), or sharding across machines. Disk adds 1 to 10ms of latency per page fault. Memory mapping helps with sequential access, but random access to a large index still hits disk. Plan for data growth: if you have 100M vectors today and expect 1B in a year, design for disk from the start.
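The growth planning described above comes down to simple arithmetic. The following sketch estimates how many machines are needed to keep an index fully in RAM and how much latency page faults add when it is not. The node size (256 GB), the 70% usable fraction, the 768-dimensional float32 vectors, and the 48-byte PQ codes are assumptions for illustration; `shards_needed` and `added_disk_latency_ms` are hypothetical helper names.

```python
import math


def shards_needed(
    n_vectors: int,
    bytes_per_vector: int,
    ram_per_node_bytes: int,
    usable_fraction: float = 0.7,
) -> int:
    """Machines required to hold the whole index in memory.

    usable_fraction leaves headroom for the OS, the serving process,
    and index overhead (an assumed value, tune for your deployment).
    """
    usable = ram_per_node_bytes * usable_fraction
    return math.ceil(n_vectors * bytes_per_vector / usable)


def added_disk_latency_ms(page_faults_per_query: int, ms_per_fault: float) -> float:
    """Extra latency from disk, using the 1 to 10ms per fault range in the text."""
    return page_faults_per_query * ms_per_fault


NODE_RAM = 256 * 10**9   # assumed 256 GB machines
RAW_BYTES = 768 * 4      # assumed 768-dim float32 vectors
PQ_BYTES = 48            # assumed 48-byte PQ codes

print("100M raw vectors:", shards_needed(100_000_000, RAW_BYTES, NODE_RAM), "nodes")
print("1B raw vectors  :", shards_needed(1_000_000_000, RAW_BYTES, NODE_RAM), "nodes")
print("1B PQ codes     :", shards_needed(1_000_000_000, PQ_BYTES, NODE_RAM), "nodes")
print("3 faults at 1ms :", added_disk_latency_ms(3, 1.0), "ms")
```

Running the same numbers for today's size and next year's size makes the point in the text concrete: a layout that is comfortable at 100M vectors can require an order of magnitude more hardware at 1B unless compression or disk is part of the design.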

💡 Key Takeaways
HNSW: sub-ms latency, requires all vectors in RAM. IVF-PQ: 10-50ms, 50-100x compression
Accuracy tuning: HNSW ef=50 → 95% recall at 1ms, ef=500 → 99.5% at 5ms
IVF nprobe: 10 clusters → 90% recall, 100 clusters → 98% recall at 10x latency
Build time matters for frequently changing indexes; HNSW takes hours for 100M vectors
Disk adds 1-10ms per page fault; design for disk from the start if expecting 10x growth
📌 Interview Tips
1. Provide a decision framework: real-time + fits in RAM → HNSW; batch + compression → IVF-PQ
2. Discuss parameter tuning: start with the default (ef=50), increase it if recall is too low
3. Plan for growth: 100M now and 1B in a year means designing for a disk-based index from the start