Memory vs Disk Trade-offs: When Data Exceeds RAM
MEMORY VS LATENCY
HNSW achieves sub-millisecond latency but requires all vectors, plus the graph links, to sit in RAM. IVF-PQ compresses vectors 50 to 100x, but queries take 10 to 50ms. For real-time recommendations that must respond in under 10ms, HNSW is the practical choice of the two. For batch processing or less latency-sensitive applications, IVF-PQ saves significant infrastructure cost.
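To make the memory gap concrete, here is a back-of-the-envelope sketch comparing raw float32 storage with PQ codes. The 768-dimensional vectors and the 48 one-byte sub-quantizer codes are assumptions chosen for illustration, not figures from any particular system. The estimate also ignores HNSW graph links and PQ codebooks, both of which add to the totals.

```python
def raw_bytes(n_vectors, dim, bytes_per_value=4):
    """Bytes needed to hold uncompressed float32 vectors (what HNSW keeps in RAM)."""
    return n_vectors * dim * bytes_per_value


def pq_bytes(n_vectors, n_subquantizers, bits_per_code=8):
    """Bytes needed for PQ codes: one small code per sub-quantizer per vector."""
    return n_vectors * n_subquantizers * bits_per_code // 8


# Assumed workload: 100M vectors of 768 dimensions, 48 sub-quantizers of 8 bits.
n_vectors, dim = 100_000_000, 768
raw = raw_bytes(n_vectors, dim)
pq = pq_bytes(n_vectors, n_subquantizers=48)
ratio = raw / pq

print(f"raw float32: {raw / 1e9:.1f} GB")
print(f"PQ codes:    {pq / 1e9:.1f} GB")
print(f"compression: {ratio:.0f}x")
```

Under these assumptions the raw vectors need about 307 GB while the PQ codes need about 4.8 GB, a 64x reduction that falls inside the 50 to 100x range quoted above.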
ACCURACY VS SPEED
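Every ANN algorithm has tunable parameters that trade accuracy for speed. For HNSW, the query-time ef parameter controls how many candidates the search keeps: as a representative example, ef=50 gives 95% recall at 1ms, while ef=500 gives 99.5% recall at 5ms. For IVF, the nprobe parameter controls how many clusters are scanned: checking 10 clusters gives about 90% recall, while checking 100 clusters gives about 98% recall at roughly 10x the latency. The exact figures depend on the dataset and hardware, so measure recall on your own data and choose settings based on your accuracy requirements and latency budget.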
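The nprobe trade-off can be seen in a dependency-free toy. The sketch below is not how a production IVF index is built: it samples random vectors as centroids instead of running k-means, and every name in it (build_ivf, search_ivf, recall_at_k) is hypothetical. It only illustrates that scanning more clusters can never lower recall and always costs more distance computations.

```python
import math
import random


def nearest(items, query, k):
    """Indices into `items` of the k vectors closest to `query` (exact scan)."""
    return sorted(range(len(items)), key=lambda i: math.dist(items[i], query))[:k]


def build_ivf(vectors, nlist, rng):
    """Toy IVF: sample nlist vectors as centroids, bucket each vector under its closest one."""
    centroids = rng.sample(vectors, nlist)
    buckets = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        buckets[nearest(centroids, vec, 1)[0]].append(idx)
    return centroids, buckets


def search_ivf(vectors, centroids, buckets, query, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    candidates = [idx for c in nearest(centroids, query, nprobe) for idx in buckets[c]]
    candidates.sort(key=lambda idx: math.dist(vectors[idx], query))
    return candidates[:k]


def recall_at_k(vectors, centroids, buckets, queries, k, nprobe):
    """Fraction of the true k nearest neighbours that the IVF search returns."""
    hits = 0
    for q in queries:
        truth = set(nearest(vectors, q, k))
        found = set(search_ivf(vectors, centroids, buckets, q, k, nprobe))
        hits += len(truth & found)
    return hits / (k * len(queries))


# Assumed toy data: 2,000 random 8-dimensional vectors, 16 clusters, 20 queries.
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2000)]
queries = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]
centroids, buckets = build_ivf(vectors, nlist=16, rng=rng)

recalls = {
    nprobe: recall_at_k(vectors, centroids, buckets, queries, k=10, nprobe=nprobe)
    for nprobe in (1, 4, 16)
}
for nprobe, recall in recalls.items():
    print(f"nprobe={nprobe:2d}  recall@10={recall:.2f}")
```

With nprobe equal to the number of clusters, the search degenerates into an exhaustive scan and recall reaches 1.0. Smaller values scan fewer vectors and miss some true neighbours.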
BUILD TIME VS QUERY TIME
Some indexes are fast to build but slow to query; others are slow to build but fast to query. HNSW takes hours to build for 100M vectors but answers queries in under 1ms. A flat index (no structure) builds instantly, but at that scale each query takes seconds because it scans every vector. If your index changes frequently (for example, new products added hourly), build time matters. If it changes rarely, optimize for query speed.
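One rough way to weigh build time against query time is to add up the compute spent on each per day. The sketch below does that under stated assumptions: "hours" is taken as 2 hours, "seconds" as 2 seconds, and every index change triggers a full rebuild. That last assumption is deliberately pessimistic, since many HNSW implementations accept incremental inserts. The function names are hypothetical.

```python
def daily_compute_seconds(build_seconds, rebuilds_per_day, query_seconds, queries_per_day):
    """Machine-seconds per day spent rebuilding the index plus answering queries."""
    return build_seconds * rebuilds_per_day + query_seconds * queries_per_day


# Assumed figures for 100M vectors (illustrative, not measured).
HNSW = {"build_seconds": 2 * 3600, "query_seconds": 0.001}
FLAT = {"build_seconds": 0, "query_seconds": 2.0}


def cheaper_index(rebuilds_per_day, queries_per_day):
    """Name of the index that uses less total compute per day."""
    hnsw = daily_compute_seconds(
        HNSW["build_seconds"], rebuilds_per_day, HNSW["query_seconds"], queries_per_day
    )
    flat = daily_compute_seconds(
        FLAT["build_seconds"], rebuilds_per_day, FLAT["query_seconds"], queries_per_day
    )
    return "hnsw" if hnsw < flat else "flat"


# Catalog rebuilt hourly with light traffic vs. rebuilt daily with heavy traffic.
hourly_catalog = cheaper_index(rebuilds_per_day=24, queries_per_day=10_000)
stable_catalog = cheaper_index(rebuilds_per_day=1, queries_per_day=1_000_000)
print(hourly_catalog, stable_catalog)
```

This counts compute only. A flat index that wins on total compute can still be unusable if each query must return within a latency budget, so treat the result as one input to the decision, not the decision itself.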
DISK VS RAM
When vectors exceed RAM, the options are disk-based indexes (such as IVF over memory-mapped files), smaller compressed representations (PQ), or sharding across machines. Each page fault that goes to disk adds 1 to 10ms of latency. Memory mapping helps with sequential access, but random access to a large index still hits disk whenever the page it touches is not already cached. Plan for data growth: if you have 100M vectors today and expect 1B in a year, design for disk from the start.
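As a minimal sketch of the memory-mapped approach, the example below stores vectors in a raw float32 file and searches it with Python's standard mmap module, so the operating system pages data in on demand instead of loading the whole file. The file layout and the helper names write_vectors and scan_top_k are assumptions made for illustration. The scan is an exact sequential pass, which is the access pattern memory mapping handles well; it is not an IVF index.

```python
import heapq
import mmap
import os
import struct
import tempfile


def write_vectors(path, vectors):
    """Store vectors as consecutive little-endian float32 rows."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"<{len(vec)}f", *vec))


def scan_top_k(path, dim, query, k):
    """Exact top-k by one sequential pass over a memory-mapped file."""
    row_bytes = 4 * dim
    row_format = f"<{dim}f"
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_rows = len(mm) // row_bytes

        def scored():
            # Rows are decoded one at a time; only touched pages are read from disk.
            for i in range(n_rows):
                row = struct.unpack_from(row_format, mm, i * row_bytes)
                yield sum((a - b) ** 2 for a, b in zip(row, query)), i

        best = heapq.nsmallest(k, scored())
    return [i for _, i in best]


# Usage: 100 points on a line, written to a temporary file and searched from disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.f32")
    write_vectors(path, [[float(i), 0.0] for i in range(100)])
    file_size = os.path.getsize(path)
    top3 = scan_top_k(path, dim=2, query=[10.2, 0.0], k=3)

print(file_size, top3)
```

A real disk-based index would avoid scanning every row by reading only the clusters selected for a query. That turns the access pattern into a few contiguous reads per query, which is why IVF layouts tend to suit disk better than graph traversal does.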