
Choosing the Right Index: Decision Framework and Capacity Planning

CHOOSING AN INDEX TYPE

Under 1 million vectors: Flat index (exact search) may be fast enough. Benchmark before adding complexity.
1 to 100 million vectors: HNSW if latency is critical (under 10 ms) and the vectors fit in RAM.
100 million to 1 billion: IVF-PQ for memory efficiency, or sharded HNSW across multiple machines.
Over 1 billion: Distributed solutions with sharding, typically IVF-PQ with disk storage.
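The thresholds above can be captured in a small decision helper. This is an illustrative sketch of the framework in this section, not a library API; the function name and return strings are invented for the example.

```python
def recommend_index(n_vectors: int, latency_critical: bool = False) -> str:
    """Map corpus size to an index family, following the thresholds above.

    Hypothetical helper: thresholds come from this section's framework;
    always benchmark on your own data before committing.
    """
    if n_vectors < 1_000_000:
        return "flat (exact search; benchmark before adding complexity)"
    if n_vectors <= 100_000_000:
        # HNSW wins on latency if the index fits in RAM
        return "HNSW in RAM" if latency_critical else "HNSW or IVF-PQ (benchmark both)"
    if n_vectors <= 1_000_000_000:
        return "IVF-PQ or sharded HNSW"
    return "distributed IVF-PQ with sharding and disk storage"

print(recommend_index(500_000))              # flat tier
print(recommend_index(50_000_000, latency_critical=True))  # HNSW tier
```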

CAPACITY PLANNING

Calculate memory needs before choosing infrastructure.
HNSW: vectors + graph = approximately 1.5x raw vector size. For 100M 128-dim float32 vectors: 100M × 128 × 4 bytes × 1.5 ≈ 77 GB RAM needed per replica.
IVF-PQ: roughly 8 to 16 bytes per vector, so 100M vectors need 0.8 to 1.6 GB.
Add replicas for query throughput: if one replica handles 100 QPS and you need 500 QPS, deploy 5 replicas.
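The capacity arithmetic above is easy to script. A minimal sketch, assuming float32 vectors and the 1.5x HNSW overhead rule of thumb from this section (real HNSW overhead varies with the M parameter):

```python
import math

def hnsw_ram_gb(n_vectors: int, dim: int, overhead: float = 1.5) -> float:
    """RAM per replica for HNSW: raw float32 vectors plus ~50% graph overhead."""
    return n_vectors * dim * 4 * overhead / 1e9

def ivfpq_ram_gb(n_vectors: int, bytes_per_vector: int = 16) -> float:
    """RAM for IVF-PQ codes at 8-16 bytes per vector (coarse quantizer excluded)."""
    return n_vectors * bytes_per_vector / 1e9

def replicas_needed(target_qps: int, qps_per_replica: int) -> int:
    """Replica count to hit a throughput target, rounding up."""
    return math.ceil(target_qps / qps_per_replica)

print(f"{hnsw_ram_gb(100_000_000, 128):.1f} GB")  # 76.8 GB, the ~77 GB figure above
print(f"{ivfpq_ram_gb(100_000_000):.1f} GB")      # 1.6 GB at 16 bytes/vector
print(replicas_needed(500, 100))                  # 5 replicas
```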

✅ Best Practice: Always benchmark with your actual data. Published benchmarks typically use synthetic data with a uniform distribution. Real data has clusters and outliers that change performance significantly.

WHEN TO SHARD

Shard when a single machine cannot hold the index or cannot serve the required throughput. Sharding strategies: by vector ID range (simple, but hot spots are possible), by cluster (IVF clusters map to shards), or random (uniform load but complex routing). Sharding adds coordination overhead: each query fans out to all shards and the results are merged. Expect 2 to 5 ms of overhead per shard added.
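The fan-out-and-merge step can be sketched in a few lines. This toy version merges per-shard top-k lists of (distance, id) pairs with a heap; the data layout is hypothetical, and a real system would issue the per-shard queries in parallel over RPC rather than hold results in local lists.

```python
import heapq

def fan_out_search(shard_results, k):
    """Merge per-shard top-k hits into a global top-k by distance.

    shard_results: list of per-shard lists of (distance, vector_id) pairs.
    heapq.nsmallest sorts by the first tuple element, i.e. distance.
    """
    all_hits = (hit for shard in shard_results for hit in shard)
    return heapq.nsmallest(k, all_hits)

shards = [
    [(0.12, "a1"), (0.45, "a2")],  # shard 0's local top-2
    [(0.08, "b1"), (0.33, "b2")],  # shard 1's local top-2
    [(0.51, "c1"), (0.60, "c2")],  # shard 2's local top-2
]
print(fan_out_search(shards, 3))  # [(0.08, 'b1'), (0.12, 'a1'), (0.33, 'b2')]
```

Note that each shard must return its own top-k (not top-k/n) for the merged result to be correct, since one shard may hold most of the true nearest neighbors.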

LIBRARIES AND TOOLS

FAISS provides flat, IVF, and PQ indexes with GPU acceleration. hnswlib is a lightweight HNSW implementation. ScaNN optimizes for modern CPUs with SIMD instructions. Milvus and Pinecone provide managed services with sharding and replication built in. Start with FAISS for prototyping; move to a managed service in production if operational complexity is a concern.

💡 Key Takeaways
Under 1M: flat index may suffice. 1-100M: HNSW if fits in RAM. 100M-1B: IVF-PQ. Over 1B: sharded
HNSW memory: 1.5x raw vector size. 100M × 128 × 4 bytes × 1.5 = 77 GB per replica
IVF-PQ memory: 8-16 bytes per vector. 100M vectors = 0.8-1.6 GB per replica
Sharding adds 2-5ms overhead per shard; shard by cluster for natural load distribution
FAISS for prototyping, managed services (Milvus, Pinecone) for production operational simplicity
📌 Interview Tips
1. Walk through capacity: 500 QPS target at 100 QPS per replica → 5 replicas needed.
2. Explain the sharding decision: a single 77 GB replica vs. two ~40 GB shards with fan-out overhead.
3. Discuss benchmarking: always test with real data, not a synthetic uniform distribution.