
FAISS IVF-PQ: Inverted File with Product Quantization

FAISS (Facebook AI Similarity Search) Inverted File with Product Quantization (IVF-PQ) combines coarse partitioning with aggressive compression. The algorithm first clusters vectors into coarse centroids using k-means. At query time, it probes a small subset of the nearest centroids, then computes distances only within those partitions. Product Quantization further compresses vectors by splitting their dimensions into subspaces, quantizing each subspace independently with learned codebooks, and storing short codes instead of full floats.

For 1 billion vectors with 768 dimensions, you might use 262144 coarse centroids and 16 subquantizers with 8-bit codes. This compresses each vector from 3072 bytes (768 floats × 4 bytes) to just 16 bytes, a 192× reduction, so the total dataset drops from roughly 3 TB to roughly 16 GB. At serving time, you set nprobe to 16 or 32, meaning you search only 16 of the 262144 partitions. Distance computation uses lookup tables built from the query and the codebooks, making it extremely fast even on compressed representations.

The tradeoff is quantization error and training complexity. PQ introduces approximation into distance computation, and tuning the number of coarse centroids, subquantizers, and nprobe requires experimentation. If centroids are poorly balanced, some inverted lists become very long, and queries hitting those lists spike in latency. On GPU, FAISS can process 5000 to 10000 queries per second at 95 to 97 percent recall with batch sizes of 32 to 256. Pinterest reported moving to GPU FAISS to handle tens of billions of image embeddings while reducing infrastructure cost. The method scales to billions of vectors but requires periodic retraining as data drifts.
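As a concrete illustration, here is a minimal sketch of building and querying an IVF-PQ index with the FAISS Python API on synthetic data. The scaled-down parameters (nlist=1024 rather than 262144, 100000 database vectors rather than 1 billion) are placeholder choices for a runnable example; a production deployment would use the larger values discussed above and typically shard across GPUs.

```python
import faiss
import numpy as np

d = 768        # embedding dimensionality
nlist = 1024   # coarse centroids (262144 at billion scale)
m = 16         # PQ subquantizers -> 16 bytes per vector
nbits = 8      # bits per PQ code (256 codewords per subspace)

# Synthetic stand-ins for real embeddings (scaled down to run locally)
rng = np.random.default_rng(0)
xt = rng.standard_normal((50_000, d)).astype("float32")   # training sample
xb = rng.standard_normal((100_000, d)).astype("float32")  # database vectors
xq = rng.standard_normal((32, d)).astype("float32")       # one query batch

# The coarse quantizer assigns vectors to inverted lists;
# the IVF-PQ index stores only the 16-byte PQ codes per vector.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xt)   # k-means for coarse centroids + PQ codebook learning
index.add(xb)

# nprobe trades recall for speed: search only this many of the nlist partitions
index.nprobe = 32

distances, ids = index.search(xq, 10)
print(ids.shape)  # (32, 10): top-10 approximate neighbors per query
```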
💡 Key Takeaways
IVF-PQ compresses 768-dimensional float32 vectors from 3072 bytes to 16 bytes using 16 subquantizers with 8-bit codes, achieving 192× compression
At 1 billion vectors, total memory drops from 3 TB to 16 GB plus index overhead, enabling single-GPU deployment or small CPU clusters
Coarse quantization with 65536 to 262144 centroids partitions the space; the nprobe parameter controls the recall-versus-speed tradeoff, with typical values of 16 to 64
GPU FAISS achieves 5000 to 10000 queries per second at 95 to 97 percent recall with batch sizes of 32 to 256 and p99 latency under 15 milliseconds
Training cost: clustering 1 billion vectors into 262144 centroids and learning 16 PQ codebooks takes hours on multi-GPU setups, and periodic retraining is needed as data drifts
List imbalance causes latency spikes: if hot centroids accumulate 10× more vectors, queries probing those lists slow down; monitor average list length and p99 latency separately (a sketch of such a check follows this list)
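Because imbalance is easiest to catch offline before it shows up in tail latency, a quick audit of inverted-list sizes can help. This is a minimal sketch assuming an already-trained and populated FAISS IVF index; `list_length_stats` and the 10× red-flag threshold are illustrative choices, not FAISS conventions, though `index.invlists.list_size` is part of the FAISS API.

```python
import numpy as np

def list_length_stats(index):
    """Illustrative helper: summarize inverted-list sizes of a trained
    FAISS IVF index to spot hot lists before they hurt p99 latency."""
    sizes = np.array([index.invlists.list_size(i) for i in range(index.nlist)])
    return {
        "mean_len": float(sizes.mean()),
        "p99_len": float(np.percentile(sizes, 99)),
        "max_len": int(sizes.max()),
        # Ratio of the largest list to the mean; values around 10 match
        # the "10x more vectors" warning sign described above.
        "imbalance": float(sizes.max() / max(sizes.mean(), 1.0)),
    }
```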
📌 Examples
Pinterest visual search: 30 billion image embeddings, FAISS IVF-PQ on A100 GPUs, 65536 coarse centroids, nprobe=32, 10000 QPS at 96 percent recall, 5 milliseconds per batch
Meta content deduplication: FAISS IVF-PQ for near-duplicate detection on billions of posts; the compressed index fits in memory, probes 16 lists, and achieves 95 percent recall in under 10 milliseconds
Retraining cadence: weekly retrain on 500 million new vectors, using a sampled subset for centroid learning; takes 6 hours on 8 V100 GPUs, with an atomic swap to the new index and dual writes during cutover (sketched below)
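The retrain-and-cutover pattern from the last example might look like the sketch below. The function name, sampling rate, and on-disk swap are assumptions for illustration (`faiss.write_index` and `os.replace` are real APIs); the dual-write step happens in the serving layer and is only noted in comments.

```python
import os
import faiss
import numpy as np

def retrain_and_publish(vectors, path="index.faiss", d=768,
                        nlist=262144, m=16, nbits=8, sample_size=50_000_000):
    """Hypothetical weekly retraining job: learn fresh centroids and PQ
    codebooks on a sample, re-encode the full corpus, swap atomically."""
    # Centroid/codebook learning only needs a representative sample,
    # not the full billion-vector corpus.
    pick = np.random.choice(len(vectors), min(sample_size, len(vectors)),
                            replace=False)
    quantizer = faiss.IndexFlatL2(d)
    new_index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
    new_index.train(vectors[pick])  # the multi-hour, multi-GPU step at scale
    new_index.add(vectors)          # re-encode everything with new codebooks

    # Write to a temp file, then rename: os.replace is atomic on POSIX, so
    # readers load either the old index or the new one, never a partial file.
    # During cutover, the serving layer dual-writes fresh vectors to both
    # old and new indexes until the swap is validated.
    faiss.write_index(new_index, path + ".tmp")
    os.replace(path + ".tmp", path)
```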