
FAISS IVF-PQ: Inverted File with Product Quantization

WHAT IS IVF-PQ

IVF-PQ (Inverted File with Product Quantization) combines two techniques: coarse partitioning to reduce search scope, and vector compression to reduce memory and speed up distance computation. It is the workhorse of FAISS and the go-to choice when you need to index billions of vectors on limited hardware.

The algorithm works in two stages. First, k-means clustering partitions vectors into coarse cells (typically 1K-10K cells). At query time, only the nearest cells are searched. Second, Product Quantization compresses each vector from full precision (768 dims × 4 bytes = 3072 bytes, ~3KB) to a compact code (e.g., 64 bytes), a roughly 48x memory reduction.

INVERTED FILE (IVF) PARTITIONING

Training clusters N vectors into K centroids using k-means. Each vector is assigned to its nearest centroid. At query time, find the nprobe nearest centroids to the query, then search only within those partitions.

Example: 1M vectors, 1000 centroids, nprobe=20. Instead of checking 1M vectors, check only 20K (2% of data). If vectors are well-clustered, the true neighbors are likely in those 20 cells. Recall@10 of 0.95 is achievable with nprobe=20-50 on well-distributed data.
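The mechanics above can be sketched in a few lines of NumPy. This is a toy stand-in for what FAISS does internally (FAISS trains the coarse quantizer and stores the inverted lists for you); the dataset sizes and variable names here are illustrative, not from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_vecs, n_cells, nprobe = 16, 2000, 50, 5

data = rng.standard_normal((n_vecs, d)).astype(np.float32)

# Toy coarse quantizer: a few k-means iterations over random initial centroids.
centroids = data[rng.choice(n_vecs, n_cells, replace=False)].copy()
for _ in range(10):
    assign = ((data[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    for c in range(n_cells):
        members = data[assign == c]
        if len(members):
            centroids[c] = members.mean(0)
assign = ((data[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)

# Inverted lists: vector ids grouped by the cell they were assigned to.
inv_lists = [np.flatnonzero(assign == c) for c in range(n_cells)]

def ivf_search(query, k=10):
    # Probe only the nprobe closest cells instead of scanning all n_vecs vectors.
    probed = ((centroids - query) ** 2).sum(-1).argsort()[:nprobe]
    cand = np.concatenate([inv_lists[c] for c in probed])
    dists = ((data[cand] - query) ** 2).sum(-1)
    top = dists.argsort()[:k]
    return cand[top], dists[top], len(cand)

ids, dists, n_checked = ivf_search(data[0])
```

With 50 cells and nprobe=5, each query scans roughly 10% of the data here; at the scales in the example above (1000 cells, nprobe=20) the fraction drops to 2%.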

PRODUCT QUANTIZATION (PQ)

PQ splits each vector into M subvectors (typically M=8-32), then quantizes each subvector independently using a codebook of 256 centroids. A 768-dim vector becomes M one-byte codes, one per subvector, each indexing its nearest centroid in that subvector's codebook.

Distance computation uses lookup tables. Pre-compute distances from the query's subvectors to all 256 centroids in each codebook. The distance to a database vector is then the sum of M table lookups. This replaces ~768 per-dimension floating-point operations with M lookups and additions.

Memory: a 768-dim vector is compressed from 3072 bytes to M bytes (32 bytes with M=32). 1 billion vectors fit in 32GB instead of 3TB.
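The encode-then-lookup flow can be sketched as follows. The codebooks here are random for brevity (FAISS would train them with per-subspace k-means), and all sizes are toy values, but the asymmetric-distance mechanics are the same:

```python
import numpy as np

rng = np.random.default_rng(1)
d, M, ksub = 32, 4, 256        # M subvectors of d/M = 8 dims, 256 centroids each
dsub = d // M

# Toy codebooks; real ones come from k-means on each subspace.
codebooks = rng.standard_normal((M, ksub, dsub)).astype(np.float32)

def pq_encode(x):
    # One byte per subvector: the index of its nearest codebook centroid.
    codes = np.empty(M, dtype=np.uint8)
    for m in range(M):
        sub = x[m * dsub:(m + 1) * dsub]
        codes[m] = ((codebooks[m] - sub) ** 2).sum(-1).argmin()
    return codes

def adc_tables(query):
    # Pre-compute query-subvector -> centroid distances: an M x 256 table.
    tables = np.empty((M, ksub), dtype=np.float32)
    for m in range(M):
        sub = query[m * dsub:(m + 1) * dsub]
        tables[m] = ((codebooks[m] - sub) ** 2).sum(-1)
    return tables

def adc_distance(tables, codes):
    # Distance = sum of M table lookups; no per-dimension arithmetic.
    return sum(tables[m, codes[m]] for m in range(M))

x = rng.standard_normal(d).astype(np.float32)
q = rng.standard_normal(d).astype(np.float32)
codes = pq_encode(x)                          # 32 floats -> 4 bytes
approx = adc_distance(adc_tables(q), codes)

# Equals the exact distance from q to the decoded (reconstructed) vector:
decoded = np.concatenate([codebooks[m, codes[m]] for m in range(M)])
exact = ((q - decoded) ** 2).sum()
```

Note the distance is approximate with respect to the original vector x, but exact with respect to x's quantized reconstruction, which is what makes the table trick lossless once the codes are fixed.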

KEY PARAMETERS

nlist: Number of IVF cells. More cells = finer partitioning = lower recall per probe but faster per-cell search. Typical: sqrt(N) to 4×sqrt(N). For 1M vectors: 1000-4000.

nprobe: Cells to search at query time. Higher = better recall, more latency. Start at 1% of nlist, tune up.

M (PQ segments): More segments = better accuracy but larger codes. Common: 8, 16, 32, 64.

💡 Key Insight: IVF-PQ trades recall for memory efficiency. Use when you need to index billions of vectors on commodity hardware. For highest recall, use HNSW. For lowest memory, use IVF-PQ with aggressive quantization.
💡 Key Takeaways
IVF partitions vectors into clusters; only nearest clusters searched at query time
PQ compresses 768-dim vectors from 3KB to 32-64 bytes using subvector codebooks
Distance computation via lookup tables: M additions instead of 768 FLOPs
Key knobs: nlist (partitions), nprobe (search breadth), M (compression level)
📌 Interview Tips
1. Walk through the numbers: 1M vectors, 1000 cells, nprobe=20 means checking 2% of the data for ~95% recall.
2. Explain the PQ compression math: 768 dims × 4 bytes = 3KB, compressed to M bytes with codebook lookups.