FAISS IVF-PQ: Inverted File with Product Quantization
WHAT IS IVF-PQ
IVF-PQ (Inverted File with Product Quantization) combines two techniques: coarse partitioning to reduce search scope, and vector compression to reduce memory and speed up distance computation. It is the workhorse of FAISS and the go-to choice when you need to index billions of vectors on limited hardware.
The algorithm works in two stages. First, k-means clustering partitions the vectors into coarse cells (typically 1K-10K cells); at query time, only the nearest cells are searched. Second, Product Quantization compresses each vector from full precision (768 dims × 4 bytes = 3,072 bytes) to a compact code (64 bytes), reducing memory by 48x.
INVERTED FILE (IVF) PARTITIONING
Training clusters N vectors into K centroids using k-means. Each vector is assigned to its nearest centroid. At query time, find the nprobe nearest centroids to the query, then search only within those partitions.
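The assign-then-probe flow can be sketched in plain NumPy. This is a toy illustration (random vectors stand in for k-means-trained centroids; real FAISS uses optimized clustering and contiguous inverted lists):

```python
# Pure-NumPy sketch of IVF partitioning: assign vectors to cells, then
# probe only the nprobe nearest cells for a query. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
N, d, K = 10_000, 16, 64
xb = rng.standard_normal((N, d)).astype("float32")

# Toy "training": sample K database vectors as centroids (k-means stand-in).
centroids = xb[rng.choice(N, K, replace=False)]

# Assignment: each vector goes into the inverted list of its nearest centroid.
d2 = ((xb[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (N, K)
assign = d2.argmin(1)
inverted_lists = {c: np.where(assign == c)[0] for c in range(K)}

# Query time: rank cells by centroid distance, scan only the top nprobe.
q = rng.standard_normal(d).astype("float32")
nprobe = 8
cell_d2 = ((centroids - q) ** 2).sum(-1)
probed = np.argsort(cell_d2)[:nprobe]
candidates = np.concatenate([inverted_lists[c] for c in probed])
# Exact distances are then computed only over `candidates`, not all N vectors.
```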
Example: 1M vectors, 1000 centroids, nprobe=20. Instead of checking 1M vectors, check only 20K (2% of data). If vectors are well-clustered, the true neighbors are likely in those 20 cells. Recall@10 of 0.95 is achievable with nprobe=20-50 on well-distributed data.
PRODUCT QUANTIZATION (PQ)
PQ splits each vector into M subvectors (typically M=8-32), then quantizes each subvector independently against a codebook of 256 centroids. A 768-dim vector becomes a code of M bytes, one byte per subvector storing the index of its nearest centroid (256 centroids = 8 bits = 1 byte).
Distance computation uses lookup tables. For each of the M codebooks, pre-compute the squared distance from the query's subvector to all 256 centroids. The approximate distance to a database vector is then the sum of M table lookups, replacing roughly 768 floating-point multiply-adds with M lookups and additions.
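This is the asymmetric distance computation (ADC) trick, sketched below in NumPy (random codebooks and codes stand in for trained ones):

```python
# ADC sketch: precompute per-subvector lookup tables for the query, then
# score each PQ code with M table lookups and a sum.
import numpy as np

rng = np.random.default_rng(2)
d, M = 32, 8
dsub = d // M
codebooks = rng.standard_normal((M, 256, dsub)).astype("float32")

q = rng.standard_normal(d).astype("float32")
# tables[m][j] = squared distance from query subvector m to centroid j
tables = np.stack([
    ((codebooks[m] - q[m * dsub:(m + 1) * dsub]) ** 2).sum(-1)
    for m in range(M)
])                                            # shape (M, 256)

codes = rng.integers(0, 256, size=(1000, M), dtype=np.uint8)  # fake PQ codes
# Approximate distance to each database vector = sum of M table lookups.
approx_d2 = tables[np.arange(M), codes].sum(1)                # shape (1000,)
```

The table sum equals the exact squared distance between the query and the vector reconstructed from its codebook centroids, which is why PQ distances are approximate: the error is the quantization error of the reconstruction.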
Memory: a 768-dim vector is compressed from 3,072 bytes to M bytes (32 bytes with M=32). One billion vectors fit in 32GB of codes instead of 3TB (plus per-vector ID and index overhead).
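The arithmetic behind those figures, spelled out:

```python
# Memory arithmetic for 1B 768-dim float32 vectors vs. M=32 PQ codes.
n_vectors = 1_000_000_000
d, bytes_per_float, M = 768, 4, 32

full = n_vectors * d * bytes_per_float   # raw float32 storage: 3.072 TB
pq = n_vectors * M                       # one byte per subvector: 32 GB

print(full / 1e12, "TB ->", pq / 1e9, "GB")  # 3.072 TB -> 32.0 GB
```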
KEY PARAMETERS
nlist: Number of IVF cells. More cells = finer partitioning = lower recall per probe but faster per-cell search. Typical: sqrt(N) to 4×sqrt(N). For 1M vectors: 1000-4000.
nprobe: Cells to search at query time. Higher = better recall, more latency. Start at 1% of nlist, tune up.
M (PQ segments): More segments = better accuracy but larger codes. M must evenly divide the vector dimension. Common: 8, 16, 32, 64.
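The rules of thumb above can be wrapped in a small helper for picking starting points. This is a hypothetical convenience function, not part of FAISS:

```python
# Hypothetical helper encoding the section's rules of thumb: nlist in
# [sqrt(N), 4*sqrt(N)], nprobe starting around 1% of nlist.
import math

def suggest_params(n_vectors: int) -> dict:
    nlist = int(math.sqrt(n_vectors))          # sqrt(N) lower bound
    return {
        "nlist": nlist,
        "nlist_max": 4 * nlist,                # up to 4*sqrt(N)
        "nprobe_start": max(1, nlist // 100),  # ~1% of nlist, tune upward
    }

params = suggest_params(1_000_000)
# -> {'nlist': 1000, 'nlist_max': 4000, 'nprobe_start': 10}
```

These are starting points only; the right values depend on recall targets and measured latency on your data.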