IVF and Product Quantization: Compression for Billion-Scale Search
INVERTED FILE INDEX
IVF (Inverted File) divides the vector space into clusters. Train k-means on a sample of vectors to create cluster centroids (typically 1,000 to 10,000 clusters for 100M vectors), then assign each vector to its nearest centroid. At query time, find the centroids closest to the query and search only the vectors in those clusters. Probing 10 clusters out of 1,000 cuts comparisons by roughly 100x, assuming clusters are balanced.
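The flow above can be sketched in a few dozen lines of numpy. This is a toy illustration, not a production index: the data sizes, cluster count, and `nprobe` value are arbitrary, and real systems use optimized k-means and contiguous inverted lists.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(x, k, iters=10):
    # Initialize centroids from random training points (Lloyd's algorithm).
    centroids = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        # Move each centroid to the mean of its assigned vectors.
        for c in range(k):
            members = x[assign == c]
            if len(members):
                centroids[c] = members.mean(0)
    return centroids

def build_ivf(x, centroids):
    # One inverted list of vector ids per centroid.
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d.argmin(1)
    return {c: np.flatnonzero(assign == c) for c in range(len(centroids))}

def ivf_search(q, x, centroids, lists, nprobe=2, topk=5):
    # Rank clusters by centroid distance and scan only the nprobe closest.
    order = ((centroids - q) ** 2).sum(1).argsort()[:nprobe]
    cand = np.concatenate([lists[int(c)] for c in order])
    d = ((x[cand] - q) ** 2).sum(1)
    return cand[d.argsort()[:topk]]

x = rng.standard_normal((2000, 32)).astype("float32")
centroids = kmeans(x, 16)
lists = build_ivf(x, centroids)
hits = ivf_search(x[0], x, centroids, lists)
```

Because the query here is a database vector, its own cluster is probed first and it comes back as the top hit; raising `nprobe` trades speed for recall.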
PRODUCT QUANTIZATION
Product Quantization (PQ) compresses vectors to reduce memory. Split each 128-dimensional vector into 8 subvectors of 16 dimensions each. For each subvector position, learn 256 representative codewords with k-means. Any vector can then be represented as 8 bytes (8 positions × 1-byte index into 256 codewords) instead of 512 bytes (128 × 4-byte floats), a 64x compression. Distance computation stays fast despite compression: for each query, precompute a table of distances from the query's subvectors to every codeword, then score compressed vectors with table lookups alone (asymmetric distance computation).
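A minimal numpy sketch of the three pieces, training, encoding, and table-based distances, using the 8 × 16 × 256 layout from the text. The training-set size and iteration count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, DSUB, K = 8, 16, 256   # 8 subvectors of 16 dims, 256 codewords each

def train_codebooks(train, iters=10):
    # One small k-means codebook per subvector position.
    books = []
    for m in range(M):
        sub = train[:, m * DSUB:(m + 1) * DSUB]
        cb = sub[rng.choice(len(sub), K, replace=False)]
        for _ in range(iters):
            a = ((sub[:, None, :] - cb[None]) ** 2).sum(-1).argmin(1)
            for k in range(K):
                pts = sub[a == k]
                if len(pts):
                    cb[k] = pts.mean(0)
        books.append(cb)
    return books

def encode(x, books):
    # 8 one-byte codeword indices per vector instead of 128 floats.
    codes = np.empty((len(x), M), dtype=np.uint8)
    for m in range(M):
        sub = x[:, m * DSUB:(m + 1) * DSUB]
        codes[:, m] = ((sub[:, None, :] - books[m][None]) ** 2).sum(-1).argmin(1)
    return codes

def adc_distances(q, codes, books):
    # Precompute an 8x256 table of query-to-codeword squared distances,
    # then score every compressed vector by table lookups only.
    table = np.stack([((books[m] - q[m * DSUB:(m + 1) * DSUB]) ** 2).sum(1)
                      for m in range(M)])
    return table[np.arange(M), codes].sum(1)

x = rng.standard_normal((2000, 128)).astype("float32")
books = train_codebooks(x)
codes = encode(x, books)
d = adc_distances(x[0], codes, books)
```

The `codes` array occupies exactly 1/64th of the raw float32 storage, matching the compression ratio above.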
IVF-PQ: COMBINING BOTH
The combination is powerful: IVF reduces the number of vectors to check, and PQ reduces memory per vector. For 100M vectors with 128 dimensions, raw float32 storage is 51.2 GB. With IVF-PQ at 8 bytes per code, storage drops to under 1 GB. Query latency rises somewhat (10-30ms) compared to HNSW (under 5ms), but the memory efficiency makes billion-scale search feasible.
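The arithmetic behind those figures, as a back-of-envelope check. The centroid count of 4,096 is an assumed value; real deployments also store a vector id per entry, which adds a few hundred MB at this scale.

```python
# Back-of-envelope memory for 100M x 128-dim float32 vectors.
n, dim = 100_000_000, 128
raw_bytes = n * dim * 4                     # 4-byte floats: 51.2 GB
compressed = n * 8                          # 8 PQ code bytes per vector
nlist = 4096                                # assumed number of IVF centroids
centroid_bytes = nlist * dim * 4            # centroid table is tiny by comparison
print(raw_bytes / 1e9)                      # 51.2 GB
print((compressed + centroid_bytes) / 1e9)  # ~0.8 GB
```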
ACCURACY LOSS
PQ introduces quantization error: the reconstructed vector is not identical to the original. Recall typically drops 2 to 5% compared to exact search. For applications requiring high precision (duplicate detection, exact matching), this loss may be unacceptable; for recommendations, where approximate results suffice, it works well.
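Recall figures like these are measured by comparing approximate results against exact brute-force results on a held-out query set. A hypothetical helper for that metric (recall@k as the fraction of exact top-k neighbors the approximate search recovers):

```python
def recall_at_k(true_ids, approx_ids, k=10):
    # true_ids / approx_ids: one list of ranked neighbor ids per query.
    # Count exact top-k neighbors recovered by the approximate top-k.
    hits = sum(len(set(t[:k]) & set(a[:k]))
               for t, a in zip(true_ids, approx_ids))
    return hits / (k * len(true_ids))
```

A drop from 1.0 to 0.95-0.98 on this metric corresponds to the 2 to 5% loss described above.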