
Principal Component Analysis (PCA) for Online Systems

Principal Component Analysis (PCA) is a linear dimensionality reduction method that finds orthogonal directions, called principal components, which capture the maximum variance in the data, and projects points onto the top k components. It can be computed by eigendecomposition of the covariance matrix or by Singular Value Decomposition (SVD). PCA is deterministic, fast, and yields an explicit linear transform that is cheap to apply at inference time.

The key advantage for production systems is that the fitted projection is stateless at serving time: just a single matrix multiply per vector. Projecting 768 dimensions down to 128 costs roughly 98,000 multiply-add operations (about 100,000 floating point operations) per vector. At 10,000 queries per second per instance that is about 1 billion operations per second of overhead, and even at 50,000 QPS it is only about 5 billion operations per second, which is feasible on a small CPU cluster or a single low-end GPU.

PCA preserves global linear structure and keeps distance relationships approximately consistent for data that lies on or near a linear manifold, so it works best when the original high-dimensional data has linear correlations. Many companies apply PCA before compression in ANN systems, using its linear decorrelation ahead of product quantization or scalar quantization to reduce quantization error. In practice, projecting to 256 or 384 dimensions and then applying 8-bit quantization per sub-vector yields a 10x memory reduction with under a 2 percent recall drop in large catalogs.
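A minimal sketch of this serving-time path, assuming scikit-learn and synthetic data (all names here are illustrative): the offline fit produces a mean vector and a component matrix, and the online transform is just centering, one matrix multiply, and a renormalization so cosine similarity still works downstream.

import numpy as np
from sklearn.decomposition import PCA

# Offline: fit PCA on a held-out training sample (synthetic data for illustration)
train = np.random.randn(10_000, 768).astype(np.float32)
pca = PCA(n_components=128).fit(train)
mean, components = pca.mean_, pca.components_    # shapes (768,) and (128, 768)

# Online: the entire per-request transform is center -> matrix multiply -> renormalize
def project(vec):
    reduced = components @ (vec - mean)          # ~768 * 128 ≈ 98,000 multiply-adds
    norm = np.linalg.norm(reduced)
    return reduced / norm if norm > 0 else reduced

query_768 = np.random.randn(768).astype(np.float32)
query_128 = project(query_768)                   # 128 float32 values ≈ 512 bytes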
💡 Key Takeaways
PCA is deterministic and produces a stateless linear transform (matrix multiply) that takes about 100,000 floating point operations for 768D to 128D projection
Inference latency is predictable and low: single matrix multiply fits in online request paths serving 10,000 to 50,000 QPS per instance
Preserves global linear structure and pairwise distances for approximately linear manifolds, making it suitable for retrieval systems requiring consistent distance metrics
Synergizes with product quantization: applying PCA before quantization reduces quantization error and achieves 10x memory reduction with under 2 percent recall drop
Scales to tens or hundreds of millions of points using randomized SVD or incremental methods with streaming computation over data lakes (see the sketch after this list)
Requires careful preprocessing: fit only on training data to avoid leakage, standardize features to prevent high variance dimensions from dominating components
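The streaming path mentioned above can be sketched with scikit-learn's IncrementalPCA, which fits the components batch by batch in constant memory; the batch reader below is a stand-in for whatever the data lake actually exposes.

import numpy as np
from sklearn.decomposition import IncrementalPCA

# Stand-in for a reader that streams embedding batches out of the data lake
def read_embedding_batches(n_batches=50, batch_size=20_000, dim=768):
    for _ in range(n_batches):
        yield np.random.randn(batch_size, dim).astype(np.float32)

ipca = IncrementalPCA(n_components=128)
for batch in read_embedding_batches():
    ipca.partial_fit(batch)        # memory stays bounded regardless of corpus size

# For a corpus that fits in memory, PCA(n_components=128, svd_solver="randomized")
# is a single-pass alternative; either way, mean_ and components_ are the only
# state the online projection needs.
reduced = ipca.transform(np.random.randn(4, 768).astype(np.float32))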
📌 Examples
Embedding service pattern: Apply PCA transform (center, multiply, renormalize) before writing to vector store, reducing network payload from 3KB to 512 bytes
Google and Meta scale: Run randomized SVD over 100M 768D vectors stored in a data lake, computing the top 128 components in multiple streaming passes
Index compression: 256D vectors quantized with 16 sub vectors at 8 bits per code = 16 bytes per vector, about 1.6GB for 100M items in a RAM resident index (sketched in the Faiss example below)
Python sklearn: from sklearn.decomposition import PCA; pca = PCA(n_components=128).fit(train_embeddings); reduced = pca.transform(new_embeddings)
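One way to wire PCA in front of product quantization is Faiss's index factory (assuming the faiss-cpu package is installed; the factory string below chains a 768 to 256 PCA with a 16-sub-vector, 8-bit PQ, matching the 16 bytes per vector in the index compression example above):

import numpy as np
import faiss

d_in = 768
train = np.random.randn(50_000, d_in).astype(np.float32)

# "PCA256,PQ16": decorrelate and reduce to 256D, then product-quantize into
# 16 sub-vectors at 8 bits each -> 16 bytes per stored vector
index = faiss.index_factory(d_in, "PCA256,PQ16")
index.train(train)            # fits both the PCA matrix and the PQ codebooks
index.add(train)              # compresses and stores the vectors

query = np.random.randn(1, d_in).astype(np.float32)
distances, ids = index.search(query, 10)   # approximate top-10 neighbors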