ANN Failure Modes: Data Drift, Imbalanced Partitions, and Hardware Effects

DATA DRIFT DEGRADES RECALL

ANN indexes encode assumptions about data distribution. IVF centroids, PQ codebooks, and HNSW graph structure are optimized for the training data. When embedding distributions shift—seasonal trends, new content types, model updates—the index no longer aligns with the data.

Symptoms: recall drops 5-15 percentage points over weeks or months. Queries return unexpected results. Latency increases as search explores more partitions to maintain recall targets.

Detection: sample 0.1% of queries and compare ANN results to exact search on a 1-million vector subset. If recall@10 drops below threshold (e.g., 0.90), trigger reindex. Monitor weekly.
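
The sample-and-compare check described above can be sketched in a few lines. This is a minimal illustration, not a production monitor: `ann_search` is a hypothetical callable standing in for whatever index you deploy, the brute-force `exact_top_k` assumes squared L2 distance, and the 0.1% sample rate and 0.90 threshold are simply the example values from the text.

```python
import random

def exact_top_k(query, vectors, k):
    """Brute-force top-k ids by squared L2 distance (the ground truth)."""
    dists = [
        (sum((q - v) ** 2 for q, v in zip(query, vec)), idx)
        for idx, vec in enumerate(vectors)
    ]
    dists.sort()
    return [idx for _, idx in dists[:k]]

def recall_at_k(ann_ids, exact_ids):
    """Fraction of the exact top-k that the ANN result also returned."""
    if not exact_ids:
        return 1.0
    return len(set(ann_ids) & set(exact_ids)) / len(exact_ids)

def monitor_recall(queries, vectors, ann_search, k=10, sample_rate=0.001,
                   threshold=0.90, rng=None):
    """Sample queries, compare ANN to exact search, and flag a reindex.

    ann_search(query, k) is a hypothetical callable returning a list of ids.
    Returns (mean_recall, needs_reindex); mean_recall is None if no query
    was sampled in this window.
    """
    rng = rng or random.Random(0)
    sampled = [q for q in queries if rng.random() < sample_rate]
    if not sampled:
        return None, False
    recalls = [
        recall_at_k(ann_search(q, k), exact_top_k(q, vectors, k))
        for q in sampled
    ]
    mean_recall = sum(recalls) / len(recalls)
    return mean_recall, mean_recall < threshold
```

In practice the exact search would run offline against the fixed evaluation subset, so the brute-force cost stays off the serving path.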

Mitigation: rebuild indexes periodically (monthly to quarterly). For streaming data, use incremental index updates or maintain rolling window indexes.

IMBALANCED PARTITIONS

IVF works well when clusters are roughly equal-sized. In practice, data often clusters unevenly—popular categories have 10x more vectors than niche ones. Imbalanced partitions hurt both recall and latency.

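If one partition contains 50% of vectors but nprobe is set assuming uniform distribution, any search that probes that partition scans far more vectors than budgeted. For example, if nprobe was chosen to cover about 2% of the index, a query landing in the 50% partition scans roughly 25x more vectors and is correspondingly slower. Conversely, a fixed nprobe treats every cell the same regardless of size, so recall in dense regions drops relative to the rest of the index.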
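
The arithmetic above can be checked from the per-partition vector counts alone. The sketch below assumes you can export those counts as a list of integers. The function names are illustrative. It computes a skew statistic (1.0 means perfectly uniform) and the worst-case scan cost relative to the uniform expectation.

```python
def imbalance_factor(list_sizes):
    """Return 1.0 for perfectly uniform partitions; larger means more skew."""
    total = sum(list_sizes)
    if total == 0:
        return 1.0
    n = len(list_sizes)
    return n * sum(s * s for s in list_sizes) / (total * total)

def worst_case_slowdown(list_sizes, nprobe):
    """Vectors scanned when the nprobe largest partitions are probed,
    divided by the number expected under uniform partition sizes."""
    total = sum(list_sizes)
    expected = total * nprobe / len(list_sizes)
    worst = sum(sorted(list_sizes, reverse=True)[:nprobe])
    return worst / expected

# Hypothetical layout: 100 partitions, one of which holds 50% of the vectors.
sizes = [9900] + [100] * 99
print(round(imbalance_factor(sizes), 1))        # about 25.3
print(round(worst_case_slowdown(sizes, 2), 1))  # about 25.3
```

Tracking either number over time gives a simple production signal: a rising value means the partitioning no longer matches the data.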

Fixes: use more fine-grained partitioning in dense regions. Train hierarchical IVF with different granularity per region. Or switch to HNSW which handles non-uniform distributions better.

HARDWARE AND DEPLOYMENT EFFECTS

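Memory bandwidth bottleneck: Scanning compressed vectors is limited by memory bandwidth rather than compute. Hyperthreading helps little because sibling threads share the same memory path. On multi-socket machines, NUMA effects can cause around 2x latency variance depending on which core serves the query.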

Cold cache penalty: First query after service restart hits cold CPU caches. Latency can be 5-10x higher until cache warms. Pre-warm with representative queries at startup.
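
A startup warm-up pass might look like the sketch below. It assumes a hypothetical `search_fn(query)` wrapper around your index and a list of representative queries, for example a replay of recent traffic. The results are discarded, since the only goal is to touch the index memory before real requests arrive.

```python
import time

def warm_up(search_fn, warm_queries, rounds=2):
    """Run representative queries before serving; return seconds per round."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        for q in warm_queries:
            search_fn(q)  # result discarded; only the memory access matters
        timings.append(time.perf_counter() - start)
    return timings
```

If the later rounds are not clearly faster than the first, the warm-up set is probably too small or not representative of production queries.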

Batch size effects: Single queries underutilize SIMD. Batching 8-16 queries together can improve throughput 3-5x with minimal latency increase per query.

EDGE CASES IN PRODUCTION

Empty results: If nprobe is too low or all nearby partitions are empty, search returns fewer than K results, or none at all. Fall back to a wider search so callers always receive K results, even if the distances are high.
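
One way to guarantee K results is to retry with a progressively larger nprobe. The sketch below assumes a hypothetical `search_fn(query, k, nprobe)` that returns (id, distance) pairs and marks missing results with id -1, which is how FAISS pads short result lists. The nprobe schedule is an arbitrary example.

```python
def search_with_fallback(search_fn, query, k, nprobe_schedule=(8, 32, 128)):
    """Retry with a larger nprobe until k valid results come back.

    search_fn(query, k, nprobe) is a hypothetical callable returning a list
    of (id, distance) pairs, where id == -1 marks a missing result.
    """
    best = []
    for nprobe in nprobe_schedule:
        hits = [(i, d) for i, d in search_fn(query, k, nprobe) if i != -1]
        if len(hits) > len(best):
            best = hits  # keep the fullest result list seen so far
        if len(best) >= k:
            break
    return best[:k]
```

If even the largest nprobe returns fewer than K hits, the caller still gets the best partial list and can decide whether to run an exact search.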

Distance threshold failures: Some systems filter by max distance. Threshold tuned on old data may reject valid matches after embedding drift.

❗ Critical: Monitor recall continuously in production. ANN failure is silent—queries return results, just wrong ones. Sample-and-compare is your only detection mechanism.
💡 Key Takeaways
Data drift degrades recall 5-15 points over time; rebuild indexes monthly to quarterly
Imbalanced partitions cause latency variance and recall drops in dense regions
Hardware effects: memory bandwidth limits throughput, NUMA causes latency variance
ANN failure is silent—queries return results, just wrong ones. Monitor via sampling.
📌 Interview Tips
1. Interview Tip: Describe recall monitoring: sample 0.1% of queries, compare to exact search, and alert if recall drops below threshold.
2. Interview Tip: Explain why imbalanced partitions are problematic and how to detect them in production.