ANN Failure Modes: Data Drift, Imbalanced Partitions, and Hardware Effects
Data drift degrades ANN recall over time. When embedding distributions shift due to seasonal trends, new content types, or model updates, IVF centroids and PQ codebooks no longer align with the data: quantization error grows and recall drops by 5 to 15 percentage points. Operators monitor recall continuously by sampling 0.1 percent of queries and comparing ANN results against exact search on a 1 million item canary set; when recall falls below a threshold such as 95 percent, the index needs retraining. For HNSW, drift creates navigational dead ends when new clusters form far from existing graph regions, causing sudden latency spikes and lower recall at constant efSearch.
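A minimal sketch of this canary check in Python with FAISS. The canary set here is random stand-in data, and the index parameters (dimension 128, nlist 4096, 16 subquantizers) are illustrative; the 0.1 percent sampling rate, recall threshold of 95 percent, and 1 million item canary size come from the paragraph above.

```python
import numpy as np
import faiss

def recall_at_k(ann_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbors that the ANN index also returned."""
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(ann_ids, exact_ids))
    return hits / (len(exact_ids) * k)

d = 128                                                   # embedding dimension (illustrative)
canary = np.random.rand(1_000_000, d).astype("float32")   # stand-in for the canary set
queries = np.random.rand(1_000, d).astype("float32")      # ~0.1% sample of live queries

# Ground truth: exact brute-force search over the canary set.
exact = faiss.IndexFlatL2(d)
exact.add(canary)
_, exact_ids = exact.search(queries, 10)

# ANN index under test: IVF with product quantization (parameters illustrative).
quantizer = faiss.IndexFlatL2(d)
ann = faiss.IndexIVFPQ(quantizer, d, 4096, 16, 8)  # nlist=4096, 16 subquantizers, 8 bits
ann.train(canary)
ann.add(canary)
ann.nprobe = 32
_, ann_ids = ann.search(queries, 10)

r = recall_at_k(ann_ids, exact_ids)
if r < 0.95:                                       # threshold from the text
    print(f"recall@10 = {r:.3f} below 0.95 -- schedule index retraining")
```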
Imbalanced partitions in IVF indexes are another common failure. If some coarse centroids accumulate many more vectors than others, queries probing those hot centroids must scan long inverted lists, increasing latency. This manifests as p99 latency creeping up while the median stays stable: the average list length might be 5000 while the 99th percentile list holds 50000 vectors, so queries probing it take 10 times longer. Fixes include increasing the number of centroids or moving to a two level partitioning scheme; a list length check is sketched below. HNSW sees a related issue with tombstoned deletions: over time the graph fills with dead links and query paths lengthen, raising latency and reducing recall. Periodic compaction or full rebuilds are required.
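One way to detect the imbalance, sketched in Python against a trained FAISS IVF index. This reads list sizes through the invlists accessor of the FAISS Python API; the 10x alert ratio mirrors the numbers in the paragraph above and is a tunable assumption.

```python
import numpy as np
import faiss

def check_list_balance(index: faiss.IndexIVF, ratio_alert: float = 10.0):
    """Report the inverted-list length distribution for an IVF index."""
    sizes = np.array([index.invlists.list_size(i) for i in range(index.nlist)])
    mean, p99 = sizes.mean(), np.percentile(sizes, 99)
    print(f"mean list length: {mean:.0f}, p99: {p99:.0f}, max: {sizes.max()}")
    if p99 > ratio_alert * mean:
        # Hot lists dominate p99 query latency; consider a larger nlist
        # or a two level partitioning scheme.
        print(f"imbalance detected: p99 list is >{ratio_alert:.0f}x the mean")
    return sizes
```

Running this periodically alongside the recall canary catches imbalance before it surfaces as a p99 regression.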
Hardware effects introduce subtle correctness and performance issues. GPU indexes starve on small batches, lowering throughput and increasing latency jitter; operators tune batch sizes between 32 and 256 to saturate the compute units. On CPU, NUMA effects hurt performance when vector data is not bound to the local socket: cross socket memory traffic slows distance computations by 20 to 40 percent, so shards should be pinned with memory affinity. Vector normalization mismatches cause outright correctness bugs: if training used cosine similarity over normalized vectors but serving runs inner product over unnormalized vectors, the nearest neighbors skew toward high norm items, breaking relevance. Cache effects mask these problems during peak load, when popular items are cached, hiding recall regressions until cache misses rise.
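A minimal demonstration of the normalization bug, as a Python sketch with FAISS and synthetic data. The dataset, dimension, and the block of high norm items are illustrative; faiss.normalize_L2 and IndexFlatIP are standard FAISS calls, and L2-normalizing both sides makes inner product equivalent to cosine similarity.

```python
import numpy as np
import faiss

d = 64
rng = np.random.default_rng(0)
xb = rng.standard_normal((10_000, d)).astype("float32")
xb[:100] *= 10.0             # plant a block of high norm items at ids 0..99
xq = rng.standard_normal((5, d)).astype("float32")

# Wrong: inner product over unnormalized vectors skews toward high norm items.
raw = faiss.IndexFlatIP(d)
raw.add(xb)
_, raw_ids = raw.search(xq, 10)
print("unnormalized IP hits in high-norm block:", np.mean(raw_ids < 100))

# Right: L2-normalize both sides so inner product equals cosine similarity.
xb_n, xq_n = xb.copy(), xq.copy()
faiss.normalize_L2(xb_n)     # in-place L2 normalization
faiss.normalize_L2(xq_n)
cos = faiss.IndexFlatIP(d)
cos.add(xb_n)
_, cos_ids = cos.search(xq_n, 10)
print("cosine hits in high-norm block:", np.mean(cos_ids < 100))
```

The first fraction is near 1.0 even though the high norm block is only 1 percent of the data; the normalized version returns genuinely similar directions.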
💡 Key Takeaways
•Data drift increases quantization error in IVF PQ and ScaNN, dropping recall by 5 to 15 points; monitor by replaying a 0.1 percent sample of queries against exact search and retrain when recall falls below 95 percent
•Imbalanced IVF lists cause p99 latency spikes: hot centroids with 10 times more vectors slow every query probing those lists; median latency stays stable while p99 increases by 5 to 10 times
•HNSW deletions accumulate tombstones that create dead links, lengthening query paths and reducing recall; plan monthly compaction to remove 20 to 30 percent overhead from deleted nodes
•NUMA effects on CPU slow distance computation by 20 to 40 percent when vectors are not socket local; bind shards to CPU sockets and use memory affinity for consistent performance
•Normalization mismatches break correctness: training on cosine with normalized vectors but serving unnormalized vectors with inner product causes nearest neighbors to skew toward high norm items
•GPU batch size tuning is critical: batches under 16 starve compute and increase latency jitter, while batches of 32 to 256 saturate an A100 and achieve a stable 5 milliseconds per batch at 10000 QPS (see the sketch after this list)
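To make the batch size concrete, a minimal Python sketch with FAISS, assuming the faiss-gpu build and one visible GPU; the flat index, dimension, and batch size of 128 are illustrative, not a serving implementation.

```python
import numpy as np
import faiss

d = 128
cpu_index = faiss.IndexFlatL2(d)
cpu_index.add(np.random.rand(1_000_000, d).astype("float32"))

# Copy the index to GPU 0 (requires the faiss-gpu build).
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)

def search_batched(index, queries, k=10, batch_size=128):
    """Issue searches in fixed-size batches: sizes in the 32-256 range keep
    the GPU saturated, while per-query calls starve it and add jitter."""
    results = []
    for start in range(0, len(queries), batch_size):
        _, ids = index.search(queries[start:start + batch_size], k)
        results.append(ids)
    return np.concatenate(results)

queries = np.random.rand(10_000, d).astype("float32")
ids = search_batched(gpu_index, queries, k=10, batch_size=128)
```

In a serving path the same effect comes from micro-batching: buffering incoming queries for a few milliseconds and dispatching them to the GPU together.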
📌 Examples
Seasonal drift: e-commerce embeddings shift during holiday shopping; IVF PQ recall drops from 97 to 89 percent over 3 weeks; a weekly retrain restores recall to 96 percent
List imbalance detection: the average list length is 5000 but the p99 list holds 48000 vectors; queries hitting p99 lists take 45 milliseconds vs a 5 millisecond median; fix by increasing centroids from 65536 to 131072
HNSW tombstone accumulation: after 6 months with 30 percent item churn, p99 latency rises from 18 to 35 milliseconds; rebuilding the index to remove dead nodes returns latency to 20 milliseconds