Index Drift and Consistency Guarantees
INDEX DRIFT EXPLAINED
Index drift occurs when the index structure becomes misaligned with the underlying data. In IVF indexes, this happens when cluster centroids no longer represent the actual data distribution. When new vectors are distributed differently from the data the centroids were trained on, queries routed to the nearest centroids miss relevant results that sit in other clusters.
Measuring drift: sample a set of queries and compare the index results against brute-force exact search. A drop in Recall@100 from 95% to 88% over two weeks indicates significant drift. At 85% recall, quality degradation becomes visible in user-facing search.
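The measurement above can be sketched in a few lines of Python. This is a minimal illustration, not a production harness: `exact_top_k`, `recall_at_k`, `mean_recall`, and the `index_search(query, k)` callable are hypothetical names, vectors are assumed to live in a dict of id to coordinates, and Euclidean distance is assumed as the metric.

```python
import math

def exact_top_k(query, vectors, k):
    """Brute-force search: ids of the k vectors closest to query (Euclidean)."""
    ranked = sorted(vectors, key=lambda vec_id: math.dist(query, vectors[vec_id]))
    return ranked[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the index also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def mean_recall(queries, vectors, index_search, k=100):
    """Average recall@k over sampled queries.

    index_search(query, k) is an assumed callable that wraps the
    approximate index and returns a list of vector ids.
    """
    scores = [
        recall_at_k(index_search(q, k), exact_top_k(q, vectors, k))
        for q in queries
    ]
    return sum(scores) / len(scores)
```

Running `mean_recall` on the same query sample at regular intervals gives a trend line. A sustained decline, rather than a single low reading, is the signal that the index may need retraining.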
CONSISTENCY CHALLENGES
Write-search consistency: After a vector is added to the hot index, can a query immediately find it? With async writes, there is a brief window (milliseconds to seconds) where the vector exists but is not searchable.
Cross-index consistency: During hot-to-main merges, the same vector might briefly appear in both indexes or in neither. Queries issued during a merge might return duplicates or miss items.
Delete consistency: Deleting an item requires removing it from both the hot and main indexes. If a delete propagates to hot but not main (or vice versa), deleted items may still appear in results.
HANDLING CONSISTENCY
Idempotent operations: Design inserts and deletes to be safely repeatable. If a merge fails midway, a retry should still produce the correct result.
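One way to get this property is to key every write by vector id, so that replaying a write leaves the same state. The sketch below assumes a toy in-memory index. `MainIndex`, `merge`, and the `fail_after` crash simulation are hypothetical names used only for illustration.

```python
class MainIndex:
    """Toy in-memory stand-in for a main index keyed by vector id."""

    def __init__(self):
        self.vectors = {}

    def upsert(self, vec_id, vector):
        # Writing the same (id, vector) pair twice leaves the same state.
        self.vectors[vec_id] = vector

    def delete(self, vec_id):
        # Deleting a missing id is a no-op rather than an error.
        self.vectors.pop(vec_id, None)


def merge(hot_batch, main, fail_after=None):
    """Apply hot entries to main; optionally simulate a crash midway."""
    for n, (vec_id, vector) in enumerate(hot_batch):
        if fail_after is not None and n == fail_after:
            raise RuntimeError("simulated crash during merge")
        main.upsert(vec_id, vector)
```

Because `upsert` overwrites by id, a retry can replay the whole batch from the start without tracking how far the failed attempt got.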
Version tracking: Assign monotonically increasing versions to vectors. At query time, filter results to exclude outdated versions. This handles duplicates that appear during merges.
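As a rough sketch of query-time version filtering, assume each tier returns hits as `(vec_id, version, distance)` tuples. That shape and the `merge_results` name are assumptions made for illustration. The function keeps only the newest version of each id before ranking.

```python
def merge_results(hot_hits, main_hits, k):
    """Combine hits from both tiers, keeping the newest version per id.

    Each hit is a (vec_id, version, distance) tuple; smaller distance
    means a closer match.
    """
    newest = {}
    for vec_id, version, distance in list(hot_hits) + list(main_hits):
        current = newest.get(vec_id)
        if current is None or version > current[0]:
            newest[vec_id] = (version, distance)
    # Rank the deduplicated hits by distance and truncate to k.
    ranked = sorted(newest.items(), key=lambda item: item[1][1])
    return [(vec_id, ver, dist) for vec_id, (ver, dist) in ranked[:k]]
```

If a vector is present in both tiers during a merge, only the copy with the higher version survives, so the caller never sees a duplicate id.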
Tombstone records: Mark deletions with tombstones rather than physically removing the vectors, and clear the tombstones during compaction. This ensures deletes propagate correctly across index tiers.
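The tombstone approach might look like the following sketch, which assumes a toy two-tier index held in dictionaries. `TieredIndex`, `visible_ids`, and `compact` are hypothetical names, and re-inserting a previously deleted id is deliberately left out of scope.

```python
class TieredIndex:
    """Toy hot/main index pair sharing one tombstone set."""

    def __init__(self):
        self.hot = {}            # vec_id -> vector, recent writes
        self.main = {}           # vec_id -> vector, merged data
        self.tombstones = set()  # ids marked as deleted

    def delete(self, vec_id):
        # Mark only; no tier is touched, so the delete cannot half-apply.
        self.tombstones.add(vec_id)

    def visible_ids(self):
        # Query-time filter: ids in either tier, minus tombstoned ids.
        return sorted((set(self.hot) | set(self.main)) - self.tombstones)

    def compact(self):
        # Physically remove tombstoned ids from both tiers, then clear.
        for vec_id in self.tombstones:
            self.hot.pop(vec_id, None)
            self.main.pop(vec_id, None)
        self.tombstones.clear()
```

The delete is a single write to one set, and every query filters against that set, so a deleted id disappears from results in both tiers at once even though the physical removal happens later.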