Sharding Vector Indexes: Balancing Load and Latency
UPDATE STRATEGIES
An index needs updating when new content is added, existing content changes (so its embedding must be recomputed), or content is deleted. Each case calls for a different strategy with its own tradeoffs.
Full rebuild: Regenerate entire index from scratch. Most accurate but slowest. Use for major embedding model updates or when incremental drift becomes unacceptable. Typical cadence: weekly to monthly.
Incremental update: Add new vectors to the existing index structure. Fast but may degrade quality over time. HNSW supports this naturally; IVF-PQ assigns each new vector to an existing centroid and encodes it with the existing codebooks, neither of which is retrained.
Hybrid: Maintain a small "delta" index for recent items, periodically merge into main index. Balances freshness and quality.
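The hybrid pattern above can be sketched as follows. This is a toy illustration, not a production design: both the main and delta stores are plain dictionaries searched by brute force, standing in for a real ANN index and a small exact buffer. The names `HybridIndex`, `top_k`, and `merge_threshold` are hypothetical.

```python
import heapq
import math


def top_k(store, query, k, skip=()):
    """Exact top-k over a dict of id -> vector, returned as (distance, id) pairs."""
    scored = ((math.dist(query, vec), item_id)
              for item_id, vec in store.items() if item_id not in skip)
    return heapq.nsmallest(k, scored)


class HybridIndex:
    """Toy main + delta index (assumption: brute force stands in for both sides)."""

    def __init__(self, merge_threshold=1000):
        self.main = {}    # id -> vector; stands in for the large, periodically rebuilt index
        self.delta = {}   # id -> vector; recent inserts and updates
        self.merge_threshold = merge_threshold

    def add(self, item_id, vector):
        # New and updated items always land in the delta first.
        self.delta[item_id] = vector
        if len(self.delta) >= self.merge_threshold:
            self.merge()

    def merge(self):
        # In a real system this is where the main index would be rebuilt or extended.
        self.main.update(self.delta)
        self.delta.clear()

    def search(self, query, k):
        # Query both sides, let delta entries shadow stale main entries, then merge top-k.
        hits = top_k(self.main, query, k, skip=self.delta) + top_k(self.delta, query, k)
        return [item_id for _, item_id in heapq.nsmallest(k, hits)]
```

Searching both sides and merging the two top-k lists is what keeps fresh items visible before the next merge; the cost is one extra (small) search per query.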
INCREMENTAL UPDATE MECHANICS
HNSW incremental: Insert each new vector by searching the existing graph for its nearest neighbors and adding edges to them. Quality degrades slightly over time: new vectors see incomplete neighborhoods if inserted late. Rebuild when recall drops by 2-3 percentage points (e.g., from 0.95 to 0.92).
IVF incremental: Assign each new vector to its nearest existing centroid and add it to that partition. Centroids become stale as the data distribution shifts. If more than 20% of vectors were added after the centroids were trained, the centroids may be misaligned.
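A minimal sketch of the IVF assignment step and the 20% staleness check, assuming centroids are given as plain lists and no quantization is applied. The class `IVFPartitions` and its method names are hypothetical and do not correspond to any library's API.

```python
import math


class IVFPartitions:
    """Toy inverted-file layout: one list of (id, vector) per fixed centroid."""

    def __init__(self, centroids, trained_count):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]
        self.trained_count = trained_count  # vectors present when centroids were trained
        self.added_count = 0                # vectors added since training

    def nearest_centroid(self, vector):
        return min(range(len(self.centroids)),
                   key=lambda c: math.dist(vector, self.centroids[c]))

    def add(self, item_id, vector):
        # Centroids are NOT updated here; that is what makes them go stale.
        c = self.nearest_centroid(vector)
        self.lists[c].append((item_id, vector))
        self.added_count += 1
        return c

    def post_training_fraction(self):
        total = self.trained_count + self.added_count
        return self.added_count / total if total else 0.0

    def centroids_may_be_stale(self, limit=0.20):
        # Heuristic from the text: more than 20% post-training vectors is a warning sign.
        return self.post_training_fraction() > limit
```

The fraction is only a proxy. If new data follows the same distribution as the training data, the centroids can stay usable well past this limit, so it is best paired with a recall check.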
Deletion: Most indexes support soft deletion (mark as deleted, filter at query time). Hard deletion requires compaction or rebuild. Soft-delete overhead: 5-10% query slowdown as deleted vectors are still scanned.
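Soft deletion with a tombstone set might look like the sketch below. It uses brute-force search purely for illustration; `SoftDeleteIndex` and `compact` are hypothetical names. Note that deleted vectors are still scored on every query, which is the source of the overhead described above.

```python
import math


class SoftDeleteIndex:
    """Toy index with tombstones: deletes are recorded, not applied."""

    def __init__(self):
        self.vectors = {}     # id -> vector, including soft-deleted entries
        self.deleted = set()  # tombstones

    def add(self, item_id, vector):
        self.vectors[item_id] = vector
        self.deleted.discard(item_id)  # re-adding an id revives it

    def delete(self, item_id):
        # Soft delete: mark only; the vector stays in storage.
        if item_id in self.vectors:
            self.deleted.add(item_id)

    def search(self, query, k):
        # Every stored vector is scored; tombstoned ids are filtered afterwards.
        scored = sorted((math.dist(query, vec), item_id)
                        for item_id, vec in self.vectors.items())
        return [item_id for _, item_id in scored if item_id not in self.deleted][:k]

    def compact(self):
        # Hard delete: physically drop tombstoned vectors.
        for item_id in self.deleted:
            del self.vectors[item_id]
        self.deleted.clear()
```

Running `compact` during a scheduled rebuild or merge reclaims the space and removes the scan overhead in one step.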
WHEN TO REBUILD
Monitor recall on a fixed query set. When recall drops below a threshold (e.g., falls from 0.95 to 0.92), trigger a rebuild. Also rebuild after an embedding model update: old and new embeddings live in different vector spaces and cannot be compared.
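The monitoring rule can be sketched as a recall@k computation over the fixed query set plus a simple trigger. The function names and the 0.92 default are illustrative assumptions; `exact_results` is assumed to come from an exhaustive (brute-force) search over the same data.

```python
def recall_at_k(approx_results, exact_results, k):
    """Mean fraction of the exact top-k ids that the index also returned.

    Both arguments are lists of id lists, one per query in the fixed query set.
    """
    if not exact_results:
        raise ValueError("the query set must not be empty")
    hits = 0
    for approx, exact in zip(approx_results, exact_results):
        hits += len(set(approx[:k]) & set(exact[:k]))
    return hits / (k * len(exact_results))


def should_rebuild(current_recall, threshold=0.92, embedding_model_changed=False):
    # A model change always forces a rebuild; otherwise compare against the threshold.
    return embedding_model_changed or current_recall < threshold
```

Keeping the query set fixed matters: if the queries change between measurements, a recall drop could reflect the new queries rather than index drift.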