Dynamic Vector Indexes for Continuous Updates
WHY STANDARD INDEXES STRUGGLE
Standard vector indexes (IVF, HNSW) are built on the assumption that the data is static. IVF pre-computes cluster centroids from training data, so when new vectors arrive with a different distribution, the existing centroids may represent them poorly. HNSW builds a graph whose links reflect the data present at build time, and that structure becomes suboptimal as the data distribution shifts.
For indexes handling 10K+ inserts per hour, these limitations become painful. Recall can degrade by 5-15% over a period of weeks as the index drifts from its optimal structure. Periodic rebuilds restore quality but create the freshness gap problem.
DYNAMIC INDEX APPROACHES
Mutable HNSW: Standard HNSW with in-place insertions. Each new node is connected to its nearest neighbors in the existing graph. This works reasonably well at low update rates (under 1% of index size per day), but graph quality degrades at high update rates.
Tiered indexes: Multiple HNSW indexes of increasing size. New vectors go into the smallest tier. When a tier fills, it is merged into the next larger tier, similar to the LSM-tree design used in databases. This balances insert speed against search quality.
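The tiering and merge cascade can be sketched in a few lines of Python. This is a minimal illustration, not a production design: each tier is a plain list searched by brute force as a stand-in for a per-tier HNSW index, and the class name `TieredIndex` and the parameters `base_capacity` and `growth` are hypothetical choices made for the example.

```python
import math


class TieredIndex:
    """LSM-style tiers. Brute-force lists stand in for per-tier HNSW indexes."""

    def __init__(self, base_capacity=4, growth=4):
        self.base_capacity = base_capacity  # capacity of tier 0 (illustrative value)
        self.growth = growth                # each tier holds `growth` times more
        self.tiers = [[]]                   # tiers[0] is the smallest, freshest tier

    def _capacity(self, level):
        return self.base_capacity * self.growth ** level

    def insert(self, vec_id, vec):
        self.tiers[0].append((vec_id, vec))
        level = 0
        # Cascade merges upward while a tier has reached its capacity.
        while len(self.tiers[level]) >= self._capacity(level):
            if level + 1 == len(self.tiers):
                self.tiers.append([])
            # A real system would rebuild the larger tier's graph at this point.
            self.tiers[level + 1].extend(self.tiers[level])
            self.tiers[level] = []
            level += 1

    def search(self, query, k=3):
        # Query every tier and merge the candidates by distance.
        hits = [(math.dist(query, vec), vec_id)
                for tier in self.tiers
                for vec_id, vec in tier]
        return [vec_id for _, vec_id in sorted(hits)[:k]]
```

Inserts only ever touch the smallest tier, so they stay cheap, while a search pays for one lookup per tier. The growth factor controls that trade-off: a larger factor means fewer tiers to query but more expensive merges.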
Streaming IVF: An IVF index with dynamic centroid updates. Centroids are periodically retrained on recent data, which requires balancing centroid stability (for routing consistency) against adaptation to data drift.
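One way to picture the stability-versus-adaptation trade-off is an update rule that moves each centroid only part of the way toward the mean of the recent vectors routed to it. The sketch below assumes that rule; the function names and the `rate` parameter are hypothetical, and real systems may retrain centroids quite differently.

```python
import math


def nearest_centroid(centroids, vec):
    """Return the index of the centroid closest to vec (the IVF routing step)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], vec))


def update_centroids(centroids, recent_vectors, rate=0.2):
    """Move each centroid part of the way toward the mean of its recent vectors.

    rate=0 freezes the centroids (stable routing); rate=1 jumps straight to
    the recent mean (maximum adaptation). The default 0.2 is illustrative only.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for vec in recent_vectors:
        c = nearest_centroid(centroids, vec)
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += vec[d]

    updated = []
    for c, centroid in enumerate(centroids):
        if counts[c] == 0:
            updated.append(list(centroid))  # no recent data: leave unchanged
            continue
        mean = [s / counts[c] for s in sums[c]]
        updated.append([(1 - rate) * old + rate * new
                        for old, new in zip(centroid, mean)])
    return updated
```

Note that once a centroid moves, vectors already stored under it may no longer be closest to it, so a full implementation also has to decide when to reassign stored vectors. That cost is part of why centroid stability matters.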
IMPLEMENTATION COMPLEXITY
Dynamic indexes add significant complexity. You need concurrent insert/search handling (readers-writer locks or lock-free structures), memory management for growing indexes, background compaction that does not block queries, and monitoring for quality degradation.
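Of these requirements, quality monitoring is the easiest to illustrate. A common approach is to sample queries, compute exact neighbors by brute force, and measure recall@k of the live index against that ground truth. The sketch below assumes vectors are held in a dict keyed by id and that `approx_search(query, k)` is a hypothetical callable wrapping whatever index is in use.

```python
import math


def exact_top_k(vectors, query, k):
    """Brute-force ground truth: ids of the k vectors nearest to query."""
    ranked = sorted(vectors, key=lambda vec_id: math.dist(vectors[vec_id], query))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbors that the index actually returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_recall(vectors, queries, approx_search, k=10):
    """Average recall@k of approx_search over a sample of queries."""
    scores = [recall_at_k(approx_search(q, k), exact_top_k(vectors, q, k))
              for q in queries]
    return sum(scores) / len(scores)
```

Tracking this number over time on a fixed query sample gives an early signal that the index has drifted far enough to justify compaction or a rebuild. The brute-force pass is expensive, so it would normally run on a small sample in the background.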
Most production systems choose the simpler hot+main architecture over complex dynamic indexes. The operational overhead of a dynamic index often exceeds its benefit unless freshness requirements are extreme (sub-minute).
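For comparison, here is a rough sketch of the hot+main pattern. It assumes that "hot+main" means a small buffer of recent vectors searched exactly, sitting in front of a large main index that is rebuilt in batches. The class `HotMainIndex` and its methods are hypothetical, and plain dicts stand in for both indexes.

```python
import math


class HotMainIndex:
    def __init__(self):
        self.main = {}  # stand-in for a large ANN index rebuilt in batches
        self.hot = {}   # recent vectors, small enough for exact search

    def insert(self, vec_id, vec):
        self.hot[vec_id] = vec  # searchable immediately, no index rebuild

    def rebuild_main(self):
        # A real system would build the new main index offline and swap it in.
        self.main = {**self.main, **self.hot}
        self.hot = {}

    def search(self, query, k=3):
        # Merge both sides; a hot entry overrides a stale main entry with the same id.
        merged = {**self.main, **self.hot}
        ranked = sorted(merged, key=lambda vec_id: math.dist(merged[vec_id], query))
        return ranked[:k]
```

The appeal is that the only moving parts are a buffer and a batch job: there is no in-place graph mutation and no multi-level compaction to tune.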