Dynamic Vector Indexes for Continuous Updates
Dynamic vector indexes support online insertions and deletions without requiring full rebuilds. Navigable small world graphs, most commonly Hierarchical Navigable Small World (HNSW), are the most popular choice, handling each insertion in roughly logarithmic to polylogarithmic time. When you add a new vector, the index finds its k nearest neighbors in the existing graph, creates bidirectional edges, and then prunes long-range connections as needed to maintain bounded degree. This keeps the structure continuously searchable with bounded overhead per write.
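As a concrete illustration, here is a minimal sketch of incremental insertion using the open-source hnswlib library; the dimension, parameter values, and label scheme are illustrative choices, not taken from the text above.

```python
import numpy as np
import hnswlib

dim = 512
index = hnswlib.Index(space="cosine", dim=dim)
# Capacity, construction beam width, and graph degree (M) are illustrative.
index.init_index(max_elements=100_000, ef_construction=200, M=16)
index.set_ef(100)  # search beam width; higher ef -> better recall, slower queries

# Initial batch of vectors with integer labels.
vectors = np.random.rand(1_000, dim).astype(np.float32)
index.add_items(vectors, np.arange(1_000))

# A new vector arriving later is inserted online: hnswlib links it to its
# nearest neighbors in the existing graph, no rebuild required.
new_vec = np.random.rand(1, dim).astype(np.float32)
index.add_items(new_vec, [1_000])

# The index is immediately searchable, including the fresh insert.
labels, distances = index.knn_query(new_vec, k=10)
print(labels[0])
```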
The trade-off is memory and maintenance cost. HNSW typically uses 1.5 to 4 GB of RAM per million vectors at 256 to 768 dimensions, depending on graph degree and whether you store full-precision or quantized vectors. Higher graph degree improves recall but increases memory and insertion time. A typical configuration is M equals 16 to 32 neighbors per layer, giving 95 percent or better recall at 10 to 30 milliseconds of query latency for million-scale indexes. Insertions take 5 to 50 milliseconds depending on index size and degree.
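One way to approximate that memory footprint is a back-of-envelope formula: raw vector bytes plus graph link slots per element, scaled by an overhead factor. The per-link byte count and the overhead factor below are assumptions for illustration, not exact figures for any particular library.

```python
def estimate_hnsw_memory_gb(num_vectors: int, dim: int, M: int,
                            bytes_per_component: int = 4,
                            overhead_factor: float = 1.3) -> float:
    """Rough HNSW memory estimate.

    Per vector: the stored vector (dim * bytes_per_component) plus roughly
    2 * M link slots of 4 bytes each on the base layer. Upper layers and
    allocator overhead are folded into overhead_factor (an assumption).
    """
    per_vector_bytes = dim * bytes_per_component + 2 * M * 4
    return num_vectors * per_vector_bytes * overhead_factor / 1e9

# The 2 million x 512-dimension, M = 24 configuration from the examples below:
print(round(estimate_hnsw_memory_gb(2_000_000, 512, 24), 1))  # ~5.8 GB
```

This lands close to the roughly 6 GB figure quoted in the examples for that configuration, which is the point of the heuristic: graph degree and vector precision dominate the footprint.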
Deletions are handled with tombstones or lazy removal. Mark deleted vectors as inactive so queries skip them; their edges remain in the graph until compaction. If the tombstone ratio exceeds 10 to 20 percent, recall degrades and query latency increases because of wasted graph traversals. Schedule periodic compaction to rebuild the affected graph regions or the entire index. High-churn workloads with frequent deletes need compaction every few hours or daily to maintain quality.
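A sketch of lazy deletion plus threshold-triggered compaction, assuming hnswlib's mark_deleted for tombstoning; the 15 percent threshold, the external label bookkeeping, and the rebuild parameters are illustrative assumptions, not a prescribed policy.

```python
import hnswlib

deleted_labels = set()  # external tombstone bookkeeping (assumption)

def delete_vector(index: "hnswlib.Index", label: int) -> None:
    # Lazy delete: queries skip this vector, but its graph edges
    # remain until the index is rebuilt.
    index.mark_deleted(label)
    deleted_labels.add(label)

def maybe_compact(index: "hnswlib.Index", all_labels: list, dim: int,
                  max_tombstone_ratio: float = 0.15) -> "hnswlib.Index":
    """Rebuild the index once tombstones exceed the threshold."""
    ratio = len(deleted_labels) / max(len(all_labels), 1)
    if ratio < max_tombstone_ratio:
        return index  # still healthy, keep serving the current graph

    live = [label for label in all_labels if label not in deleted_labels]
    fresh = hnswlib.Index(space="cosine", dim=dim)
    fresh.init_index(max_elements=max(len(live) * 2, 1_024),
                     ef_construction=200, M=16)
    # Re-fetch live vectors from the old index and insert them into a clean graph.
    fresh.add_items(index.get_items(live), live)
    deleted_labels.clear()
    return fresh
```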
Static structures like trees or quantized Inverted File (IVF) variants are faster and smaller but require offline rebuilds. Spotify's Annoy uses random-projection trees and is optimized for read-heavy workloads where the index is built once and queried millions of times. If your workload has continuous writes and you need second-level freshness, dynamic graphs are essential. If writes are rare or can be batched hourly, static indexes with blue-green swaps give better p99 latency and lower memory cost.
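A minimal blue-green swap sketch around a prebuilt Annoy index; the wrapper class, file paths, and rebuild cadence are hypothetical, shown only to make the swap pattern concrete.

```python
from annoy import AnnoyIndex

class BlueGreenSearcher:
    """Serves queries from one read-only index while a fresh one is built offline."""

    def __init__(self, dim: int, metric: str = "angular"):
        self.dim = dim
        self.metric = metric
        self.active = None  # currently served index

    def swap_in(self, index_path: str) -> None:
        # Load the freshly built index file (Annoy memory-maps it),
        # then replace the old index with a single reference swap.
        fresh = AnnoyIndex(self.dim, self.metric)
        fresh.load(index_path)
        self.active = fresh

    def query(self, vector, k: int = 10):
        return self.active.get_nns_by_vector(vector, k)

# An offline job builds and saves a new index on its own schedule, e.g.:
#   idx = AnnoyIndex(512, "angular")
#   for i, vec in enumerate(vectors):
#       idx.add_item(i, vec)
#   idx.build(n_trees=50)
#   idx.save("index_v2.ann")
# The serving process then calls searcher.swap_in("index_v2.ann") during a
# low-traffic window, so reads never see a partially built index.
```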
💡 Key Takeaways
• HNSW supports online insertions at 5 to 50 milliseconds per insert, maintaining 95 percent or better recall with M equals 16 to 32 neighbors per layer
• Memory cost is 1.5 to 4 GB per million vectors at 256 to 768 dimensions; higher graph degree improves recall but increases RAM and insertion time
• Deletions use tombstones; if the tombstone ratio exceeds 10 to 20 percent, recall degrades, requiring compaction every few hours or daily for high-churn workloads
• Static indexes like Annoy are faster for reads and use less memory but need full rebuilds; Spotify swaps prebuilt indexes during low-traffic windows
• Dynamic graphs are essential for second-level freshness with continuous writes; static indexes are better for read-heavy workloads with hourly or daily batch updates
• Query latency is typically 10 to 30 milliseconds for million-scale HNSW; insertions take logarithmic time, but the constant factor is higher than for read-only structures
📌 Examples
HNSW configuration: 2 million vectors at 512 dimensions, M equals 24, uses 6 GB RAM, 95 percent recall at top 100, 25 ms p95 query, 30 ms insert
High churn scenario: 20 percent of index deleted daily, tombstone ratio hits 15 percent after 3 days, schedule nightly compaction to maintain recall
Spotify Annoy for read-heavy workloads: build index offline with 50 million tracks, deploy as read only, swap in a new index every 6 hours, query latency 8 ms p95