Implementation Details: Sharding, Monitoring, and Optimization
Index Management
Vector indices need ongoing maintenance. New documents must be embedded and inserted. Deleted documents leave dead entries that keep occupying space until they are purged. Updated documents need their old vectors removed and new ones added. Building an index is expensive: a million vectors with HNSW can take on the order of 10-30 minutes, depending on hardware and build parameters. Some systems support online updates while serving queries; others require downtime for a rebuild.
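The maintenance cycle above (insert, delete, update, periodic rebuild) can be sketched with a toy index. This is a minimal illustration under stated assumptions, not how any particular vector database works: a brute-force cosine scan stands in for HNSW, and the names TombstoneIndex, rebuild_ratio, and compact are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TombstoneIndex:
    """Toy index: a brute-force scan stands in for an ANN structure such as HNSW."""

    def __init__(self, rebuild_ratio=0.3):
        self.vectors = []   # append-only slots of (doc_id, vector)
        self.live = {}      # doc_id -> slot holding its current vector
        self.dead = 0       # slots that no longer belong to any live document
        self.rebuild_ratio = rebuild_ratio  # hypothetical compaction threshold

    def upsert(self, doc_id, vector):
        if doc_id in self.live:
            self.dead += 1  # the old vector becomes a dead entry
        self.live[doc_id] = len(self.vectors)
        self.vectors.append((doc_id, list(vector)))
        self._maybe_compact()

    def delete(self, doc_id):
        if self.live.pop(doc_id, None) is not None:
            self.dead += 1
            self._maybe_compact()

    def _maybe_compact(self):
        if self.vectors and self.dead / len(self.vectors) > self.rebuild_ratio:
            self.compact()

    def compact(self):
        """Rebuild storage with live vectors only (the expensive step in a real index)."""
        kept = [self.vectors[slot] for slot in sorted(self.live.values())]
        self.vectors = kept
        self.live = {doc_id: i for i, (doc_id, _) in enumerate(kept)}
        self.dead = 0

    def search(self, query, k=3):
        scored = [(cosine(query, self.vectors[slot][1]), doc_id)
                  for doc_id, slot in self.live.items()]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:k]]
```

Searches skip dead entries immediately, while the space they occupy is reclaimed only when the dead fraction crosses the threshold. That spreads the rebuild cost over many updates at the price of some wasted memory in between.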
Filtering and Metadata
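Pure semantic search ignores structure. But users often want semantic search within constraints: "find similar products in electronics under a given price." Efficient filtering requires applying metadata constraints before or during ANN search. Pre-filtering, which narrows the candidate set first and searches only what remains, works for selective filters. Post-filtering, which searches first and then drops results that fail the filter, works when most documents pass the filter; with a selective filter it can leave too few results in the top k.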
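The difference between the two strategies can be seen in a small sketch. It assumes each item is a dict with hypothetical keys id, vec, and meta, uses an exact dot-product ranking in place of a real ANN search, and introduces an illustrative overfetch parameter; the price limit in the usage example is likewise made up for demonstration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pre_filter_search(items, query, predicate, k):
    """Apply the metadata filter first, then rank only the survivors."""
    candidates = [it for it in items if predicate(it["meta"])]
    candidates.sort(key=lambda it: dot(query, it["vec"]), reverse=True)
    return [it["id"] for it in candidates[:k]]


def post_filter_search(items, query, predicate, k, overfetch=3):
    """Rank everything, keep the top k * overfetch, then drop non-matching results."""
    ranked = sorted(items, key=lambda it: dot(query, it["vec"]), reverse=True)
    top = ranked[:k * overfetch]
    return [it["id"] for it in top if predicate(it["meta"])][:k]


# Illustrative catalog; the vectors and prices are made up.
items = [
    {"id": "tv", "vec": [0.9, 0.1], "meta": {"category": "electronics", "price": 400}},
    {"id": "radio", "vec": [0.8, 0.2], "meta": {"category": "electronics", "price": 40}},
    {"id": "sofa", "vec": [1.0, 0.0], "meta": {"category": "furniture", "price": 300}},
    {"id": "lamp", "vec": [0.95, 0.05], "meta": {"category": "furniture", "price": 30}},
    {"id": "cable", "vec": [0.1, 0.9], "meta": {"category": "electronics", "price": 5}},
]


def cheap_electronics(meta, max_price=100):  # max_price is an arbitrary example limit
    return meta["category"] == "electronics" and meta["price"] < max_price
```

With this data, pre_filter_search(items, [1, 0], cheap_electronics, k=2) ranks only the two matching items, whereas post_filter_search with overfetch=1 returns nothing because the two nearest items are both furniture. Raising overfetch recovers the matches at the cost of scoring and discarding more candidates.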
Monitoring and Debugging
Semantic search failures are hard to debug because there is no obvious "wrong answer." Monitor query latency (P50, P99), click-through rates, and no-click rates (the share of queries where the user clicked none of the results). Log query and result vectors so that poor matches can be diagnosed afterwards. Visualizing embeddings with dimensionality reduction (t-SNE, UMAP) reveals clustering problems.
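As a rough sketch of the metrics named above, the following computes latency percentiles and the no-click rate from a query log. The log format (a list of dicts with latency_ms and clicked_ids keys) is an assumption made for illustration, and the percentile uses the simple nearest-rank definition; monitoring systems often interpolate or use approximate histograms instead.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(query_log):
    """Summarize a log of queries (hypothetical record format, see lead-in)."""
    latencies = [entry["latency_ms"] for entry in query_log]
    no_click = sum(1 for entry in query_log if not entry["clicked_ids"])
    return {
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
        "no_click_rate": no_click / len(query_log),
    }


# Made-up log entries for demonstration only.
sample_log = [
    {"latency_ms": 10, "clicked_ids": ["d1"]},
    {"latency_ms": 20, "clicked_ids": []},
    {"latency_ms": 30, "clicked_ids": ["d7", "d2"]},
    {"latency_ms": 40, "clicked_ids": []},
    {"latency_ms": 500, "clicked_ids": ["d3"]},
]
```

In this sample the median looks healthy while P99 exposes the single slow query, which is why both are worth tracking.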
Scaling Patterns
Beyond single-node capacity, partition the vectors into shards and spread them across machines. Query all shards in parallel and merge the results. This adds latency, because each query waits for the slowest shard and then for the merge, but capacity can grow by adding machines. Some workloads benefit from replicas: multiple copies of the same shard serving read traffic for higher throughput.
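The query-all-shards-and-merge pattern is often called scatter-gather. A minimal sketch, assuming each shard is an in-memory dict of doc_id to vector and using threads in place of network calls to separate machines; shard_for, search_shard, and search_all are hypothetical names, and hashing the document ID is only one possible partitioning scheme.

```python
import heapq
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(doc_id, n_shards):
    """Stable hash partitioning: the same doc_id always maps to the same shard."""
    return zlib.crc32(doc_id.encode("utf-8")) % n_shards


def search_shard(shard, query, k):
    """Local top-k by dot product; a real shard would run its own ANN search here."""
    scored = ((sum(q * v for q, v in zip(query, vec)), doc_id)
              for doc_id, vec in shard.items())
    return heapq.nlargest(k, scored)


def search_all(shards, query, k):
    """Scatter the query to every shard in parallel, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, query, k), shards))
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc_id for _, doc_id in merged]


# Two made-up shards for demonstration.
shards = [
    {"a": [1.0, 0.0], "b": [0.0, 1.0]},
    {"c": [0.9, 0.1], "d": [0.5, 0.5]},
]
```

Each shard returns its own top k rather than everything it holds, so the merge step handles at most k results per shard. The global top k is still exact with respect to the per-shard scores, because any document in the global top k must also be in its own shard's top k.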