Failure Modes: Encoder Mismatch and Hot Shard Skew
INDEX CORRUPTION
Index files can be corrupted by interrupted writes, process crashes, or disk failures. Symptoms: queries return wrong results, searches crash, or recall is inconsistent across requests.
Prevention: write indexes atomically (write to a temp file, then rename on success). Verify integrity with a checksum after each write. Keep the previous version for rollback.
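The write-then-rename pattern with a checksum can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `write_index_atomically` and `verify_index`, the `.prev` suffix for the rollback copy, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import os
import shutil
import tempfile


def write_index_atomically(data: bytes, dest_path: str) -> str:
    """Write data to dest_path via temp file + rename; return the SHA-256 hex digest."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file next to the destination so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        # Keep a copy of the previous version for rollback (hypothetical ".prev" suffix).
        if os.path.exists(dest_path):
            shutil.copyfile(dest_path, dest_path + ".prev")
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return hashlib.sha256(data).hexdigest()


def verify_index(path: str, expected_digest: str) -> bool:
    """Re-read the file in chunks and compare its SHA-256 digest to the expected one."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_digest
```

A caller would store the returned digest alongside the index and call `verify_index` after the write and before loading. A stricter version would also fsync the containing directory so the rename itself survives a crash.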
Detection: run periodic integrity checks that compare index results against ground truth (for example, exact nearest neighbors for a fixed set of queries). If recall drops suddenly without any model change, suspect corruption.
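One way to run such a check is to measure recall@k on a fixed query set and compare it to a stored baseline. The sketch below assumes the retrieved IDs and the ground-truth IDs are already available as lists; the helper names and the `max_drop` threshold of 0.05 are illustrative assumptions, not values from any particular system.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Mean fraction of the true top-k neighbors present in the retrieved top-k."""
    total = 0.0
    for got, truth in zip(retrieved, ground_truth):
        truth_k = set(truth[:k])
        total += len(truth_k & set(got[:k])) / len(truth_k)
    return total / len(ground_truth)


def suspect_corruption(current_recall, baseline_recall, max_drop=0.05):
    """Flag a sudden recall drop; max_drop is an assumed alerting threshold."""
    return baseline_recall - current_recall > max_drop


# Two queries: the first misses one of its three true neighbors.
retrieved = [[1, 2, 3], [4, 5, 6]]
ground_truth = [[1, 2, 9], [4, 5, 6]]
current = recall_at_k(retrieved, ground_truth, k=3)
alert = suspect_corruption(current, baseline_recall=0.95)
```

The alert is only meaningful when the model and the query set are held fixed between runs, so that a drop points at the index rather than at the embeddings.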
STALE CENTROIDS
IVF centroids trained on old data become misaligned as the vector distribution shifts. New vectors cluster poorly, and recall drops for recent content.
Detection: monitor per-partition sizes. Healthy distribution: every partition is within 2x of the average size. Unhealthy: some partitions hold 10x or more the average number of vectors (new content clustering badly) while others are nearly empty.
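The size check above can be sketched as a small report over partition assignments. This assumes each vector's partition ID is available as an integer in `range(n_partitions)`. "Within 2x" is interpreted here as between half and double the average, and the "nearly empty" cutoff of 10% of the average is an assumption for illustration.

```python
from collections import Counter


def partition_health(assignments, n_partitions,
                     healthy_ratio=2.0, hot_ratio=10.0, empty_ratio=0.1):
    """Summarize partition size skew relative to the average partition size."""
    if not assignments:
        raise ValueError("no vectors assigned")
    counts = Counter(assignments)
    sizes = [counts.get(p, 0) for p in range(n_partitions)]
    avg = sum(sizes) / n_partitions
    return {
        # Partitions holding hot_ratio x the average or more.
        "oversized": [p for p, s in enumerate(sizes) if s >= hot_ratio * avg],
        # Partitions at or below empty_ratio x the average (assumed cutoff).
        "nearly_empty": [p for p, s in enumerate(sizes) if s <= empty_ratio * avg],
        # Healthy when every partition is between avg / 2 and 2 * avg.
        "healthy": all(avg / healthy_ratio <= s <= healthy_ratio * avg for s in sizes),
    }


# 20 partitions: partition 0 absorbs 300 vectors, the rest hold 1 each.
skewed = [0] * 300 + list(range(1, 20))
report = partition_health(skewed, n_partitions=20)
```

An oversized or nearly empty list that grows between runs is the signal that the centroids no longer match the data.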
Fix: retrain centroids on recent data. Schedule centroid refresh every 2-4 weeks for active content domains.
HOT SHARDS
Semantic sharding can create hot spots. If one cluster (shard) contains popular items, that shard handles a disproportionate share of traffic. The result is latency spikes and SLO violations.
Detection: monitor per-shard QPS and latency. Healthy: shards within 2x of each other. Unhealthy: some shards at 5-10x average load.
Mitigation: replicate hot shards more heavily. Redistribute items: break large semantic clusters into smaller sub-shards.
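As a rough sketch of the replication side, replica counts can be scaled with each shard's load relative to the mean. The function name `plan_replicas`, the base replica count, and the use of the mean as the reference point (rather than pairwise shard comparison) are assumptions for illustration.

```python
import math


def plan_replicas(shard_qps, base_replicas=2, hot_ratio=2.0):
    """Return {shard: replica_count}, scaling up shards above hot_ratio x mean QPS."""
    mean_qps = sum(shard_qps.values()) / len(shard_qps)
    plan = {}
    for shard, qps in shard_qps.items():
        ratio = qps / mean_qps if mean_qps > 0 else 0.0
        if ratio > hot_ratio:
            # Hot shard: replicas grow in proportion to its load ratio.
            plan[shard] = math.ceil(base_replicas * ratio)
        else:
            plan[shard] = base_replicas
    return plan


# Shard "d" carries 3x the mean load (900 vs. a mean of 300).
plan = plan_replicas({"a": 100, "b": 100, "c": 100, "d": 900})
```

Replication only spreads read load. When a single shard stays hot after replication, splitting its cluster into sub-shards is the remaining option.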
VERSION SKEW
During a rolling deployment, some replicas serve the old index while others serve the new one. If a query fans out to mixed versions, results are inconsistent: some candidates come from old embeddings, some from new.
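One possible way to avoid mixed-version fan-out, sketched under assumptions, is to route each query only to replicas that share a single index version. The data shape (`{shard: {replica: version}}`) and both function names are hypothetical. The sketch picks the newest version that every shard can serve.

```python
def pick_common_version(shard_replica_versions):
    """Return the newest index version available on every shard, or None if there is none."""
    common = None
    for replicas in shard_replica_versions.values():
        versions = set(replicas.values())
        common = versions if common is None else common & versions
    return max(common) if common else None


def route(shard_replica_versions, version):
    """Pick one replica per shard that serves the given version (first by name)."""
    return {
        shard: sorted(r for r, v in replicas.items() if v == version)[0]
        for shard, replicas in shard_replica_versions.items()
    }


# Mid-rollout: shard s0 has one upgraded replica, shard s1 has none yet.
fleet = {"s0": {"r0": 1, "r1": 2}, "s1": {"r2": 1, "r3": 1}}
version = pick_common_version(fleet)
targets = route(fleet, version)
```

Here the query stays on version 1 until every shard has at least one replica on version 2, so all candidates come from the same embeddings.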