Embeddings & Similarity SearchIndex Management (Building, Updating, Sharding)Hard⏱️ ~3 min

Failure Modes: Encoder Mismatch and Hot Shard Skew

INDEX CORRUPTION

Index files can become corrupted during writes, crashes, or disk failures. Symptoms: queries return wrong results, crashes during search, inconsistent recall across requests.

Prevention: write indexes atomically (write to temp file, rename on success). Use checksums to verify integrity after write. Keep previous version for rollback.

Detection: periodic integrity checks comparing index behavior to ground truth. If recall drops suddenly without model changes, suspect corruption.

STALE CENTROIDS

IVF centroids trained on old data become misaligned as vector distribution shifts. New vectors cluster poorly, recall drops for recent content.

Detection: monitor per-partition sizes. Healthy distribution: partitions within 2x of average. Unhealthy: some partitions have 10x+ vectors (new content clustering badly), others are nearly empty.

Fix: retrain centroids on recent data. Schedule centroid refresh every 2-4 weeks for active content domains.

HOT SHARDS

Semantic sharding can create hot spots. If one cluster (shard) contains popular items, that shard handles disproportionate traffic. Latency spikes, SLO violations.

Detection: monitor per-shard QPS and latency. Healthy: shards within 2x of each other. Unhealthy: some shards at 5-10x average load.

Mitigation: replicate hot shards more heavily. Redistribute items—break large semantic clusters into smaller sub-shards.

VERSION SKEW

During rolling deployment, some replicas serve old index while others serve new. If query fans out to mixed versions, results are inconsistent—some candidates from old embeddings, some from new.

❗ Critical: Route each query to a single index version. Use version-aware routing or blue-green deployment to avoid mixing old and new embeddings in one response.
💡 Key Takeaways
Index corruption: write atomically, verify checksums, keep rollback version
Stale centroids: monitor partition size distribution; retrain every 2-4 weeks
Hot shards: monitor per-shard QPS; replicate heavily or split large clusters
📌 Interview Tips
1Interview Tip: Explain version skew risk—mixed old/new embeddings produce inconsistent results.
2Interview Tip: Describe stale centroid detection—partition sizes should be within 2x of average.
← Back to Index Management (Building, Updating, Sharding) Overview