Operational Metrics and Failure Detection
FRESHNESS METRICS
Time to searchability: How long from item creation until it appears in search results. For hot+main architecture, this is hot index ingestion time (typically seconds to minutes). Track p50, p95, p99. Alert if p99 exceeds SLO (e.g., >5 minutes for e-commerce).
Index age: Age of the newest item in the main index, i.e., how stale the main index is. With a daily merge cycle, main index age oscillates between 0 and 24 hours. Track it to confirm merges complete on schedule.
Hot index size: Number of vectors in hot index. Should stay within designed capacity. If hot index grows beyond threshold, it signals merge failures or capacity issues.
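The three freshness checks above can be sketched as one function. This is a minimal illustration, not a reference implementation: the function names, the 1,000,000-vector hot index capacity, and the one-hour grace period on main index age are assumptions chosen for the example. Only the 5-minute p99 SLO and the daily merge cycle come from the text.

```python
from statistics import quantiles


def searchability_percentiles(samples_s):
    """Return p50/p95/p99 of time-to-searchability samples, in seconds."""
    # n=100 yields 99 cut points; index i-1 holds the i-th percentile.
    cuts = quantiles(samples_s, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def freshness_alerts(
    samples_s,
    hot_index_size,
    main_index_age_s,
    *,
    p99_slo_s=300.0,          # 5-minute SLO from the example above
    hot_capacity=1_000_000,   # assumed designed capacity
    merge_interval_s=86_400,  # daily merge cycle
    grace_s=3_600,            # assumed slack before a merge counts as late
):
    """Return a list of human-readable alert strings (empty if healthy)."""
    alerts = []
    pct = searchability_percentiles(samples_s)
    if pct["p99"] > p99_slo_s:
        alerts.append(f"p99 time to searchability {pct['p99']:.0f}s exceeds SLO")
    if hot_index_size > hot_capacity:
        alerts.append(f"hot index holds {hot_index_size} vectors, over capacity")
    if main_index_age_s > merge_interval_s + grace_s:
        alerts.append("main index is older than one merge cycle; merge overdue")
    return alerts
```

In practice the samples would come from whatever records creation and first-seen-in-search timestamps, and the thresholds would be taken from the system's actual SLO and capacity plan.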
QUALITY METRICS
Recall@K: Sample queries weekly and compare index results against brute-force exact search. Establish a baseline (e.g., 95% recall@100). Alert on a drop of 2% or more from that baseline, which may indicate drift or index corruption.
Latency distribution: p50, p95, p99 query latency. Sudden p99 spikes often indicate hot shard issues or resource contention. Gradual increases suggest index bloat or degraded structures.
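The recall check can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the approximate and exact result lists are assumed to be ID lists already fetched for each sampled query, and the 2% threshold is interpreted here as an absolute drop (e.g., 0.95 to 0.93).

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the index also returned in its top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact result list is empty or k <= 0")
    hits = len(truth.intersection(approx_ids[:k]))
    return hits / len(truth)


def mean_recall(query_results, k):
    """Average recall@k over (approx_ids, exact_ids) pairs from sampled queries."""
    scores = [recall_at_k(approx, exact, k) for approx, exact in query_results]
    return sum(scores) / len(scores)


def recall_regressed(current, baseline, max_drop=0.02):
    """True if recall fell by max_drop or more below the baseline."""
    # Small epsilon so a drop of exactly max_drop is not lost to float rounding.
    return baseline - current >= max_drop - 1e-9
```

For example, with a baseline of 0.95, `recall_regressed(0.90, 0.95)` is true and `recall_regressed(0.94, 0.95)` is false.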
FAILURE DETECTION
Merge failures: Monitor merge job completion. Failures leave hot index growing unbounded. Set alerts for merge duration exceeding 2x normal time or consecutive failures.
Ingestion backlog: Queue depth for vectors waiting to enter the hot index. A growing backlog means ingestion cannot keep up with the arrival rate. Scale ingestion workers or reduce the arrival rate.
Cross-index inconsistency: Periodically sample items, verify they appear in exactly one index (hot or main, not both, not neither). Inconsistencies indicate merge bugs or delete propagation issues.
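The cross-index sampling check can be sketched as below. This is a minimal sketch, not a definitive implementation: `audit_sample`, `in_hot`, and `in_main` are hypothetical names, and the two lookup callables stand in for whatever membership query each index actually exposes.

```python
import random


def audit_sample(item_ids, in_hot, in_main, sample_size, seed=None):
    """Sample items and classify each by which index(es) contain it.

    in_hot / in_main are callables taking an item ID and returning bool.
    Returns a dict with lists of IDs under "ok", "both", and "neither".
    """
    rng = random.Random(seed)
    ids = list(item_ids)
    sample = rng.sample(ids, min(sample_size, len(ids)))

    report = {"ok": [], "both": [], "neither": []}
    for item_id in sample:
        hot, main = in_hot(item_id), in_main(item_id)
        if hot and main:
            report["both"].append(item_id)     # duplicate: merge did not clear hot
        elif not hot and not main:
            report["neither"].append(item_id)  # missing: lost during merge or delete
        else:
            report["ok"].append(item_id)       # exactly one index, as expected
    return report
```

With in-memory sets as stand-ins for the two indexes, `audit_sample([1, 2, 3, 4, 5], {1, 2, 3}.__contains__, {3, 4}.__contains__, 5)` reports item 3 under "both" and item 5 under "neither". Any non-empty "both" or "neither" list is the signal to alert on.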