Trade-offs: Freshness, Recall, Latency, and Cost
THE FUNDAMENTAL TENSION
Index management means balancing four competing goals (freshness, recall, latency, and cost) that cannot all be maximized at once. Improving one typically degrades another. Understanding these trade-offs lets you make intentional decisions instead of discovering painful surprises in production.
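Freshness vs recall: Updating the index more often (better freshness) means training the clustering on smaller batches. Smaller batches are a less representative sample of the data, so they produce worse centroids, and queries get routed to partitions that do not hold their true nearest neighbors, which reduces recall. A batch of 100K vectors produces centroids that miss 8-12% of relevant items; a batch of 10M produces centroids that miss only 2-4%.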
Recall vs latency: Higher recall requires searching more partitions. Searching 5% of partitions gives ~92% recall at 15ms. Searching 20% gives ~98% recall at 60ms. For most recommendation systems, 92% recall is acceptable. For safety-critical search (medical, legal), you need 98%+.
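The recall/latency choice can be framed as picking the cheapest measured operating point that still meets a recall floor and a latency budget. The sketch below is illustrative only: `OperatingPoint`, `MEASURED`, and `pick_operating_point` are hypothetical names, and the table holds just the two data points quoted above, where a real system would fill it from its own benchmarks.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    probe_fraction: float  # share of partitions searched per query
    recall: float          # measured recall at that setting
    latency_ms: float      # measured latency at that setting


# Assumption: only the two figures quoted in the text; benchmark your own index.
MEASURED = [
    OperatingPoint(probe_fraction=0.05, recall=0.92, latency_ms=15.0),
    OperatingPoint(probe_fraction=0.20, recall=0.98, latency_ms=60.0),
]


def pick_operating_point(
    points: Sequence[OperatingPoint],
    min_recall: float,
    max_latency_ms: float,
) -> Optional[OperatingPoint]:
    """Return the lowest-latency point that satisfies both constraints, or None."""
    feasible = [
        p for p in points
        if p.recall >= min_recall and p.latency_ms <= max_latency_ms
    ]
    if not feasible:
        return None  # no setting meets both goals; one requirement must relax
    return min(feasible, key=lambda p: p.latency_ms)
```

With this table, a recommendation workload asking for 90% recall within 50ms lands on the 5% setting, while a request for 98% recall within 50ms returns `None`, which is the trade-off showing up as an infeasible requirement.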
COST TRADE-OFFS
Memory vs latency: Keeping all indexes in RAM gives sub-10ms latency. Spilling to SSD increases p99 to 50-100ms but cuts memory costs by 70%. Tiered storage (hot data in RAM, cold on SSD) balances this.
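To get a feel for tiered storage, the blended cost can be estimated from the share of data kept in RAM. This is a rough sketch under a stated assumption: data on SSD costs 30% of what the same data costs in RAM (one reading of the 70% saving quoted above), and cost scales linearly with the amount of data in each tier. The function names are hypothetical.

```python
def tiered_memory_cost(hot_fraction: float, ssd_relative_cost: float = 0.30) -> float:
    """Storage cost relative to an all-RAM deployment (1.0 means all RAM).

    Assumption: SSD-resident data costs `ssd_relative_cost` times the RAM cost.
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return hot_fraction + (1.0 - hot_fraction) * ssd_relative_cost


def max_hot_fraction(budget: float, ssd_relative_cost: float = 0.30) -> float:
    """Largest share of data that fits in RAM for a relative cost budget."""
    if budget < ssd_relative_cost:
        raise ValueError("budget is below the cost of an all-SSD deployment")
    # Invert the linear cost formula, then clamp to a valid fraction.
    hot = (budget - ssd_relative_cost) / (1.0 - ssd_relative_cost)
    return min(1.0, hot)
```

Under these assumptions, keeping the hottest 20% of data in RAM costs about 44% of an all-RAM deployment. The sketch says nothing about latency; how many queries end up touching the SSD tier depends on the access pattern.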
Replication vs cost: More replicas improve read throughput and fault tolerance. Three replicas give good availability but triple storage costs relative to a single copy. For non-critical workloads, 2 replicas may suffice. Critical systems need 3+ replicas spread across availability zones.
Build frequency vs compute cost: Daily full rebuilds ensure optimal index quality but consume significant compute. Weekly full rebuilds with daily incremental updates reduce cost by 5-8x while maintaining 95%+ of optimal recall.
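The size of the rebuild saving depends mostly on how cheap an incremental update is compared with a full rebuild. The sketch below uses a deliberately simple model that is an assumption, not something stated in the text: one full rebuild per week plus an incremental update on each of the other six days, with every incremental costing a fixed fraction of a full rebuild. `weekly_cost_reduction` is a hypothetical helper.

```python
def weekly_cost_reduction(incremental_cost_ratio: float, days: int = 7) -> float:
    """Cost of `days` full rebuilds divided by the cost of 1 full rebuild
    plus (days - 1) incremental updates.

    incremental_cost_ratio: cost of one incremental update as a fraction
    of one full rebuild (assumed constant from day to day).
    """
    if incremental_cost_ratio < 0:
        raise ValueError("incremental_cost_ratio must be non-negative")
    daily_full_cost = float(days)  # one full rebuild every day
    mixed_cost = 1.0 + (days - 1) * incremental_cost_ratio
    return daily_full_cost / mixed_cost
```

In this model an incremental update costing about 1/15 of a full rebuild gives a 5x reduction, and the ratio approaches 7x as incremental cost approaches zero. Because the model is capped at 7x, the upper end of the quoted 5-8x range would have to come from savings this sketch does not capture.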
MONITORING ESSENTIALS
Track p50, p95, and p99 latencies against clear SLOs (e.g., p99 < 50ms). Sample recall weekly by comparing index results to brute-force search on a set of test queries. Monitor index freshness (time since the newest item was indexed). Alert when recall drops by more than 2% or latency exceeds its SLO for 5+ minutes.
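The recall sampling and alert conditions described above might be sketched as follows. All names here are hypothetical, and two points are assumptions: the 2% recall threshold is read as an absolute drop in the recall fraction (0.95 to 0.93), and latency samples are assumed to arrive as time-ordered `(timestamp_seconds, p99_ms)` pairs.

```python
from typing import Iterable, Sequence, Tuple


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the brute-force top-k that the index also returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_recall(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average recall over (index result, brute-force result) pairs."""
    scores = [recall_at_k(approx, exact) for approx, exact in pairs]
    if not scores:
        raise ValueError("need at least one test query")
    return sum(scores) / len(scores)


def recall_dropped(baseline: float, current: float, max_drop: float = 0.02) -> bool:
    """True when recall fell by more than max_drop (absolute) from baseline."""
    return (baseline - current) > max_drop


def latency_breach_sustained(
    samples: Iterable[Tuple[float, float]],
    slo_ms: float = 50.0,
    min_duration_s: float = 300.0,
) -> bool:
    """True if p99 stayed above the SLO for an unbroken run of min_duration_s."""
    breach_start = None
    for timestamp_s, p99_ms in samples:
        if p99_ms > slo_ms:
            if breach_start is None:
                breach_start = timestamp_s  # first sample of a new breach run
            if timestamp_s - breach_start >= min_duration_s:
                return True
        else:
            breach_start = None  # a healthy sample resets the run
    return False
```

Requiring an unbroken run before alerting keeps a single slow scrape from paging anyone, at the price of reacting up to five minutes late.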