Trade-offs: Latency, Cost, Accuracy, and Freshness
LATENCY VS COST
Lower latency generally requires more resources. For example, HNSW over full vectors might answer queries in about 2 ms but need 1 TB of RAM, while IVF-PQ might take about 20 ms with only 64 GB of RAM. A 10x latency reduction can therefore mean a 20x increase in monthly cost. Define latency SLAs first, then optimize cost within budget.
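To see where memory figures of this size come from, here is a rough back-of-the-envelope sketch. The corpus size (250 million vectors), the dimension (1024), and the 256-byte PQ code are assumptions chosen only so the arithmetic lands near the figures above; graph links, codebooks, and inverted-list overhead are ignored.

```python
def full_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for uncompressed float32 vectors (index overhead excluded)."""
    return num_vectors * dim * bytes_per_value


def pq_code_bytes(num_vectors: int, code_bytes: int) -> int:
    """Storage for product-quantized codes (codebooks and lists excluded)."""
    return num_vectors * code_bytes


if __name__ == "__main__":
    # Hypothetical corpus: both values are illustrative assumptions.
    num_vectors, dim = 250_000_000, 1024
    full = full_vector_bytes(num_vectors, dim)
    pq = pq_code_bytes(num_vectors, code_bytes=256)  # assumed 256-byte codes
    print(f"full vectors: {full / 1e12:.2f} TB")
    print(f"PQ codes:     {pq / 1e9:.0f} GB")
    print(f"compression:  {full / pq:.0f}x")
```

Memory is only one input to cost; the point of the sketch is that compression changes the hardware tier you need, which is what drives the monthly bill.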
ACCURACY VS SPEED
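Approximate search trades recall for speed: for example, 99% recall at 5 ms versus 95% at 1 ms. Missing 5% of the true nearest neighbors may be acceptable for recommendations but not for search. ANN parameters control this trade-off (for HNSW, M at build time and efSearch at query time); tune them against your accuracy needs.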
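Tuning against a recall target requires measuring recall, usually by comparing ANN results with exact (brute-force) results on a sample of queries. A minimal sketch follows; the function names are illustrative and not taken from any particular library.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


def mean_recall_at_k(
    approx_results: Sequence[Sequence[int]],
    exact_results: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Average recall@k over a batch of queries."""
    if len(approx_results) != len(exact_results):
        raise ValueError("result lists must cover the same queries")
    if not approx_results:
        return 0.0
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)
```

A typical loop is to sweep efSearch (or the equivalent query-time knob), record mean recall and latency at each setting, and pick the fastest setting that still meets the recall target.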
FRESHNESS VS EFFICIENCY
Caching improves latency but serves stale data. For recommendations, one hour of staleness is often acceptable; for news, results may need to be no more than five minutes old. Choose cache TTLs based on how quickly the content changes.
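The TTL idea can be sketched as a small in-process cache. This is a simplified illustration, not a production design: the `TTLCache` class is hypothetical, it has no eviction or size limit, and it is not thread-safe.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Caches computed values and recomputes them once they are older than the TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the expensive search
        value = compute()
        self._entries[key] = (now, value)
        return value


# TTLs taken from the examples above: one hour vs five minutes.
recommendations_cache = TTLCache(ttl_seconds=3600)
news_cache = TTLCache(ttl_seconds=300)
```

A shorter TTL means more cache misses and therefore more queries hitting the index, which is the efficiency side of the trade-off.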
DECISION FRAMEWORK
Step 1: Define the latency SLA.
Step 2: Define the accuracy requirement (recall target).
Step 3: Define the freshness requirement.
Step 4: Calculate the cost of each candidate configuration.
Step 5: Choose the lowest-cost configuration that meets all requirements.
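The framework amounts to filtering candidate configurations by the three requirements and taking the cheapest survivor. Here is a minimal sketch; the `Config` and `Requirements` types and all numbers in the usage example are hypothetical placeholders (costs are in relative units, not real prices).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    name: str
    latency_ms: float
    recall: float
    staleness_s: float
    monthly_cost: float  # relative units, not real prices


@dataclass(frozen=True)
class Requirements:
    max_latency_ms: float   # Step 1
    min_recall: float       # Step 2
    max_staleness_s: float  # Step 3


def cheapest_meeting(configs: Iterable[Config], req: Requirements) -> Optional[Config]:
    """Steps 4 and 5: keep configurations that satisfy every requirement, pick the cheapest."""
    feasible = [
        c for c in configs
        if c.latency_ms <= req.max_latency_ms
        and c.recall >= req.min_recall
        and c.staleness_s <= req.max_staleness_s
    ]
    return min(feasible, key=lambda c: c.monthly_cost, default=None)


if __name__ == "__main__":
    candidates = [  # placeholder measurements for illustration only
        Config("hnsw-full", latency_ms=2, recall=0.99, staleness_s=300, monthly_cost=20.0),
        Config("ivf-pq", latency_ms=20, recall=0.95, staleness_s=300, monthly_cost=1.0),
    ]
    req = Requirements(max_latency_ms=50, min_recall=0.95, max_staleness_s=3600)
    choice = cheapest_meeting(candidates, req)
    print(choice.name if choice else "no configuration meets the requirements")
```

If the function returns `None`, no candidate satisfies every requirement, and either a requirement has to be relaxed or the budget raised.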