Production Implementation: Orchestration, Caching, and Observability
PIPELINE ORCHESTRATION
The request coordinator fans out to all retrievers in parallel, waits up to a 30 ms timeout, merges the returned candidates, applies filters, and passes the result to ranking. Each retriever runs independently; the coordinator collects whatever arrives before the timeout and proceeds with partial candidates rather than failing the request. This graceful-degradation strategy keeps the system available even when individual retrievers are slow.
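A minimal asyncio sketch of this fan-out under the assumptions above; the Candidate record, the retriever callables, and the merge-by-best-score rule are illustrative, not a prescribed implementation:

    import asyncio
    from dataclasses import dataclass

    @dataclass
    class Candidate:                     # minimal stand-in for the real record
        item_id: str
        score: float

    async def fan_out_retrieve(retrievers, request, timeout_s=0.030):
        """Run every retriever concurrently; keep whatever finishes in time."""
        tasks = [asyncio.create_task(r(request)) for r in retrievers]
        done, pending = await asyncio.wait(tasks, timeout=timeout_s)
        for task in pending:             # cancel stragglers instead of waiting
            task.cancel()
        merged = {}                      # dedupe by item_id, keeping best score
        for task in done:
            if task.exception() is None: # a failed retriever degrades, not fails
                for c in task.result():
                    if c.item_id not in merged or c.score > merged[c.item_id].score:
                        merged[c.item_id] = c
        return list(merged.values())     # possibly partial; never raises on timeout

Cancelling the pending tasks matters: letting slow retrievers run to completion would leak work and queue pressure even though their results are discarded.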
CACHING STRATEGIES
User embeddings cache for 5 to 15 minutes, since preferences change slowly. Item embeddings cache for hours. Retrieval results for popular queries cache for 1 to 5 minutes. Ranking features are trickier: static features (item age, category) cache well; personalized features cannot be shared across users. Typical systems achieve a 60 to 80% cache hit rate on retrieval, reducing average latency by around 40%.
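One way to express this tiered policy, sketched here with the third-party cachetools library; the maxsize values, concrete TTL choices within the stated ranges, and the read-through helper are assumptions for illustration:

    from cachetools import TTLCache

    # One cache per tier; the TTLs mirror how quickly each value goes stale.
    user_emb_cache  = TTLCache(maxsize=1_000_000, ttl=10 * 60)   # 5-15 min tier
    item_emb_cache  = TTLCache(maxsize=5_000_000, ttl=6 * 3600)  # hours tier
    retrieval_cache = TTLCache(maxsize=100_000,   ttl=2 * 60)    # 1-5 min tier

    def get_user_embedding(user_id, compute_fn):
        """Read-through cache: compute on miss, serve from cache on hit."""
        emb = user_emb_cache.get(user_id)
        if emb is None:
            emb = compute_fn(user_id)    # e.g. a call to the embedding service
            user_emb_cache[user_id] = emb
        return emb

    # Static ranking features key on item_id alone, so entries are shared across
    # users. Personalized features would need (user_id, item_id) keys, which
    # rarely repeat within a TTL, so they are computed fresh per request.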
OBSERVABILITY AND DEBUGGING
Log at each stage: candidate count, score distribution (min, max, median), latency, and sample item IDs. Build a debug mode that stores full candidate lists for a sample of requests. When recommendations look wrong, pull the trace to see what retrieval returned, how ranking scored it, and where the pipeline broke.
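A sketch of one such per-stage record as structured JSON logging; the logger name, field names, sampling rate, and (item_id, score) candidate shape are hypothetical:

    import json, logging, random, time

    log = logging.getLogger("recsys.pipeline")
    DEBUG_SAMPLE_RATE = 0.001            # fraction of requests traced in full

    def log_stage(request_id, stage, candidates, started_at):
        """Emit one structured record per stage; candidates are (item_id, score)."""
        scores = sorted(s for _, s in candidates)
        n = len(scores)
        record = {
            "request_id": request_id,
            "stage": stage,              # "retrieval", "filter", "ranking", ...
            "candidate_count": n,
            "score_min": scores[0] if n else None,
            "score_max": scores[-1] if n else None,
            "score_median": scores[n // 2] if n else None,
            "latency_ms": round((time.monotonic() - started_at) * 1000, 2),
            "sample_item_ids": [i for i, _ in candidates[:5]],
        }
        if random.random() < DEBUG_SAMPLE_RATE:   # debug mode: keep everything
            record["full_candidates"] = candidates
        log.info(json.dumps(record))

Keying every record on request_id is what makes the trace reconstructable: filtering the logs for one request yields the full stage-by-stage story.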
A/B TESTING PIPELINE CHANGES
Changes to retrieval affect candidate diversity, which changes ranking behavior, which in turn moves engagement. A new retriever that improves recall by 5% might surface candidates the ranker scores lower. Always measure end-to-end metrics (clicks, conversions), not just stage metrics. Run experiments for at least one week to capture weekly patterns.
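Stable week-long experiments need deterministic assignment, so a user stays in one arm for the whole run. A minimal sketch, assuming hash-based bucketing (the function name and two-variant default are illustrative):

    import hashlib

    def assign_variant(user_id, experiment, variants=("control", "treatment")):
        """Deterministic, experiment-salted bucketing: the same user always
        lands in the same arm for the duration of the experiment."""
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
        return variants[int.from_bytes(digest[:8], "big") % len(variants)]

Salting the hash with the experiment name keeps assignments independent across concurrent experiments, so a user's arm in one test does not correlate with their arm in another.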