Semantic Result Cache: Architecture and Similarity Thresholds
WHAT IS SEMANTIC RESULT CACHING
Semantic result caching stores complete model outputs keyed by the meaning of the input, not just exact bytes. Unlike exact-match caches that require identical inputs, semantic caching returns cached results for inputs that are "close enough" to previous queries. The insight: many queries differ textually but share the same intent.
Consider a recommendation system receiving "show me action movies" and "recommend action films." An exact-match cache treats these as two unrelated keys, so the second query misses even though an equivalent result is already cached. A semantic cache recognizes that the two share intent and returns the same cached recommendation list. For natural language queries, this can raise hit rates from around 5% (exact match) to around 40% (semantic match).
EXACT MATCH VS SEMANTIC MATCH
Exact match: Hash the raw input bytes. Fast O(1) lookup, guaranteed correctness, but low hit rate. Works for structured API requests with identical parameters. Miss rate is high when inputs vary in formatting, whitespace, or phrasing. Typical hit rate: 5-15% for natural language, 30-50% for structured queries.
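The exact-match behavior described above can be sketched in a few lines. This is a minimal illustration, not a production design: the class name ExactMatchCache, the choice of SHA-256, and the placeholder result values are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List, Optional


class ExactMatchCache:
    """Hypothetical exact-match cache keyed by a hash of the raw input bytes."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Hash the raw bytes: any change in whitespace or wording yields a new key.
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def get(self, query: str) -> Optional[List[str]]:
        # Average-case O(1) dictionary lookup; returns None on a miss.
        return self._store.get(self._key(query))

    def put(self, query: str, result: List[str]) -> None:
        self._store[self._key(query)] = result


cache = ExactMatchCache()
# Placeholder results, for illustration only.
cache.put("show me action movies", ["movie_a", "movie_b"])

hit = cache.get("show me action movies")            # identical bytes: hit
miss_whitespace = cache.get("show me  action movies")  # extra space: miss
miss_phrasing = cache.get("recommend action films")    # same intent: miss
```

The two misses show why the hit rate is low for natural language: the extra space and the rephrasing both produce different hashes, even though a person would consider the queries equivalent.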
Semantic match: Embed the input query into a vector, then search for cached embeddings within a distance threshold. Higher hit rate but introduces approximate matching risk. You might return slightly wrong results if the similarity threshold is too loose. Typical hit rate: 30-60% for natural language queries.
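A semantic lookup along these lines might look like the sketch below. Several things here are assumptions for illustration: the SemanticCache class, the toy_embed function with its hand-written three-dimensional vectors (a real system would call an embedding model), and the linear scan over entries (a real system would more likely use an approximate nearest-neighbor index).

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical semantic cache: linear scan over stored embeddings."""

    def __init__(self, embed: Callable[[str], Sequence[float]],
                 threshold: float = 0.95) -> None:
        self._embed = embed
        self._threshold = threshold
        self._entries: List[Tuple[Sequence[float], List[str]]] = []

    def put(self, query: str, result: List[str]) -> None:
        self._entries.append((self._embed(query), result))

    def get(self, query: str) -> Optional[List[str]]:
        # Find the most similar cached query, then apply the threshold.
        q = self._embed(query)
        best_sim, best_result = -1.0, None
        for emb, result in self._entries:
            sim = cosine_similarity(q, emb)
            if sim > best_sim:
                best_sim, best_result = sim, result
        return best_result if best_sim >= self._threshold else None


# Toy stand-in for an embedding model: hand-written vectors, illustration only.
TOY_VECTORS = {
    "show me action movies": [0.90, 0.10, 0.00],
    "recommend action films": [0.88, 0.15, 0.02],
    "best pasta recipes": [0.00, 0.20, 0.95],
}


def toy_embed(query: str) -> Sequence[float]:
    return TOY_VECTORS[query]


semantic_cache = SemanticCache(toy_embed, threshold=0.95)
semantic_cache.put("show me action movies", ["movie_a", "movie_b"])

similar_hit = semantic_cache.get("recommend action films")  # close vector: hit
unrelated = semantic_cache.get("best pasta recipes")        # distant vector: miss
```

With these toy vectors the two movie queries have a cosine similarity of roughly 0.998, so the rephrased query is served from cache, while the unrelated query falls well below the threshold and misses.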
SIMILARITY THRESHOLD TUNING
The similarity threshold determines when two queries are "similar enough" to share cached results. Set it too tight (requiring cosine similarity > 0.99) and the hit rate drops to near zero. Set it too loose (accepting any cosine similarity > 0.85) and the cache returns wrong answers. Production systems typically tune to 0.95-0.97 based on offline evaluation.
Tuning process: collect 1,000+ query pairs, label whether each pair should share results, and measure precision and recall at different thresholds. Choose the threshold where precision stays above 99% while maximizing recall. Different query types may need different thresholds: factual questions need tighter matching (0.98) than exploratory searches (0.93).
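The tuning process can be sketched as a small threshold sweep. The function names, the candidate thresholds, and the eight labeled pairs below are hypothetical and synthetic; a real evaluation would use the 1,000+ labeled pairs described above. Treating precision as 1.0 when a threshold produces no hits is also a convention chosen for this sketch.

```python
from typing import List, Optional, Tuple

# Each pair: (cosine similarity of the two queries, should they share a result?)
LabeledPair = Tuple[float, bool]


def precision_recall(pairs: List[LabeledPair],
                     threshold: float) -> Tuple[float, float]:
    tp = sum(1 for sim, same in pairs if sim >= threshold and same)
    fp = sum(1 for sim, same in pairs if sim >= threshold and not same)
    fn = sum(1 for sim, same in pairs if sim < threshold and same)
    # Convention for this sketch: no cache hits at all means no wrong hits.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def choose_threshold(pairs: List[LabeledPair], candidates: List[float],
                     min_precision: float = 0.99
                     ) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) with the highest recall among
    candidates meeting the precision floor, or None if none qualifies."""
    best = None
    for t in sorted(candidates):
        p, r = precision_recall(pairs, t)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best


# Synthetic labeled pairs, for illustration only.
pairs = [
    (0.99, True), (0.98, True), (0.97, True), (0.96, True),
    (0.95, False), (0.94, True), (0.92, False), (0.90, False),
]
candidates = [0.90, 0.93, 0.95, 0.96, 0.98]

best = choose_threshold(pairs, candidates)
```

On this synthetic data the sweep selects 0.96: lower candidates admit the mismatched pair at 0.95 and fall below the precision floor, while 0.98 keeps precision but gives up recall. To support per-type thresholds, the same sweep could be run separately on labeled pairs for each query type.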