Cache Key Design and Canonicalization for High Hit Rates
Proper cache key design is the difference between 5 percent and 40 percent hit rates. The key must capture everything that affects the response while stripping volatile tokens that do not change intent.
A stable cache key concatenates multiple components:
• Model identifier, so you do not serve GPT-3.5 responses when using GPT-4.
• Prompt template hash, which captures system instructions and formatting.
• Sampling parameters like temperature and top-p, because temperature 0.7 and 0.0 produce different outputs.
• User or tenant context, which prevents cross-tenant contamination.
• Tool configuration hash, covering which functions or APIs the model can call.
• Locale and safety settings, which prevent serving English answers to Spanish prompts or unsafe content in strict modes.
For semantic caches, add the embedding model identifier and preprocessing version to avoid geometry mismatches after model updates.
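To make this concrete, here is a minimal key-builder sketch; the field names (safety_profile, tools_hash, and so on) and the separator choice are illustrative, not a prescribed schema.

```python
import hashlib
import json

def build_cache_key(
    model_id: str,          # e.g. a pinned model version string
    template_hash: str,     # hash of the system prompt / instruction template
    params: dict,           # sampling parameters: temperature, top_p, max_tokens
    tenant_id: str,         # prevents cross-tenant sharing
    locale: str,            # language/region of the request
    safety_profile: str,    # strict vs. relaxed content settings
    tools_hash: str,        # hash of the available tool/function definitions
    canonical_prompt: str,  # normalized user content (see canonicalization below)
) -> str:
    parts = [
        model_id,
        template_hash,
        json.dumps(params, sort_keys=True),  # deterministic parameter ordering
        tenant_id,
        locale,
        safety_profile,
        tools_hash,
        canonical_prompt,
    ]
    # Join with a separator so adjacent fields cannot collide by
    # concatenation (e.g. "ab" + "c" vs. "a" + "bc").
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
```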
Canonicalization strips noise. Lowercase where semantics allow, normalize whitespace and punctuation, and remove timestamps and request identifiers that change every call but do not affect intent. For multi-step prompts, factor out the stable instruction block and vary only the user content portion. A well-tuned system can see exact hit rates jump from 8 percent to 28 percent through better normalization alone, because greetings like "hi there" versus "hello" or trailing punctuation differences no longer fragment the keyspace.
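A minimal canonicalizer along these lines might look like the following; the regex patterns are illustrative placeholders that a real system would tune against its own traffic.

```python
import re

# Volatile-token patterns; illustrative, tuned per system in practice.
TIMESTAMP_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?\S*")
REQUEST_ID_RE = re.compile(r"\b(request|trace|req)[_-]?id[:=]?\s*[\w-]+", re.IGNORECASE)
GREETING_RE = re.compile(r"^(hi( there)?|hello|hey)[,.! ]*", re.IGNORECASE)

def canonicalize(user_content: str) -> str:
    text = user_content.strip()
    text = GREETING_RE.sub("", text)       # "hi there" vs "hello" no longer fragment keys
    text = TIMESTAMP_RE.sub("<ts>", text)  # replace rather than delete to keep structure
    text = REQUEST_ID_RE.sub("<req>", text)
    text = text.lower()                    # only where responses are case-insensitive
    text = re.sub(r"\s+", " ", text)       # collapse whitespace runs
    return text.rstrip(".!?")              # trailing punctuation rarely changes intent

# canonicalize("Hi there! What's my order status? request_id: abc-123")
# -> "what's my order status? <req>"
```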
The failure mode is accidental sharing. If you forget to include tenant identifier, one customer sees another customer's data. If you omit model version, a cache populated by an old model serves stale logic after deployment. If you skip tool configuration, a prompt that should trigger a database lookup returns a cached response that assumed no tools. Production systems use a checklist: model, template, parameters, tenant, locale, safety, tools, embedding version. Miss any and you introduce either false sharing or false misses.
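One way to make that checklist executable, sketched below assuming a Python service, is a dataclass whose fields are all mandatory, so a forgotten component fails loudly at construction time rather than silently sharing data. The field names simply mirror the checklist.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class CacheKeyComponents:
    """Every checklist item is a required field: omitting one raises a
    TypeError at construction instead of causing false sharing in production."""
    model_id: str
    template_hash: str
    params_hash: str
    tenant_id: str
    locale: str
    safety_profile: str
    tools_hash: str
    embedding_version: str  # used by the semantic tier; use a sentinel otherwise

    def __post_init__(self):
        # An empty string is as dangerous as a missing field: reject it too.
        for f in fields(self):
            if not getattr(self, f.name):
                raise ValueError(f"cache key component '{f.name}' is empty")
```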
💡 Key Takeaways
• Include model identifier, prompt template hash, sampling parameters (temperature, top-p), tenant or user segment, locale, safety settings, and tool configuration in every cache key. Missing any component risks false sharing or cache misses.
• Canonicalization strips volatile tokens like timestamps, request IDs, and greeting variations. Normalize whitespace, lowercase where appropriate, and factor out stable instruction blocks. This can increase exact hit rates from 8 to 28 percent.
• For semantic caches, namespace by embedding model version and preprocessing logic version. Changing the embedding model or the normalization changes the vector space geometry, making old cache entries yield poor matches with new queries.
• Multi-step prompts benefit from factoring. Keep the system instruction and template stable in the key and vary only the user-provided content. This maximizes reuse across conversations with different user inputs but the same instructions.
• Use write-through for exact caches with high confidence in correctness. Use write-around for semantic caches, admitting only entries that pass validators and quality checks, to avoid amplifying hallucinations or policy violations.
• Two-tier lookup improves hit rates. Check the exact match first (0.3 to 2 ms in memory), then the semantic tier (5 to 20 ms with vector search). Maintain both indices and enforce metadata alignment beyond vector similarity for the semantic tier; see the sketch below.
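A sketch of that two-tier flow follows; the semantic_index.search call is a hypothetical stand-in for whatever ANN store is in use (FAISS, a vector database, and so on).

```python
from typing import Optional

def lookup(key: str, embedding: list[float], meta: dict,
           exact_cache: dict, semantic_index) -> Optional[str]:
    # Tier 1: exact match on the canonical key (sub-millisecond in memory).
    if (hit := exact_cache.get(key)) is not None:
        return hit
    # Tier 2: vector search over the semantic tier; slower, but catches
    # paraphrases that exact matching misses.
    for entry in semantic_index.search(embedding, k=3, min_score=0.92):
        # Similarity alone is not enough: require metadata alignment so a
        # near-identical prompt from another tenant or model never matches.
        if all(entry.meta[f] == meta[f]
               for f in ("tenant_id", "model_id", "embedding_version")):
            return entry.response
    return None
```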
📌 Examples
A customer support bot strips timestamps and request IDs from prompts, lowercases where semantics allow, and normalizes whitespace. The exact hit rate increases from 12 to 31 percent, saving $22K per month in API costs at 8 million requests per month.
An internal tool uses this key format: sha256(model_id + template_hash + json.dumps(params, sort_keys=True) + tenant_id + locale + canonical_prompt). Serializing parameters with sort_keys=True gives deterministic ordering (sorted(params) on a dict would keep only the keys and drop their values), and including tenant_id prevents accidental cross-tenant leakage.
After an embedding model upgrade from 768 to 1536 dimensions, a semantic cache without version namespacing sees its hit rate collapse from 18 to 3 percent and its false positive rate spike to 12 percent. Adding the model version to the key isolates old and new entries.
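A minimal sketch of that isolation, with hypothetical version strings, folds both the embedding model and the preprocessing version into the index namespace:

```python
def semantic_namespace(embedding_model: str, preprocessing_version: str) -> str:
    # Entries written under "emb-v2/prep-3" are invisible to queries issued
    # under "emb-v3/prep-3": the two vector geometries are incompatible, and
    # mixing them produces exactly the false-positive spike described above.
    return f"{embedding_model}/{preprocessing_version}"

# Both writes and reads go through the namespace, so an embedding upgrade
# yields a cold but correct cache rather than a warm but wrong one.
index_name = f"response-cache::{semantic_namespace('emb-v3', 'prep-3')}"
```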