Cache Key Design and Canonicalization for High Hit Rates
WHY CACHE KEY DESIGN MATTERS
Cache key design determines what gets cached together versus separately. Poor key design causes either cache pollution (the wrong result returned to a user) or cache fragmentation (the same computation cached multiple times under different keys, wasting memory).
Consider a search ranking model. Should two users searching "laptops" share cached results? That depends on whether personalization affects rankings. If it does, the user ID must be in the cache key. If rankings are identical for all users, including the user ID fragments the cache unnecessarily: you store N copies of the same result. Key design encodes your caching policy.
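A minimal sketch of how that policy decision shows up in code. The function name, the personalization flag, and the user IDs here are hypothetical; the point is that the flag controls whether the user ID enters the key:

```python
import hashlib

def search_cache_key(query: str, user_id: str, personalized: bool) -> str:
    """Build a search cache key; include user_id only when rankings are personalized."""
    parts = [query]
    if personalized:
        # Per-user entries: correct results, but N copies if rankings are identical.
        parts.append(user_id)
    # Without user_id, one shared entry serves every user issuing this query.
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

# Personalization off: both users hit the same entry.
assert search_cache_key("laptops", "u1", False) == search_cache_key("laptops", "u2", False)
# Personalization on: keys diverge, so results are never shared across users.
assert search_cache_key("laptops", "u1", True) != search_cache_key("laptops", "u2", True)
```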
KEY COMPONENTS FOR ML SYSTEMS
Model version: Almost always required. Caching results from model v1 and returning them for model v2 queries serves outputs the current model would never produce. Include the model hash or version number in every cache key.
Feature schema version: If feature extraction changes, cached embeddings become invalid. A user embedding computed with 50 features is incompatible with a model expecting 75 features.
Input normalization: Raw text " Hello World " and "hello world" should hit the same cache entry if your model treats them identically. Canonicalize inputs before hashing: lowercase, strip whitespace, sort dictionary keys. Document the normalization rules so the team applies them consistently.
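A sketch of the canonicalization step described above, applied before hashing. The rule set here (lowercase, collapse whitespace, sort dictionary keys) follows the text; the function names are illustrative, not a standard API:

```python
import hashlib
import json

def canonicalize(text: str) -> str:
    """Normalize a string per the documented rules: lowercase, collapse whitespace."""
    return " ".join(text.lower().split())

def input_hash(payload: dict) -> str:
    """Hash a request payload deterministically: canonicalized strings, sorted keys."""
    canonical = {k: canonicalize(v) if isinstance(v, str) else v
                 for k, v in payload.items()}
    # sort_keys makes key order irrelevant; compact separators keep the blob stable.
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()

# " Hello   World " and "hello world" now hit the same cache entry.
assert input_hash({"q": "  Hello   World "}) == input_hash({"q": "hello world"})
# Key order in the payload dict does not matter either.
assert input_hash({"a": "x", "b": "y"}) == input_hash({"b": "y", "a": "x"})
```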
Context handling: For LLMs, should different conversation histories that share the same final user query hit the same cache entry? Usually no: context changes the output. Include a conversation fingerprint in the key.
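One way to compute such a fingerprint, assuming a role/content message format. The separator byte guards against ambiguity when adjacent fields are concatenated; the message structure is an assumption, not a fixed schema:

```python
import hashlib

def conversation_fingerprint(messages: list) -> str:
    """Hash the full message history so different contexts produce different keys."""
    h = hashlib.sha256()
    for m in messages:
        # NUL separators keep ("ab", "c") from colliding with ("a", "bc").
        h.update(m["role"].encode())
        h.update(b"\x00")
        h.update(m["content"].encode())
        h.update(b"\x00")
    return h.hexdigest()

history_a = [{"role": "user", "content": "What is 2+2?"},
             {"role": "assistant", "content": "4"},
             {"role": "user", "content": "Why?"}]
history_b = [{"role": "user", "content": "Explain caching."},
             {"role": "assistant", "content": "It stores results."},
             {"role": "user", "content": "Why?"}]

# Same final query ("Why?"), different context: the fingerprints differ,
# so the two conversations get separate cache entries.
assert conversation_fingerprint(history_a) != conversation_fingerprint(history_b)
```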
HIERARCHICAL CACHE KEYS
Structure keys hierarchically for efficient invalidation. Format: {model_version}:{feature_version}:{input_hash}. When the model updates, invalidate by prefix. When feature extraction changes, invalidate that segment. This gives granular invalidation without a full cache flush.
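The scheme above can be sketched with an in-memory dict standing in for the cache; in a real store such as Redis, the prefix sweep would be a SCAN with a pattern followed by deletes rather than a dict comprehension:

```python
def make_key(model_version: str, feature_version: str, input_hash: str) -> str:
    """Hierarchical key: {model_version}:{feature_version}:{input_hash}."""
    return f"{model_version}:{feature_version}:{input_hash}"

def invalidate_prefix(cache: dict, prefix: str) -> None:
    """Drop every entry whose key starts with the prefix."""
    for k in [k for k in cache if k.startswith(prefix)]:
        del cache[k]

cache = {
    make_key("m1", "f1", "abc"): "result cached under old model",
    make_key("m1", "f2", "abc"): "result cached under old model",
    make_key("m2", "f2", "abc"): "result cached under current model",
}

# Model update: one prefix sweep removes every m1 entry, m2 entries survive.
invalidate_prefix(cache, "m1:")
assert list(cache) == ["m2:f2:abc"]
```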