Multi-Tier Caching for Features and Embeddings
WHY MULTI-TIER CACHING
Single-tier caching fails at scale. An in-process cache is fast (microseconds) but limited by RAM. A distributed cache holds more data but adds 1-5ms of latency. Persistent storage holds everything but takes 10-50ms. Multi-tier combines all three: check local first, then distributed, then storage. Miss rates multiply—a 10% local miss rate times a 10% distributed miss rate means only 1% of requests touch storage, a 99% combined hit rate.
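The compounding arithmetic can be sketched in a few lines. This is an illustrative model, not a benchmark: the per-tier hit rates and latencies below are assumed round numbers, and `effective_stats` simply walks the tiers, charging each tier's latency to the fraction of requests that reach it.

```python
# Illustrative tier parameters: (name, hit_rate, latency_seconds).
# The numbers are assumptions matching the ranges in the text.
tiers = [
    ("L1 in-process", 0.90, 50e-6),    # ~50 microseconds
    ("L2 distributed", 0.90, 3e-3),    # ~3 ms
    ("L3 feature store", 1.00, 30e-3), # ~30 ms; persistent tier always hits
]

def effective_stats(tiers):
    """Return (combined hit rate, expected latency per request)."""
    reach = 1.0      # fraction of requests that fall through to this tier
    expected = 0.0
    for _name, hit_rate, latency in tiers:
        expected += reach * latency   # every request reaching a tier pays its latency
        reach *= (1.0 - hit_rate)     # only misses continue to the next tier
    return 1.0 - reach, expected

hit, latency = effective_stats(tiers)
# Only 10% x 10% = 1% of requests ever touch the feature store, so the
# expected latency stays well under a millisecond despite the slow bottom tier.
```

Running this with the numbers above gives a 99% hit rate before storage and an expected latency around 0.65ms, versus 30ms for storage-only reads.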
CACHE TIER ARCHITECTURE
L1 (in-process): LRU in application memory. 100MB-1GB per instance. Latency: 10-100 microseconds.
L2 (distributed): Redis cluster. 10GB-1TB shared. Latency: 1-5ms.
L3 (persistent): Feature store. Unlimited capacity. Latency: 10-50ms.
Each tier is 10-100x slower than the one above, but 10-100x larger.
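A minimal read-through sketch of this architecture, assuming plain dicts stand in for the Redis cluster (L2) and the feature store (L3); a real deployment would replace them with client calls. On an L1 miss the lookup falls through and back-fills the faster tiers on the way up.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU for the L1 tier (illustrative, not production-grade)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)   # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

class TieredCache:
    """Read-through lookup: L1 -> L2 -> L3, back-filling upper tiers on a miss."""
    def __init__(self, l1, l2, l3):
        # l2 and l3 are dicts here; in practice they would be a Redis
        # client and a feature-store client.
        self.l1, self.l2, self.l3 = l1, l2, l3

    def get(self, key):
        value = self.l1.get(key)
        if value is not None:
            return value                     # L1 hit: microseconds
        value = self.l2.get(key)
        if value is not None:
            self.l1.put(key, value)          # promote to L1
            return value                     # L2 hit: milliseconds
        value = self.l3.get(key)
        if value is not None:
            self.l2[key] = value             # back-fill both cache tiers
            self.l1.put(key, value)
        return value                         # L3 hit or true miss
```

Promoting on read keeps hot keys in the fastest tier without any separate warm-up step; cold keys pay the full fall-through cost exactly once.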
INVALIDATION STRATEGIES
TTL-based: Expire after a fixed time. Simple, but may serve stale data until expiry.
Event-driven: Invalidate on updates. Always fresh, but coordinating invalidation across tiers is complex.
Versioning: Embed a version in the cache key. A new version is an automatic miss. Clean, but increases key cardinality.
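The versioning strategy can be sketched as follows. All names here are hypothetical; the point is that the version lives inside the key, so bumping it makes every reader miss and refetch, while stale entries under the old version are simply never read again and age out via TTL or LRU eviction.

```python
# Versioned cache keys: invalidation = bump the version, no deletes needed.
# "user_embedding" and the key format are illustrative assumptions.
versions = {"user_embedding": 3}   # current version per feature group
cache = {}                         # stands in for any cache tier

def cache_key(group, entity_id):
    # Version is part of the key, so each bump yields a fresh key space
    # (this is the cardinality cost the text mentions).
    return f"{group}:v{versions[group]}:{entity_id}"

def put(group, entity_id, value):
    cache[cache_key(group, entity_id)] = value

def get(group, entity_id):
    return cache.get(cache_key(group, entity_id))

def invalidate_group(group):
    # Old keys become unreachable immediately; they are reclaimed later
    # by TTL expiry or LRU eviction rather than explicit deletion.
    versions[group] += 1
```

This trades memory (dead entries linger until evicted) for simplicity: there is no cross-tier invalidation protocol, just a version counter.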