Cache-Aside: The Default Pattern for Read-Heavy Systems
The Read Path in Detail
When a request arrives, the application follows a precise sequence. First, check the cache for the requested key. On a cache hit, return immediately with typical latency of 0.5-2ms for distributed caches. On a cache miss, query the database, which takes 5-50ms depending on query complexity. Store the result in cache with a TTL, then return the data. The TTL determines staleness tolerance: set it too short (30 seconds) and you flood the database with cache misses; set it too long (1 hour) and users see stale data after updates. Most systems start with 5-15 minute TTLs. Adding TTL jitter (randomizing expiry by 10-20%) prevents synchronized expiry that causes thundering herd problems.
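The read path above can be sketched as follows. This is a minimal single-process illustration: the in-memory `cache` and `db` dictionaries, the key names, and the TTL constants are stand-ins for a real cache client and database, not a production implementation.

```python
import random
import time

# Illustrative in-memory stand-ins for a distributed cache and a database.
cache = {}                        # key -> (value, expires_at)
db = {"user:42": {"name": "Ada"}}

BASE_TTL = 600                    # 10 minutes, within the typical 5-15 minute range
JITTER_FRACTION = 0.2             # randomize expiry by up to 20%

def jittered_ttl():
    # Spreading expiry times prevents many keys from expiring at once.
    return BASE_TTL * (1 + random.uniform(-JITTER_FRACTION, JITTER_FRACTION))

def get(key):
    entry = cache.get(key)
    now = time.time()
    if entry is not None and entry[1] > now:
        return entry[0]                             # cache hit: return immediately
    value = db.get(key)                             # cache miss: query the database
    if value is not None:
        cache[key] = (value, now + jittered_ttl())  # store with jittered TTL
    return value
```

The first call for a key pays the database round trip and populates the cache; subsequent calls within the TTL are served from the cache alone.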
The Write Path: Delete Not Update
When data changes, write to the database first, then delete the corresponding cache key. This ordering is essential: if you delete the cache first, a concurrent reader might repopulate it with old data before your database write completes, leaving stale data cached until the TTL expires. The pattern is delete-on-write, not update-cache-on-write. Updating the cache directly creates race conditions. Suppose Thread A and Thread B both update the same record: Thread A writes to the database first, Thread B second with newer data; but due to network timing, Thread B updates the cache first, and Thread A then overwrites the cache with older data. Now cache and database are inconsistent until the TTL expires. Deletion avoids this because the next reader fetches fresh data from the database.
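The write path is short enough to state directly. As above, `cache` and `db` are illustrative in-memory stand-ins, not a real client API:

```python
# Illustrative in-memory stand-ins; cache and database start in sync.
db = {"user:42": {"name": "Ada"}}
cache = {"user:42": {"name": "Ada"}}

def update(key, value):
    db[key] = value          # 1. write to the database first
    cache.pop(key, None)     # 2. then delete the cache key; never update it in place

update("user:42", {"name": "Grace"})
```

Because the cache entry is gone after the write, the next read falls through to the database and repopulates the cache with the fresh value, sidestepping the write-write race described above.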
Production Implementation Patterns
Large-scale deployments use lease tokens to prevent thundering herds. When a key is missing, the first requester acquires a lease (a short-lived lock); other requesters wait for the lease holder to populate the cache rather than all querying the database simultaneously. A two-tier architecture, pairing a per-host L1 cache with a distributed L2 cache, reduces network hops for hot data. Multiget operations batch multiple key fetches into a single round trip, reducing network overhead when fetching related data. These optimizations enable systems to handle billions of cache operations per second with sub-millisecond p95 latencies and hit ratios exceeding 90%.
When Cache Aside Excels
Cache-aside is ideal when read-to-write ratios are high (10:1 or greater), you need fine-grained control over what gets cached, your cache infrastructure is separate from your database (allowing independent scaling), and you can tolerate eventual consistency, where cached data may be stale until the TTL expires. The pattern works exceptionally well with denormalized key designs where you cache precomputed views.