Caching › Cache Invalidation Strategies · Medium · ⏱️ ~3 min

Event Driven Invalidation: Pushing Changes to Caches for Strong Freshness

Event-driven invalidation achieves stronger freshness guarantees by actively notifying caches when source data changes, rather than waiting for time-based expiry. When a write commits to your source of truth (database, primary store), you immediately publish invalidation events to an event bus or message queue. Cache tiers subscribe to these events and delete (or refresh) affected keys, typically within milliseconds to low seconds. This approach tightens the staleness window dramatically compared to TTL-only strategies: Pinterest targets sub-second to low-seconds propagation for correctness-sensitive updates like pin privacy changes, and Meta propagates invalidations across regions in under 2 seconds at the 99th percentile. The cost is distributed-systems complexity around delivery guarantees, ordering, and idempotency, plus the risk that invalidation pipeline failures amplify into outages.

The critical implementation details center on reliability and ordering. First, use at-least-once delivery semantics with idempotent invalidation handlers keyed by entity identifier and version, so handlers survive message redelivery and retries. Second, partition your event stream by entity identifier (for example, user_id or post_id) to preserve per-key ordering and avoid reordering anomalies where an older update invalidates after a newer one, leaving stale data cached. Third, always commit to your source of truth before publishing invalidation events (commit-then-invalidate ordering); otherwise a read might cache stale data between invalidation and commit. Fourth, include monotonically increasing version numbers or timestamps in events so consumers can detect and discard out-of-order events that arrive late. These patterns are how Meta's Memcache and TAO maintain consistency at over 1 billion cache operations per second: durable database commits trigger invalidation events via a replicated log, partitioned by key, with leases preventing thundering herds during the invalidation window.
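As a concrete sketch of the first and fourth rules, a minimal idempotent, version-aware invalidation handler might look like the following. All names here (`cache`, `last_seen_version`, `handle_invalidation`) are illustrative, not a real client library; in production the version table would be persisted, not an in-memory dict.

```python
cache = {}               # stand-in for a memcached/redis client
last_seen_version = {}   # highest version applied per key (would be persisted)

def handle_invalidation(event):
    """Apply an invalidation event; safe under redelivery and reordering."""
    key = f"{event['entity']}:{event['id']}"
    version = event["version"]
    # Discard duplicate or stale events: at-least-once delivery may redeliver,
    # and retries can deliver an older event after a newer one.
    if version <= last_seen_version.get(key, -1):
        return False
    last_seen_version[key] = version
    cache.pop(key, None)  # delete (rather than update) the cached entry
    return True

handle_invalidation({"entity": "user", "id": 42, "version": 7})  # applied
handle_invalidation({"entity": "user", "id": 42, "version": 7})  # duplicate, ignored
handle_invalidation({"entity": "user", "id": 42, "version": 5})  # older, ignored
```

Deleting rather than writing the new value keeps the handler trivially idempotent: the next read repopulates from the source of truth.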
Event-driven invalidation excels for correctness-critical or privacy-sensitive mutations where even short staleness windows are unacceptable: user permissions and access control (showing private content is a security bug), inventory and pricing (selling out-of-stock items or showing wrong prices loses money), financial balances and budgets (a hard correctness requirement), and visibility changes on social platforms (privacy violations). For these cases, the operational complexity and the potential for invalidation outages are justified by the business or compliance requirement. However, for read-heavy content with acceptable staleness, like blog posts, public profiles, or product images, the added complexity often is not worth it compared to a simple TTL with a longer expiry window.
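The commit-then-invalidate ordering can be sketched as a write path that assigns a monotonic version at commit time and publishes only after the commit succeeds. The `event_bus` list and `update_profile` function below are hypothetical stand-ins for a real message broker and data-access layer:

```python
import time

event_bus = []  # stand-in for a message broker (e.g. a Kafka topic)

def update_profile(db, user_id, fields):
    # 1. Commit to the source of truth FIRST. If we published before the
    #    commit, a concurrent read could re-cache the old value between the
    #    invalidation and the commit, pinning stale data in the cache.
    row = db.setdefault(user_id, {"version": 0})
    row.update(fields)
    row["version"] += 1  # monotonic version assigned at commit time

    # 2. Only after the commit succeeds, publish the invalidation event,
    #    keyed by user_id so the broker preserves per-key ordering.
    event_bus.append({
        "key": user_id,
        "version": row["version"],
        "ts": time.time(),
    })
    return row["version"]
```

Consumers compare the event's version against the last version they applied, which is what makes the handler in the previous sketch safe against late arrivals.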
💡 Key Takeaways
Event-driven invalidation tightens staleness windows to milliseconds or low seconds (Pinterest: sub-second for privacy changes; Meta: under 2 seconds cross-region at p99) versus minutes or hours with TTL alone, which is critical for correctness-sensitive data
Requires at-least-once delivery with idempotent handlers keyed by entity and version, events partitioned by entity identifier to preserve ordering, and commit-then-invalidate ordering to prevent caching stale data during the write
Include monotonically increasing version numbers or timestamps in events so consumers detect and discard out-of-order arrivals, preventing older updates from overwriting newer cached values after network delays or retries
Best for correctness-critical domains: permissions and access control (privacy bugs), inventory and pricing (revenue loss), financial balances (compliance), but overkill for read-heavy content with acceptable staleness like blog posts or images
Failure modes amplify: if the invalidation pipeline fails or lags, caches serve stale data until TTL expiry (include a max TTL as a safety net), and thundering herds occur during invalidation windows unless mitigated with leases or single-flight patterns
Trade off complexity versus freshness: event-driven adds operational burden (message brokers, partitioning, monitoring delivery lag) justified only when business or compliance requires strong freshness guarantees
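The thundering-herd mitigation in the takeaways can be sketched as a single-flight read path: one caller per key holds a lock (a simplified stand-in for a lease) and recomputes from the origin, while concurrent callers for the same key wait and reuse the result. All names here are assumed for illustration:

```python
import threading

cache = {}
_locks = {}
_locks_guard = threading.Lock()

def _lock_for(key):
    # One lock per key, created lazily under a global guard.
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def get(key, load_from_origin):
    value = cache.get(key)
    if value is not None:
        return value
    with _lock_for(key):            # only one caller holds the "lease" per key
        value = cache.get(key)      # re-check: another caller may have filled it
        if value is None:
            value = load_from_origin(key)  # single origin hit per miss
            cache[key] = value
    return value
```

Real lease schemes (as in Meta's Memcache) also let the lease holder detect that an invalidation raced with its fill, but the double-checked, one-flight-per-key shape is the core idea.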
📌 Examples
Meta's TAO invalidates cache entries by publishing events after database commits via a replicated log partitioned by social-graph object identifier, with lease-based reads preventing concurrent origin hits during invalidation, handling over 1 billion operations per second
Pinterest uses an event bus to fan out invalidations to edge CDN and mid-tier caches when pin visibility changes (public to private), targeting sub-second propagation for privacy correctness while accepting longer TTL-based expiry for general content to reduce origin load
An e-commerce platform immediately invalidates product inventory cache keys after purchase commits, publishing {product_id: 789, version: 52, quantity: 0} events partitioned by product_id, ensuring out-of-stock state propagates in under 500 milliseconds to prevent overselling
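The per-key partitioning in the e-commerce example can be sketched as a routing function; the 12-partition topic is an assumed configuration, and a real broker (e.g. Kafka with keyed messages) performs this routing internally:

```python
NUM_PARTITIONS = 12  # assumed broker/topic configuration

def partition_for(product_id: int) -> int:
    # Same id always maps to the same partition, so a consumer of that
    # partition sees versions 51, 52, ... for product 789 in commit order.
    return hash(product_id) % NUM_PARTITIONS

event = {"product_id": 789, "version": 52, "quantity": 0}
p = partition_for(event["product_id"])
```

Ordering is only guaranteed within a partition, which is exactly why the partition key must be the entity identifier rather than, say, a random round-robin assignment.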
← Back to Cache Invalidation Strategies Overview