Caching • Cache Stampede Problem
Probabilistic Early Refresh and TTL Jitter
Probabilistic early refresh and Time To Live (TTL) jitter are complementary techniques that prevent synchronized expiration across distributed fleets. The core problem is deterministic TTLs: when 1,000 application servers all cache the same key with an identical 300 second TTL at T = 0, all 1,000 servers detect expiration simultaneously at T = 300 and issue origin requests in lockstep. Probabilistic early refresh solves this by introducing randomized early recomputation: as a cached entry approaches expiration, each access has an increasing probability of triggering a refresh before the actual TTL expiry. A common formula is p = (age / TTL)^beta, where age = current_time − cache_time and beta is a tunable exponent (typically 1 to 2). The probability stays low early in the entry's life and climbs steeply near expiry; with a steep enough curve, the probability might be 5 percent at 80 percent of TTL elapsed and 40 percent at 95 percent. This smears refresh work across time instead of concentrating it at the exact TTL boundary.
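Sketched in Python, the per-access decision might look like the following (function and parameter names are illustrative, not from any particular library):

```python
import random
import time

def should_refresh_early(cached_at: float, ttl: float, beta: float = 1.5) -> bool:
    """Decide, per cache access, whether to refresh before the TTL expires.

    Probability grows with the entry's age: p = (age / ttl) ** beta,
    so a young entry is almost never refreshed and an old one almost always is.
    """
    age = time.time() - cached_at
    if age >= ttl:
        return True  # already expired: must refresh
    p = (age / ttl) ** beta
    return random.random() < p
```

Because each server rolls its own dice on each access, refreshes spread out over the final stretch of the TTL instead of all firing at the expiry instant.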
TTL jitter complements this by varying the TTL itself. Instead of setting the TTL to exactly 300 seconds, add random jitter: TTL = 300 + random(−60, +60), yielding expiration times spread across 240 to 360 seconds. Even without probabilistic refresh, jitter prevents fleet-wide synchronization. Combined, these techniques transform a sharp spike (1,000 servers refreshing in the same 100 ms window) into a smooth ramp (refreshes spread across 60+ seconds). For a 10,000 RPS key cached by 1,000 instances, instead of roughly 10,000 origin requests landing at T = 300, you get approximately 166 requests per second spread over 60 seconds (10,000 / 60), a 60x reduction in peak origin load.
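The arithmetic behind that 60x figure, as a quick back-of-envelope check using the numbers above:

```python
# Peak-load arithmetic from the text: a 10,000 RPS key whose refreshes,
# with jitter, land spread across a 60 second window instead of at one instant.
peak_requests = 10_000    # origin requests that would land at T = 300 with no jitter
spread_window_s = 60      # seconds over which jitter spreads the refreshes

steady_rate = peak_requests / spread_window_s
print(int(steady_rate))                      # 166 requests/second
print(round(peak_requests / steady_rate))    # 60 (times smaller peak)
```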
Tuning involves balancing freshness against write amplification. Aggressive early refresh (beta = 1, high probability near expiry) keeps data fresher but increases the background refresh rate and origin load. Conservative settings (beta = 2, lower probability, 20 percent jitter) reduce write amplification but allow more staleness. In practice, setting jitter to 10 to 20 percent of the TTL and using beta = 1.5 provides a good balance: a 300 second TTL becomes 270 to 330 seconds with 10 percent jitter, and probabilistic refresh ramps up 30 to 60 seconds before expiry, spreading load without excessive early refreshes. This is especially critical during deployments and restarts, when many cache instances start fresh simultaneously; jitter ensures they do not all expire and refresh on the same schedule.
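A small sketch of how the beta exponent shapes the tradeoff: at the same fraction of TTL elapsed, a larger beta yields a lower refresh probability, meaning fewer and later early refreshes (values below are purely illustrative):

```python
def refresh_probability(age_fraction: float, beta: float) -> float:
    # p = (age / TTL) ** beta, evaluated at a given fraction of the TTL elapsed
    return age_fraction ** beta

for beta in (1.0, 1.5, 2.0):
    # At 90% of TTL elapsed: larger beta -> lower probability -> less write amplification
    print(beta, round(refresh_probability(0.9, beta), 3))  # 0.9, 0.854, 0.81
```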
💡 Key Takeaways
• Deterministic 300 second TTL on 1,000 servers causes synchronized expiration: all 1,000 servers refresh simultaneously at T = 300, creating an origin load spike
• Probabilistic early refresh: per-access refresh probability p = (age / TTL)^beta climbs steeply near expiry (e.g., reaching ~40 percent at 95 percent of TTL elapsed), spreading refreshes across the final 60 seconds instead of concentrating them at the expiry moment
• TTL jitter of ±10 to 20 percent (e.g., 300s becomes 270 to 330s) prevents fleet-wide synchronization even without probabilistic logic, essential during rolling restarts
• Combined effect for a 10,000 RPS key with 1,000 instances: transforms a 10,000-request origin spike into a ~166 QPS ramp over 60 seconds, reducing peak load by 60x
• Write amplification tradeoff: aggressive early refresh (beta = 1) keeps data fresh but increases origin load; conservative settings (beta = 2, 20 percent jitter) reduce load but increase staleness
• Critical during cold starts and deployments: without jitter, a restarted fleet synchronizes on identical TTL schedules and creates stampedes at every expiration cycle indefinitely
📌 Examples
Microservice with 500 instances caching user sessions: 300 second TTL with 15 percent jitter (255 to 345s range) and beta = 1.5. During a rolling restart, instances come online over 10 minutes with staggered TTLs, preventing a synchronized first refresh across the fleet.
Reddit comment cache: 60,000 RPS across 2,000 instances. Without jitter, all instances refresh simultaneously every 5 minutes (roughly 120,000 origin requests per cycle). With 20 percent jitter and probabilistic refresh (beta = 1.2), refreshes spread over 90 seconds (120,000 / 90 ≈ 1,333 QPS), reducing the peak by roughly 90 times.
Implementation in Python: ttl = base_ttl * (1 + random.uniform(-0.1, 0.1)) for jitter. On read: age = time.time() - cached_time; refresh_prob = (age / ttl) ** 1.5; if random.random() < refresh_prob: trigger_refresh(). This spreads refreshes across the final 20 to 30 percent of the TTL window.
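The snippet above can be expanded into a self-contained sketch that combines jittered TTLs on write with probabilistic refresh on read. The cache layout and fetch_from_origin are illustrative placeholders, not a real client library:

```python
import random
import time

BASE_TTL = 300.0   # seconds
JITTER = 0.10      # ±10% TTL jitter -> effective TTLs of 270-330s
BETA = 1.5         # steepness of the early-refresh probability curve

cache = {}  # key -> (value, cached_at, ttl); a stand-in for a real cache client

def fetch_from_origin(key):
    # Placeholder for the expensive origin call (database, upstream service, etc.)
    return f"value-for-{key}"

def put(key, value):
    # Jittered TTL so fleet-wide expirations never line up on one schedule
    ttl = BASE_TTL * (1 + random.uniform(-JITTER, JITTER))
    cache[key] = (value, time.time(), ttl)

def get(key):
    entry = cache.get(key)
    if entry is None:
        value = fetch_from_origin(key)
        put(key, value)
        return value
    value, cached_at, ttl = entry
    age = time.time() - cached_at
    # Probabilistic early refresh: chance rises steeply as age/ttl approaches 1
    if age >= ttl or random.random() < (age / ttl) ** BETA:
        value = fetch_from_origin(key)
        put(key, value)
    return value
```

A production version would refresh asynchronously (so the caller is not blocked on the origin) and guard the refresh with a per-key lock or singleflight so at most one refresh runs per key per instance.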