Caching • Cache Stampede Problem
Probabilistic Early Refresh and TTL Jitter
Probabilistic early refresh and Time To Live (TTL) jitter are complementary techniques that prevent synchronized expiration across distributed fleets. The core problem is deterministic TTLs: when 1,000 application servers all cache the same key with an identical 300 second TTL at T = 0, all 1,000 servers detect expiration simultaneously at T = 300 and issue origin requests in lockstep. Probabilistic early refresh solves this by introducing randomized early recomputation: as a cached entry approaches expiration, each access has an increasing probability of triggering a refresh before the actual TTL expiry. A common formula is p = (age / TTL)^beta, where age = current_time − cache_time and beta is a tunable exponent (typically 1 to 2). The probability stays low early in the entry's life and climbs steeply near expiry; with a steep enough curve, the probability might be 5 percent at 80 percent of TTL elapsed and 40 percent at 95 percent. This smears refresh work across time instead of concentrating it at the exact TTL boundary.
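Sketched in Python, the per-access decision might look like the following (function and parameter names are illustrative, not from any particular library):

```python
import random
import time

def should_refresh_early(cached_at: float, ttl: float, beta: float = 1.5) -> bool:
    """Decide, per cache access, whether to refresh before the TTL expires.

    Probability grows with the entry's age: p = (age / ttl) ** beta,
    so a young entry is almost never refreshed and an old one almost always is.
    """
    age = time.time() - cached_at
    if age >= ttl:
        return True  # already expired: must refresh
    p = (age / ttl) ** beta
    return random.random() < p
```

Because each server rolls its own dice on each access, refreshes spread out over the final stretch of the TTL instead of all firing at the expiry instant.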
TTL jitter complements this by varying the TTL itself. Instead of setting the TTL to exactly 300 seconds, add random jitter: TTL = 300 + random(−60, +60), yielding expiration times spread across 240 to 360 seconds. Even without probabilistic refresh, jitter prevents fleet-wide synchronization. Combined, these techniques transform a sharp spike (1,000 servers refreshing in the same 100 ms window) into a smooth ramp (refreshes spread across 60+ seconds). For a 10,000 RPS key cached by 1,000 instances, instead of roughly 10,000 origin requests landing at T = 300, you get approximately 166 requests per second spread over 60 seconds (10,000 / 60), a 60x reduction in peak origin load.
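The arithmetic behind that 60x figure, as a quick back-of-envelope check using the numbers above:

```python
# Peak-load arithmetic from the text: a 10,000 RPS key whose refreshes,
# with jitter, land spread across a 60 second window instead of at one instant.
peak_requests = 10_000    # origin requests that would land at T = 300 with no jitter
spread_window_s = 60      # seconds over which jitter spreads the refreshes

steady_rate = peak_requests / spread_window_s
print(int(steady_rate))                      # 166 requests/second
print(round(peak_requests / steady_rate))    # 60 (times smaller peak)
```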
Tuning involves balancing freshness against write amplification. Aggressive early refresh (beta = 1, high probability near expiry) keeps data fresher but increases the background refresh rate and origin load. Conservative settings (beta = 2, lower probability, 20 percent jitter) reduce write amplification but allow more staleness. In practice, setting jitter to 10 to 20 percent of the TTL and using beta = 1.5 provides a good balance: a 300 second TTL becomes 270 to 330 seconds with 10 percent jitter, and probabilistic refresh ramps up 30 to 60 seconds before expiry, spreading load without excessive early refreshes. This is especially critical during deployments and restarts, when many cache instances start fresh simultaneously; jitter ensures they do not all expire and refresh on the same schedule.
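A small sketch of how the beta exponent shapes the tradeoff: at the same fraction of TTL elapsed, a larger beta yields a lower refresh probability, meaning fewer and later early refreshes (values below are purely illustrative):

```python
def refresh_probability(age_fraction: float, beta: float) -> float:
    # p = (age / TTL) ** beta, evaluated at a given fraction of the TTL elapsed
    return age_fraction ** beta

for beta in (1.0, 1.5, 2.0):
    # At 90% of TTL elapsed: larger beta -> lower probability -> less write amplification
    print(beta, round(refresh_probability(0.9, beta), 3))  # 0.9, 0.854, 0.81
```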
💡 Key Takeaways
• Deterministic 300 second TTL on 1,000 servers causes synchronized expiration: all 1,000 servers refresh simultaneously at T = 300, creating an origin load spike
• Probabilistic early refresh: per-access refresh probability p = (age / TTL)^beta climbs steeply near expiry (e.g., reaching ~40 percent at 95 percent of TTL elapsed), spreading refreshes across the final 60 seconds instead of concentrating them at the expiry moment
• TTL jitter of ±10 to 20 percent (e.g., 300s becomes 270 to 330s) prevents fleet-wide synchronization even without probabilistic logic, essential during rolling restarts
• Combined effect for a 10,000 RPS key with 1,000 instances: transforms a 10,000-request origin spike into a ~166 QPS ramp over 60 seconds, reducing peak load by 60x
• Write amplification tradeoff: aggressive early refresh (beta = 1) keeps data fresh but increases origin load; conservative settings (beta = 2, 20 percent jitter) reduce load but increase staleness
• Critical during cold starts and deployments: without jitter, a restarted fleet synchronizes on identical TTL schedules and creates stampedes at every expiration cycle indefinitely
📌 Examples
Microservice with 500 instances caching user sessions: 300 second TTL with 15 percent jitter (255 to 345s range) and beta = 1.5. During a rolling restart, instances come online over 10 minutes with staggered TTLs, preventing a synchronized first refresh across the fleet.
Reddit comment cache: 60,000 RPS across 2,000 instances. Without jitter, all instances refresh simultaneously every 5 minutes (roughly 120,000 origin requests per cycle). With 20 percent jitter and probabilistic refresh (beta = 1.2), refreshes spread over 90 seconds (120,000 / 90 ≈ 1,333 QPS), reducing the peak by roughly 90 times.
Implementation in Python: ttl = base_ttl * (1 + random.uniform(-0.1, 0.1)) for jitter. On read: age = time.time() - cached_time; refresh_prob = (age / ttl) ** 1.5; if random.random() < refresh_prob: trigger_refresh(). This spreads refreshes across the final 20 to 30 percent of the TTL window.
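The snippet above can be expanded into a self-contained sketch that combines jittered TTLs on write with probabilistic refresh on read. The cache layout and fetch_from_origin are illustrative placeholders, not a real client library:

```python
import random
import time

BASE_TTL = 300.0   # seconds
JITTER = 0.10      # ±10% TTL jitter -> effective TTLs of 270-330s
BETA = 1.5         # steepness of the early-refresh probability curve

cache = {}  # key -> (value, cached_at, ttl); a stand-in for a real cache client

def fetch_from_origin(key):
    # Placeholder for the expensive origin call (database, upstream service, etc.)
    return f"value-for-{key}"

def put(key, value):
    # Jittered TTL so fleet-wide expirations never line up on one schedule
    ttl = BASE_TTL * (1 + random.uniform(-JITTER, JITTER))
    cache[key] = (value, time.time(), ttl)

def get(key):
    entry = cache.get(key)
    if entry is None:
        value = fetch_from_origin(key)
        put(key, value)
        return value
    value, cached_at, ttl = entry
    age = time.time() - cached_at
    # Probabilistic early refresh: chance rises steeply as age/ttl approaches 1
    if age >= ttl or random.random() < (age / ttl) ** BETA:
        value = fetch_from_origin(key)
        put(key, value)
    return value
```

A production version would refresh asynchronously (so the caller is not blocked on the origin) and guard the refresh with a per-key lock or singleflight so at most one refresh runs per key per instance.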