
Handling Read Hotspots with Caching, Coalescing, and Replica Fan Out

Read hotspots occur when many clients simultaneously fetch the same data, overwhelming a single database shard, cache node, or origin server. There are three primary mitigation strategies. Edge caching serves reads from geographically distributed locations closer to clients. Request coalescing (also called single flight or collapsed forwarding) deduplicates concurrent cache misses so that only one backend fetch occurs while the other requests wait for its result. Replica fan out routes reads for hot keys to multiple read replicas instead of a single primary. Together, these techniques can reduce backend load by 50 to 100 times during traffic spikes.

Edge caching with Content Delivery Networks (CDNs) is most effective for publicly cacheable content. When a single URL becomes globally hot, edge caches store the response at hundreds of Points of Presence (POPs) worldwide and serve requests in single digit milliseconds. After the first miss at a POP, that POP sends no further requests to the origin until its cached copy expires.

Request coalescing at the CDN or application tier ensures that when a cache miss occurs, only one request goes to the origin while concurrent requests for the same key wait for that result. This pattern commonly reduces origin traffic by 50 to 100 times during cache stampedes, when a popular item expires and many clients miss at once. The trade off is freshness. Short Time To Live (TTL) values of 1 to 5 seconds keep hot objects reasonably fresh, and they work best combined with conditional revalidation (If-Modified-Since headers) and background refresh, so that client requests do not block on revalidation.

For data that cannot be cached publicly or has strict freshness requirements, replica fan out spreads read load across multiple database replicas. If a hot key receives 50,000 reads per second and each replica can sustain 10,000 reads per second, at least 5 replicas are needed to carry the load, and 6 or more keep each replica below saturation.
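The request coalescing idea described above can be sketched in a few lines. This is a minimal, in-process illustration rather than how any particular CDN implements it: the `SingleFlight` class, the `demo` helper, the `/breaking-story` key, and the 0.2 second simulated origin latency are all hypothetical names and values chosen for the example.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class SingleFlight:
    """Coalesce concurrent loads of the same key into one backend call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> Future shared by all concurrent callers

    def do(self, key, loader):
        # The first caller for a key becomes the leader and registers a Future.
        # Callers that arrive while the load is in flight reuse that Future.
        with self._lock:
            future = self._in_flight.get(key)
            is_leader = future is None
            if is_leader:
                future = Future()
                self._in_flight[key] = future

        if is_leader:
            try:
                future.set_result(loader(key))
            except BaseException as exc:
                # Propagate the failure to every waiting caller.
                future.set_exception(exc)
            finally:
                # Forget the key so a later miss triggers a fresh load.
                with self._lock:
                    del self._in_flight[key]

        # Followers block here until the leader finishes.
        return future.result()


def demo(num_clients=20):
    """Simulate many clients missing the cache for one hot key at once."""
    origin_calls = []

    def fetch_from_origin(key):
        origin_calls.append(key)
        time.sleep(0.2)  # assumed origin latency, for illustration only
        return f"response for {key}"

    flight = SingleFlight()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        results = list(
            pool.map(lambda _: flight.do("/breaking-story", fetch_from_origin),
                     range(num_clients))
        )
    return len(origin_calls), results
```

Calling `demo()` issues 20 concurrent requests for the same key, and because they all arrive while the first load is still in flight, they share a single origin fetch. The sketch only deduplicates in-flight loads. In practice it would sit in front of a cache with a TTL, so that requests arriving after the load completes are served from the cache rather than starting a new fetch.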
In a consistent hashing ring, locality aware hashing can assign more virtual nodes to hot partitions, spreading their replicas across more physical machines while keeping the ring stable for other keys.

The main risk of replica fan out is replication lag: replicas may be seconds or even minutes behind the primary during high write load. Production systems monitor replication lag using metrics such as log sequence number differences or commit index offsets, and serve from a replica only if its lag is within the freshness SLO (commonly 1 to 5 seconds for social feeds, stricter for financial data).
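The sizing arithmetic and the lag check can be combined into a small routing sketch. It assumes that a monitoring system already reports each replica's lag in seconds. The `Replica` type, the function names, the 2 second default SLO, and the 70% utilization target are illustrative assumptions, not values taken from a specific database.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    lag_seconds: float  # assumed to be reported by lag monitoring


def replicas_needed(reads_per_s, capacity_per_replica, target_utilization=0.7):
    """Replicas required so each stays at or below the target utilization."""
    return math.ceil(reads_per_s / (capacity_per_replica * target_utilization))


def choose_read_target(primary, replicas, max_lag_seconds=2.0):
    """Pick a random replica within the freshness SLO, else fall back to the primary."""
    fresh = [r for r in replicas if r.lag_seconds <= max_lag_seconds]
    if not fresh:
        # No replica is fresh enough, so correctness wins over load spreading.
        return primary
    return random.choice(fresh)


if __name__ == "__main__":
    # 50,000 reads/s against replicas that each sustain 10,000 reads/s.
    print(replicas_needed(50_000, 10_000, target_utilization=1.0))
    print(replicas_needed(50_000, 10_000))

    primary = Replica("primary", 0.0)
    replicas = [Replica("r1", 0.4), Replica("r2", 9.0), Replica("r3", 1.2)]
    print(choose_read_target(primary, replicas).name)
```

Falling back to the primary protects freshness, but it also sends the hot key's traffic back to a single node whenever every replica lags. A real deployment would likely pair this with caching or load shedding so that the fallback path cannot recreate the hotspot.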
💡 Key Takeaways
Edge caching with CDNs and request coalescing reduce origin load by 50 to 100 times during traffic spikes by serving from geographically distributed POPs and deduplicating concurrent cache misses
Short TTL values of 1 to 5 seconds for hot objects balance freshness with cache hit rates; conditional revalidation (If-Modified-Since) and background refresh avoid blocking clients on revalidation
Replica fan out spreads read load across multiple replicas; if a hot key receives 50,000 reads/s and each replica sustains 10,000 reads/s, 5 replicas are the minimum and 6 or more keep each replica below saturation
Replication lag is the main risk with replica fan out; monitor lag using log sequence number or commit index differences and serve only if within freshness SLO (commonly 1 to 5 seconds)
Locality aware hashing increases virtual nodes for hot partitions to spread replicas across more machines while keeping consistent hashing stable for cold keys
📌 Examples
A news site's breaking story URL becomes globally hot; CDN edge caches serve 1 million requests/minute from 200 POPs worldwide with 2 second TTL, sending only 500 origin requests total via request coalescing
Reddit's front page caches post lists at CDN edge with 5 second TTL and conditional revalidation; during traffic spikes, cache hit rate exceeds 98% and origin load stays under 1,000 QPS
A social media platform routes read requests for a celebrity profile (receiving 100,000 reads/s) to 10 geographically distributed read replicas, so each replica handles about 10,000 reads/s; replication lag monitoring ensures that reads go only to replicas within 2 seconds of the primary