
Hierarchical Caching and Origin Shield Architecture

Modern CDNs implement hierarchical caching with multiple tiers to maximize hit ratios and protect origins from request storms. In a typical two-tier design, hundreds of edge Points of Presence (PoPs) serve end users, but cache misses do not go directly to the origin. Instead, they route through a much smaller set of regional parent caches (called Origin Shield at AWS, or mid-tier at other providers). When an edge cache misses, it fetches from its assigned parent; only if the parent also misses does a request reach the origin. This architecture raises the effective hit ratio because a popular object fetched once into a regional parent can then populate dozens of downstream edges without additional origin requests.

The mathematical benefit is substantial. Consider a video segment requested by users across 200 edge PoPs worldwide. Without a parent tier, the first request in each PoP misses and goes to the origin, for 200 origin fetches in total. With a regional shield tier (perhaps 10 to 15 global locations), only the shield locations fetch from the origin, so the same object costs 10 to 15 origin fetches, a reduction of more than 90%. Measured gains are usually smaller, since most objects are not requested from every PoP: Amazon CloudFront customers commonly report 50% to 70% reductions in origin requests after enabling Origin Shield.

Parents also implement request coalescing: when multiple edges miss the same object at the same time, the parent collapses these misses into a single origin fetch, and every waiting request receives the response once it arrives. This prevents cache stampedes, in which thousands of concurrent misses overwhelm the origin.

Hierarchical caching introduces tradeoffs. A parent tier adds one network hop, typically 10 to 30 milliseconds of latency on a cache miss. However, requests that reach the origin become far less frequent (the effective hit ratio often increases from 70% to 85% or higher), so the overall user experience improves. Parents require substantial storage and bandwidth capacity because they aggregate traffic from many edges, but this cost is typically offset by reduced origin egress charges and improved resilience.

Netflix Open Connect takes a different approach: it deploys cache appliances directly inside Internet Service Provider (ISP) networks, a single-tier model in which 95% or more of traffic is served without leaving the ISP. This requires physical deployment agreements and works best for high-volume content providers.
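The fan-out arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch under simplifying assumptions that are not in the original text: caches start cold, the object is requested at every edge PoP, each edge maps to exactly one shield, and every shield receives at least one miss. The function names are hypothetical.

```python
from typing import Optional


def origin_fetches(edge_pops: int, shield_locations: Optional[int] = None) -> int:
    """Cold-cache origin fetches for one object requested once at every edge PoP."""
    if shield_locations is None:
        # No parent tier: each edge's first miss goes straight to the origin.
        return edge_pops
    # With a parent tier, at most one fetch per shield location reaches the origin.
    return min(edge_pops, shield_locations)


def origin_reduction(edge_pops: int, shield_locations: int) -> float:
    """Fraction of origin fetches avoided by adding the shield tier."""
    return 1 - origin_fetches(edge_pops, shield_locations) / origin_fetches(edge_pops)


# Figures from the text: 200 edge PoPs and 10 to 15 shield locations.
for shields in (10, 15):
    print(
        f"{shields} shields: {origin_fetches(200, shields)} origin fetches, "
        f"{origin_reduction(200, shields):.1%} fewer than with no shield"
    )
```

Under these assumptions the reduction depends only on the ratio of shield locations to edge PoPs, which is why the benefit grows with fan-out.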
💡 Key Takeaways
Regional parent caches (Origin Shield) aggregate misses from 10 to 20 edge PoPs each, reducing origin requests by 50% to 70% in typical CloudFront deployments and by 90% or more in high-fan-out scenarios
Request coalescing at the parent tier collapses concurrent misses for the same cache key into a single origin fetch, preventing stampede scenarios where thousands of requests overwhelm the origin
Adding a parent tier introduces 10 to 30 milliseconds additional latency on cache misses but raises effective hit ratio from 70% to 85%+ by serving more requests from the parent without origin contact
Netflix Open Connect achieves 95%+ of traffic served from ISP-embedded appliances, eliminating both edge and parent tiers by pre-positioning content, but requires physical deployment partnerships
Parents require significantly more storage and bandwidth capacity (aggregating traffic from dozens of edges) but reduce origin egress costs which often exceed parent infrastructure costs at scale
Hierarchical caching is most beneficial for long-tail content with global distribution; highly localized content is requested from only a few edges, and ultra-popular content is already cached at nearly every edge, so neither sees proportional gains from a parent tier
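The latency tradeoff in these takeaways can be checked with a rough expected-value calculation. The sketch below uses the hit ratios and the parent-hop cost from the text; the 20 ms edge latency and 200 ms origin fetch cost are illustrative assumptions, and the function name is hypothetical.

```python
def mean_latency_ms(
    edge_hit: float,
    parent_hit: float,
    edge_ms: float,
    parent_hop_ms: float,
    origin_ms: float,
) -> float:
    """Expected request latency for a simple two-tier model.

    edge_hit is the share of requests served at the edge, parent_hit the share
    served at the parent, and the remainder goes to the origin. Every request
    pays edge_ms, every edge miss pays parent_hop_ms, and every request that
    reaches the origin additionally pays origin_ms.
    """
    origin_share = 1.0 - edge_hit - parent_hit
    edge_miss_share = 1.0 - edge_hit
    return edge_ms + edge_miss_share * parent_hop_ms + origin_share * origin_ms


# No parent tier: 70% edge hits, every miss goes directly to the origin.
no_shield = mean_latency_ms(0.70, 0.0, edge_ms=20, parent_hop_ms=0, origin_ms=200)

# Parent tier: effective hit ratio of 85% (70% edge + 15% parent),
# with the worst-case 30 ms extra hop from the text on every edge miss.
with_shield = mean_latency_ms(0.70, 0.15, edge_ms=20, parent_hop_ms=30, origin_ms=200)

print(f"no shield: {no_shield:.0f} ms, with shield: {with_shield:.0f} ms")
```

With these assumed numbers the extra hop costs less than the origin fetches it avoids; the balance shifts if the origin is close to the edges or the parent absorbs few misses.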
📌 Examples
Amazon CloudFront Origin Shield reduces origin requests by 60% for a media site serving globally distributed video segments; a 200-edge miss storm generates only 12 origin fetches (one per shield region)
A gaming company using hierarchical CDN for a 50 GB update launch: 150 edge PoPs generate 150 first-request misses, but with 10 regional parents, only 10 origin fetches occur, reducing origin bandwidth from 7.5 TB to 500 GB
Request coalescing example: 500 concurrent edge requests for a newly published article arrive at a parent within 100 milliseconds; the parent fetches once from the origin (200 ms round-trip time, RTT) and returns the response to all 500 waiting requests, instead of opening 500 simultaneous origin connections
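The request coalescing example can be sketched as a small thread-safe cache in Python. This illustrates the general technique and is not how any particular CDN implements it; `CoalescingParent`, `slow_origin`, and the `/article/1` key are hypothetical names, and the 0.2 second sleep stands in for the 200 ms origin round trip.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class _Flight:
    """State shared by all requests waiting on one in-progress origin fetch."""

    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None


class CoalescingParent:
    """Parent cache that collapses concurrent misses per key into one fetch."""

    def __init__(self, fetch_origin):
        self._fetch_origin = fetch_origin
        self._lock = threading.Lock()
        self._cache = {}
        self._in_flight = {}

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            flight = self._in_flight.get(key)
            leader = flight is None
            if leader:
                # First miss for this key: this request will fetch from origin.
                flight = _Flight()
                self._in_flight[key] = flight
        if not leader:
            # Follower: wait for the leader's fetch instead of hitting origin.
            flight.done.wait()
            if flight.error is not None:
                raise flight.error
            return flight.value
        try:
            flight.value = self._fetch_origin(key)
            with self._lock:
                self._cache[key] = flight.value
        except Exception as exc:
            flight.error = exc
            raise
        finally:
            with self._lock:
                del self._in_flight[key]
            flight.done.set()
        return flight.value


origin_calls = []


def slow_origin(key):
    origin_calls.append(key)  # record every request that reaches the origin
    time.sleep(0.2)           # simulated 200 ms origin round trip
    return f"body-of-{key}"


parent = CoalescingParent(slow_origin)
with ThreadPoolExecutor(max_workers=50) as pool:
    bodies = list(pool.map(parent.get, ["/article/1"] * 500))

print(f"{len(bodies)} responses served, {len(origin_calls)} origin fetch(es)")
```

The in-flight entry is registered under the same lock that guards the cache lookup, so a concurrent request for the same key sees either the cached body or the pending fetch and never starts a second origin request.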