
What Are the Trade-offs Between Low Latency and Scale?

The Fundamental Trade-off

Reducing latency in segmented streaming comes with significant trade-offs in cacheability, infrastructure load, and cost. Traditional HLS/DASH with 4-6 second segments achieves excellent CDN cache hit rates (often 95%+) because segments are immutable once published and can be cached for hours or days. Low Latency modes using 200-500ms partial chunks dramatically reduce cache effectiveness because each chunk has a much shorter useful lifetime before it is superseded. This increases cache misses, forcing more requests to origin and raising both origin load and CDN egress costs.
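The combined effect of higher per-viewer request rates and lower cache hit rates on origin load can be sketched with back-of-the-envelope arithmetic. This is an illustrative calculation using the figures above (95% vs. an assumed ~70% hit rate for low-latency mode); the function name is ours, not a standard API.

```python
def origin_rps(viewers: int, rps_per_viewer: float, cache_hit_rate: float) -> float:
    """Requests per second that miss the CDN edge and reach the origin."""
    total_rps = viewers * rps_per_viewer
    return total_rps * (1.0 - cache_hit_rate)

# Traditional: 4 s segments (0.25 RPS/viewer) at a 95% edge hit rate
traditional = origin_rps(100_000, 0.25, 0.95)   # ~1,250 RPS to origin
# Low latency: 500 ms chunks (2 RPS/viewer) at an assumed 70% hit rate
low_latency = origin_rps(100_000, 2.0, 0.70)    # ~60,000 RPS to origin
```

Even though the edge-facing request rate grows only 8x, origin load grows far more because the cache absorbs a smaller fraction of it.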

Request Rate Multiplication

The request rate multiplier is substantial. A traditional stream with 4-second segments generates 0.25 requests per second per viewer for segments. A Low Latency stream with 500ms partial chunks generates 2 requests per second per viewer, an 8x increase. For 100,000 concurrent viewers, this jumps from 25,000 RPS to 200,000 RPS for media segments alone, not counting manifest updates. Origin servers must be dimensioned accordingly, typically requiring multi-tier origins with request coalescing (grouping concurrent requests for the same resource so only one hits origin) and cache shielding (a mid-tier cache layer between edge and origin).

Ultra Low Latency with WebRTC

Ultra low latency under 1 second requires WebRTC, which uses UDP-based RTP (Real-time Transport Protocol) for sub-second delivery. WebRTC sacrifices CDN cacheability entirely: each viewer connection goes directly to a media server, limiting scale and increasing infrastructure costs significantly. A WebRTC media server can handle 1,000-5,000 simultaneous viewers depending on bitrate and server capacity, compared to millions via CDN. This restricts WebRTC to scenarios where sub-second latency is mandatory and you can absorb the infrastructure complexity and cost.
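The scale limit above translates directly into fleet size: because every viewer holds a direct connection, server count grows linearly with audience. A rough sizing sketch, assuming ~2,000 viewers per server (the midpoint of the 1,000-5,000 range given above):

```python
import math

def webrtc_servers_needed(viewers: int, viewers_per_server: int) -> int:
    """Media servers required when every viewer holds a direct connection."""
    return math.ceil(viewers / viewers_per_server)

# 100,000 concurrent viewers at ~2,000 viewers per server -> 50 media servers,
# versus effectively zero incremental origin capacity for a well-cached CDN stream.
servers = webrtc_servers_needed(100_000, 2_000)
```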

Decision Framework

Traditional HLS/DASH (10-30s latency): Maximum scale, minimum cost, 95%+ cache hit rates. Use for broadcast content where latency is acceptable.
LL-HLS/LL-DASH (2-5s latency): Interactive streams (auctions, gaming with chat, Q&A) where some latency is acceptable but 20-30 seconds breaks the experience. Higher infrastructure load, 60-80% cache hit rates.
WebRTC (sub-second): Video conferencing, real-time collaboration, millisecond-critical applications. Limited to thousands of viewers, dramatically higher cost per viewer.
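The framework above boils down to picking the cheapest tier that meets the latency budget. An illustrative sketch (the function name and exact thresholds are our simplification of the ranges given above):

```python
def choose_delivery(latency_budget_s: float) -> str:
    """Pick the cheapest delivery tier that can meet a latency budget."""
    if latency_budget_s < 2:
        return "WebRTC"           # sub-second, direct media-server connections
    if latency_budget_s < 10:
        return "LL-HLS/LL-DASH"   # 2-5 s, partial chunks, reduced cacheability
    return "HLS/DASH"             # 10-30 s, maximum cacheability and scale
```

The ordering matters: always fall through to the most cacheable option the budget allows, since each step down in latency is a step up in per-viewer cost.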

Key Trade-off: Delivering 1 million viewers at 3 Mbps for 2 hours costs $54,000-$216,000 in bandwidth with traditional streaming. Low Latency modes can increase this by 20-40% due to reduced cache efficiency. WebRTC at this scale would cost orders of magnitude more due to direct server connections.
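The bandwidth figure above can be reproduced with a simple egress calculation; the $0.02-$0.08/GB range is an assumed CDN price band consistent with the quoted totals, and the function name is illustrative.

```python
def bandwidth_cost_usd(viewers: int, bitrate_mbps: float,
                       hours: float, usd_per_gb: float) -> float:
    """Total CDN egress cost: viewers x bitrate x duration, converted to GB."""
    # Mbps -> MB/s (divide by 8), x seconds, MB -> GB (divide by 1000)
    gigabytes = viewers * (bitrate_mbps / 8) * (hours * 3600) / 1000
    return gigabytes * usd_per_gb

# 1M viewers, 3 Mbps, 2 hours = 2.7 GB/viewer = 2.7 million GB total
low_end = bandwidth_cost_usd(1_000_000, 3, 2, 0.02)   # ~$54,000
high_end = bandwidth_cost_usd(1_000_000, 3, 2, 0.08)  # ~$216,000
```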
💡 Key Takeaways
Traditional streaming: 95%+ cache hit rates with 4-6 second segments; Low Latency: 60-80% with 200-500ms chunks
Request rate: 4s segments = 0.25 RPS/viewer; 500ms chunks = 2 RPS/viewer (8x increase); 100K viewers = 25K vs 200K RPS
WebRTC sub-second latency eliminates CDN caching; media servers handle 1,000-5,000 viewers vs millions via CDN
Cost impact: LL modes increase bandwidth cost 20-40% from reduced cache efficiency; WebRTC dramatically higher
📌 Interview Tips
1. Compare request rates: 100,000 viewers generate 25,000 RPS with 4s segments vs 200,000 RPS with 500ms chunks
2. Explain cache hit rate degradation: traditional 95%+ vs Low Latency 60-80% due to shorter chunk lifetimes
3. Present the decision framework by latency needs: broadcast (traditional), interactive (LL-HLS), real-time (WebRTC)