What Are the Trade-offs Between Low Latency and Scale?
The Fundamental Trade-off
Reducing latency in segmented streaming comes with significant trade-offs in cacheability, infrastructure load, and cost. Traditional HLS/DASH with 4-6 second segments achieves excellent CDN cache hit rates (often 95%+) because segments are immutable once published and can be cached for hours or days. Low Latency modes that deliver 200-500ms partial chunks dramatically reduce cache effectiveness because each chunk has a much shorter useful lifetime before it is superseded. The result is more cache misses, which push more requests to origin and raise both origin load and CDN egress costs.
Request Rate Multiplication
The request rate multiplier is substantial. A traditional stream with 4-second segments generates 0.25 segment requests per second per viewer. A Low Latency stream with 500ms partial chunks generates 2 requests per second per viewer, an 8x increase. For 100,000 concurrent viewers, that is a jump from 25,000 RPS to 200,000 RPS for media segments alone, not counting manifest updates. Origin servers must be sized accordingly, which typically means multi-tier origins with request coalescing (grouping concurrent requests for the same resource so only one hits origin) and cache shielding (a mid-tier cache layer between edge and origin).
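The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes every viewer fetches each segment or partial chunk exactly once and ignores manifest requests; the function name `segment_requests_per_second` is hypothetical and chosen only for illustration.

```python
def segment_requests_per_second(viewers: int, chunk_duration_s: float) -> float:
    """Aggregate media request rate, assuming one request per chunk per viewer."""
    if chunk_duration_s <= 0:
        raise ValueError("chunk_duration_s must be positive")
    return viewers / chunk_duration_s


# Figures from the text: 100,000 viewers, 4 s segments vs 500 ms partial chunks.
traditional_rps = segment_requests_per_second(100_000, 4.0)
low_latency_rps = segment_requests_per_second(100_000, 0.5)
multiplier = low_latency_rps / traditional_rps

print(traditional_rps, low_latency_rps, multiplier)  # 25000.0 200000.0 8.0
```

Substituting a 200ms chunk duration into the same function shows how quickly the rate grows at the shorter end of the partial-chunk range.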
Ultra Low Latency with WebRTC
Ultra low latency under 1 second requires WebRTC, which uses UDP-based RTP (Real-time Transport Protocol) for sub-second delivery. WebRTC sacrifices CDN cacheability entirely: each viewer connection goes directly to a media server, which limits scale and significantly increases infrastructure costs. A single WebRTC media server can handle 1,000-5,000 simultaneous viewers depending on bitrate and server capacity, whereas a CDN can serve millions. This restricts WebRTC to scenarios where sub-second latency is mandatory and you can absorb the infrastructure complexity and cost.
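To get a rough sense of what the per-server limit means for fleet size, the sketch below divides an audience by a per-server capacity. It assumes viewers spread evenly across servers and leaves out headroom, redundancy, and any cascading between servers; `media_servers_needed` is a hypothetical helper, not part of any WebRTC library.

```python
import math


def media_servers_needed(viewers: int, viewers_per_server: int) -> int:
    """Minimum server count, assuming perfectly even load and no spare capacity."""
    if viewers_per_server <= 0:
        raise ValueError("viewers_per_server must be positive")
    return math.ceil(viewers / viewers_per_server)


# Capacity range from the text: 1,000-5,000 simultaneous viewers per server.
best_case = media_servers_needed(100_000, 5_000)
worst_case = media_servers_needed(100_000, 1_000)

print(best_case, worst_case)  # 20 100
```

Even in the best case, an audience that a CDN would absorb without dedicated capacity planning needs a sizeable media server fleet here, which is where the higher cost per viewer comes from.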
Decision Framework
Traditional HLS/DASH (10-30s latency): Maximum scale, minimum cost, 95%+ cache hit rates. Use for broadcast content where latency is acceptable.
LL-HLS/LL-DASH (2-5s latency): Higher infrastructure load, 60-80% cache hit rates. Use for interactive streams (auctions, gaming with chat, Q&A) where some latency is acceptable but 20-30 seconds breaks the experience.
WebRTC (sub-second latency): Limited to thousands of viewers, dramatically higher cost per viewer. Use for video conferencing, real-time collaboration, and millisecond-critical applications.
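The framework can be expressed as a simple selection rule: pick the cheapest delivery mode whose latency still fits the budget. The sketch below is one possible encoding, not a definitive policy. The thresholds use the lower end of each latency range quoted above (10s and 2s), and treating budgets between those ranges this way is an assumption; `choose_delivery_mode` is a hypothetical name.

```python
def choose_delivery_mode(max_latency_s: float) -> str:
    """Pick the cheapest mode that can plausibly meet the latency budget.

    Thresholds are assumptions taken from the low end of each quoted range.
    """
    if max_latency_s <= 0:
        raise ValueError("max_latency_s must be positive")
    if max_latency_s >= 10.0:
        return "Traditional HLS/DASH"  # maximum scale, minimum cost
    if max_latency_s >= 2.0:
        return "LL-HLS/LL-DASH"  # higher infrastructure load
    return "WebRTC"  # sub-second, limited to thousands of viewers


print(choose_delivery_mode(30.0))  # Traditional HLS/DASH
print(choose_delivery_mode(3.0))   # LL-HLS/LL-DASH
print(choose_delivery_mode(0.5))   # WebRTC
```

A real decision would also weigh audience size and budget, since a sub-second requirement combined with a very large audience runs into the per-server limits described earlier.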