Networking & Protocols • Streaming Protocols (HLS, DASH, RTMP)
What Are the Trade-offs Between Low Latency and Scale?
Reducing latency in segmented streaming comes with significant trade-offs in cacheability, infrastructure load, and cost. Traditional HLS and DASH with 4 to 6 second segments achieve excellent CDN cache hit rates (often exceeding 95 percent) because segments are immutable once published and can be cached for hours or days. Low Latency modes using 200 to 500 millisecond partial chunks dramatically reduce cache effectiveness because each chunk has a much shorter lifetime before being superseded. This increases cache misses, forcing more requests back to the origin and increasing both origin load and CDN egress costs.
The request rate multiplier is substantial. A traditional stream with 4-second segments generates approximately 0.25 requests per second per viewer for segments. A Low Latency stream with 500-millisecond partial chunks generates 2 requests per second per viewer, an 8x increase. For 100,000 concurrent viewers, this jumps from 25,000 RPS to 200,000 RPS just for media segments, not counting manifest updates. Origin servers must be dimensioned accordingly, typically requiring multi-tier origins with request coalescing and cache shielding to avoid origin overload. Manifest update frequency also rises, as players must poll more often to discover new partial chunks.
Ultra-low latency under 1 second typically requires WebRTC, which uses the UDP-based Real-time Transport Protocol (RTP) for sub-second delivery but sacrifices CDN cacheability entirely. Each viewer connection goes directly to a media server, limiting scale and significantly increasing infrastructure costs. A pragmatic middle ground is Low Latency HLS or DASH at 2 to 5 seconds for chat-interactive broadcasts like auctions or gaming streams, where some latency is acceptable but traditional 20 to 30 second delays break the interactive experience. The choice depends on your use case: use traditional segmented streaming for pure broadcast at maximum scale and minimum cost, LL-HLS or LL-DASH for interactive streams where 2 to 5 seconds suffices, and WebRTC only when sub-second latency is mandatory and you can absorb the infrastructure complexity and cost.
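The decision rule in this paragraph can be condensed into a small helper; the function name and exact thresholds are illustrative, taken from the latency ranges discussed above rather than from any standard:

```python
def pick_protocol(target_latency_seconds: float) -> str:
    """Map a target glass-to-glass latency to a delivery protocol.

    Thresholds follow the ranges in the text and are rough
    guidelines, not hard limits.
    """
    if target_latency_seconds < 1.0:
        return "WebRTC"            # sub-second, but no CDN caching
    if target_latency_seconds <= 5.0:
        return "LL-HLS / LL-DASH"  # chat-interactive broadcasts
    return "HLS / DASH"            # pure broadcast: max scale, min cost

pick_protocol(0.3)   # auction with live bidding -> "WebRTC"
pick_protocol(3.0)   # gaming stream with chat  -> "LL-HLS / LL-DASH"
pick_protocol(20.0)  # one-way live event       -> "HLS / DASH"
```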
💡 Key Takeaways
• Low Latency modes reduce segment size from 4 to 6 seconds to 0.2 to 0.5 seconds, increasing request rate by 8 to 30x per viewer and reducing cache hit rates
• A 100,000 viewer stream generates 25,000 segment RPS with traditional 4-second segments versus 200,000 RPS with 500-millisecond partial chunks
• Ultra-low latency under 1 second requires WebRTC, which eliminates CDN caching and sends each viewer connection directly to media servers, dramatically increasing infrastructure costs
• Low Latency HLS and DASH require HTTP chunked transfer encoding support at every hop (CDN, proxies); intermediate buffering destroys latency targets
• Traditional segmented streaming achieves over 95 percent CDN cache hit rates; Low Latency modes typically see 60 to 80 percent due to shorter chunk lifetimes
• Cost example: delivering 1 million viewers at 3 Mbps for 2 hours costs $54,000 to $216,000 in bandwidth; Low Latency modes can increase this by 20 to 40 percent due to reduced cache efficiency
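The cost figure in the last bullet works out as a straightforward bitrate-times-duration calculation; the $0.02 to $0.08 per GB range is an assumption standing in for typical CDN egress pricing tiers:

```python
def egress_cost_usd(viewers: int, mbps: float, hours: float,
                    usd_per_gb: float) -> float:
    """CDN egress cost: viewers x bitrate x duration x price per GB."""
    gb_per_viewer = mbps * hours * 3600 / 8 / 1000  # Mbit/s over time -> GB
    return viewers * gb_per_viewer * usd_per_gb

# 1 million viewers at 3 Mbps for 2 hours (2.7 GB per viewer):
low  = egress_cost_usd(1_000_000, 3, 2, 0.02)  # $54,000
high = egress_cost_usd(1_000_000, 3, 2, 0.08)  # $216,000
```

A 20 to 40 percent cache-miss penalty on top of these numbers adds roughly $11,000 to $86,000 to the bill, which is why cache efficiency is a first-order design concern at this scale.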
📌 Examples
Amazon IVS abstracts Low Latency HLS complexity and achieves 2 to 5 second end-to-end latency globally by managing the entire pipeline (ingest, transcode, packaging, multi-CDN delivery) as a managed service
Video conferencing platforms like Google Meet and Zoom use WebRTC for sub-second latency but limit group call sizes (typically 50 to 300 participants) due to server fan-out constraints, unlike broadcast streaming, which scales to millions
Twitch offers both standard latency (10 to 20 seconds) with maximum cache efficiency and Low Latency mode (2 to 5 seconds) with higher origin load; streamers choose based on whether chat interactivity justifies the infrastructure cost