Networking & Protocols • Streaming Protocols (HLS, DASH, RTMP)Medium⏱️ ~3 min
How Do Segmented HTTP Streaming and Adaptive Bitrate Work?
HLS and DASH split video into time aligned segments (typically 2 to 6 seconds duration) encoded at multiple quality levels called renditions (for example, 160p, 360p, 720p, 1080p, 4K). The player first fetches a manifest file (m3u8 for HLS, Media Presentation Description or MPD for DASH) that lists all available renditions and their segment URLs. The player estimates available bandwidth by measuring download speed and switches between renditions at segment boundaries to match network conditions. This Adaptive Bitrate streaming prevents stalls during congestion and maximizes quality when bandwidth is plentiful.
Latency in segmented streaming is determined by segment duration and buffering strategy. Traditional players buffer 3 segments before starting playback for stability. With 6 second segments, this creates approximately 18 seconds baseline latency; 2 second segments yield approximately 6 seconds baseline. Adding network propagation and jitter typically results in 10 to 30 seconds end to end latency. Low Latency HLS (LL-HLS) and Low Latency DASH introduce partial segments or chunks (200 to 500 milliseconds) published via HTTP chunked transfer encoding, reducing latency to 2 to 5 seconds while preserving CDN compatibility. This requires every hop in the delivery path to support chunked delivery without buffering.
YouTube Live events with several million peak concurrent viewers generate multi terabit per second (Tbps) egress. With manifests polled every 2 to 4 seconds, a 1 million viewer stream generates 500,000 to 1 million requests per second (RPS) for manifests alone if not effectively cached at the CDN edge. Segment requests multiply this by the number of segments per poll interval and rendition count, easily reaching millions of RPS. Proper CDN caching with short manifest Time To Live (TTL) values (1 to 6 seconds for live) and longer segment TTLs (hours to days) is critical for handling this load without overwhelming the origin.
💡 Key Takeaways
•Traditional HLS and DASH with 6 second segments and 3 segment buffer create approximately 18 seconds baseline latency; 2 second segments reduce this to approximately 6 seconds plus network overhead
•Low Latency modes (LL-HLS, LL-DASH) use 200 to 500 millisecond partial chunks with HTTP chunked transfer to achieve 2 to 5 second end to end latency
•A 1 million viewer stream with 2 second manifest polls generates 500,000 RPS for manifests; segment requests add 1 to 2 million additional RPS
•Shorter segments (1 to 2 seconds) improve latency and adaptation granularity but increase request overhead and can worsen ABR stability in volatile networks
•Longer segments (4 to 6 seconds) improve caching efficiency and quality predictability but increase live latency and slow reaction to bandwidth drops
•Common Unified Media Application Format (CMAF) using fragmented MP4 (fMP4) allows a single set of segments to be referenced by both HLS and DASH manifests, halving storage and cache footprint
📌 Examples
Netflix uses 2 to 4 second CMAF segments for DASH (most devices) and HLS (Apple devices), with per title encoding reducing bitrate 20 to 30 percent for the same quality; targets startup delay under 1 to 2 seconds and rebuffer ratio below 0.5 to 1 percent of playtime
Twitch Low Latency mode reduced viewer latency from 10 to 20 seconds down to 2 to 5 seconds using partial segments and shorter target durations, while maintaining scale to millions of concurrent viewers
A 4 hour live event with 6 renditions at 4 second segments generates tens of thousands of individual segment files; using CMAF instead of separate TS (HLS) and MP4 (DASH) formats halves binary storage footprint