Reliability Models and Congestion Control in UDP-Based Transports
Building a production-quality UDP-based transport requires implementing the reliability, congestion control, and pacing mechanisms that TCP provides automatically. The first decision is the reliability model: map each message type to delivery semantics. Frequent state updates, such as player positions in games, can be unreliable and unordered because newer updates supersede old ones. Critical events, such as item pickups or configuration changes, need reliable, ordered delivery within a stream. Independent resources, such as video chunks, can be reliable but unordered across streams. Selective retransmission with deadlines is key: assign each message a playout or processing deadline, drop retransmissions that would arrive too late, and prioritize new data over stale retransmissions.
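The mapping and the deadline rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the message kinds, the `SEMANTICS` table, and the `should_retransmit` helper are hypothetical names, and the one-way delay is assumed to be estimated elsewhere (for example, as half the smoothed RTT).

```python
from dataclasses import dataclass
from enum import Enum


class Delivery(Enum):
    UNRELIABLE = "unreliable, unordered"
    RELIABLE_ORDERED = "reliable, ordered within a stream"
    RELIABLE_UNORDERED = "reliable, unordered across streams"


# Hypothetical message kinds mapped to the delivery semantics from the text.
SEMANTICS = {
    "player_position": Delivery.UNRELIABLE,
    "item_pickup": Delivery.RELIABLE_ORDERED,
    "video_chunk": Delivery.RELIABLE_UNORDERED,
}


@dataclass
class Message:
    kind: str
    deadline: float  # absolute time (seconds) by which the receiver needs it


def should_retransmit(msg: Message, now: float, one_way_delay: float) -> bool:
    """Retransmit only reliable messages that can still arrive in time."""
    if SEMANTICS[msg.kind] is Delivery.UNRELIABLE:
        return False  # a newer update will supersede the lost one
    # Drop the retransmission if it would land after the deadline.
    return now + one_way_delay <= msg.deadline
```

A sender would call `should_retransmit` when a packet is declared lost, and queue fresh data ahead of anything that still qualifies.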
Congestion control is essential to be a good network citizen and to avoid self-inflicted packet loss. Without pacing, UDP applications can generate microbursts that exceed switch buffer capacity, causing loss rates to jump from a 0.5% baseline to 5 to 10% and, in fan-in traffic patterns, contributing to incast collapse. Implement a paced sender that maintains a budget based on a congestion window or rate estimate, and send packets at spaced intervals to smooth traffic. Choose between loss-based algorithms (additive increase, multiplicative decrease, classically halving the congestion window on loss) and model-based or rate-based algorithms that estimate bottleneck bandwidth and minimum RTT and limit data in flight to roughly one to two bandwidth-delay products instead of filling the buffer. Google BBR is a model-based algorithm of this kind that aims to keep queuing delay low while maximizing throughput; it is effective on paths with variable buffering but requires careful tuning to coexist fairly with loss-based flows.
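As a rough sketch of the loss-based option combined with pacing, the snippet below pairs a textbook AIMD window with a helper that spreads one window of packets evenly over one RTT. The class and function names are illustrative assumptions, the window is counted in packets rather than bytes, and real stacks add slow start, a pacing gain, and burst allowances that are omitted here.

```python
from typing import List


class AimdWindow:
    """Loss-based control: additive increase, multiplicative decrease."""

    def __init__(self, cwnd_packets: float = 10.0, min_cwnd: float = 2.0):
        self.cwnd = cwnd_packets
        self.min_cwnd = min_cwnd

    def on_ack(self) -> None:
        # Each ACK adds 1/cwnd, which grows the window by about one packet per RTT.
        self.cwnd += 1.0 / self.cwnd

    def on_loss(self) -> None:
        # Halve the window on a loss event, but never drop below the floor.
        self.cwnd = max(self.min_cwnd, self.cwnd / 2.0)


def pacing_interval(cwnd_packets: float, rtt_s: float) -> float:
    """Seconds between departures so one window is spread across one RTT."""
    return rtt_s / cwnd_packets


def departure_times(n_packets: int, start_s: float, interval_s: float) -> List[float]:
    """Scheduled send times for n packets paced at a fixed interval."""
    return [start_s + i * interval_s for i in range(n_packets)]
```

With a window of 20 packets and a 40 ms RTT, `pacing_interval` yields about 2 ms between packets, so the sender never hands the switch the whole window as a single burst.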
Loss recovery must handle both random and bursty loss patterns. Wi-Fi and cellular links commonly exhibit burst loss, where multiple consecutive packets are dropped. Use sequence numbers per stream, acknowledge with ranges (similar to TCP SACK), and detect loss with both a time threshold (declare a packet lost after 1.5 to 2 times the smoothed RTT) and a packet-count threshold (three later packets acknowledged ahead of it). Allow limited reordering tolerance: some paths reorder packets by several positions, and declaring loss too early triggers spurious retransmissions. Forward error correction (FEC) adds 10 to 20% parity packets per block; adjust the redundancy dynamically based on the recent burst-length distribution, and interleave packets across time to spread the impact of contiguous loss. For real-time media, combine FEC with adaptive jitter buffers of 50 to 200 ms and packet loss concealment, and retransmit only if the packet is expected to arrive before the playout deadline.
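The two loss-detection thresholds can be combined as in the sketch below. It assumes a single packet-number sequence and treats "three later packets acknowledged" as a distance of three from the largest acknowledged number, which is one common simplification; `detect_lost` and both constants are illustrative names, with the time multiplier taken from the lower end of the range above.

```python
from typing import Dict, List

PACKET_THRESHOLD = 3   # distance from the largest acked packet number
TIME_THRESHOLD = 1.5   # multiple of the smoothed RTT


def detect_lost(
    sent_times: Dict[int, float],  # unacknowledged packet number -> send time (s)
    largest_acked: int,
    smoothed_rtt: float,
    now: float,
) -> List[int]:
    """Return unacked packet numbers that should be declared lost."""
    lost = []
    for pn, sent_at in sorted(sent_times.items()):
        if pn >= largest_acked:
            continue  # nothing sent later has been acknowledged yet
        by_count = largest_acked - pn >= PACKET_THRESHOLD
        by_time = now - sent_at >= TIME_THRESHOLD * smoothed_rtt
        if by_count or by_time:
            lost.append(pn)
    return lost
```

Raising `PACKET_THRESHOLD` is the simplest way to add reordering tolerance: packets that are merely a position or two behind are then left to the time threshold, which reduces spurious retransmissions at the cost of slower recovery.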
💡 Key Takeaways
•Selective reliability maps message types to semantics: unreliable and unordered for frequent updates, reliable and ordered for critical events, reliable but unordered for independent resources like video chunks
•Paced sending prevents microbursts that can spike loss from a 0.5% baseline to 5 to 10% and contribute to congestion collapse; maintain a send budget and smooth packet departure intervals
•Loss-based congestion control classically halves the congestion window on loss; model-based algorithms like Google BBR estimate bottleneck bandwidth and minimum RTT and limit data in flight to roughly one to two bandwidth-delay products to keep queuing delay low
•Bursty loss on Wi-Fi and cellular requires burst-length tracking; dynamically adjust forward error correction between 10 and 20% redundancy based on observed burst patterns
•Loss detection uses both a time threshold (1.5 to 2 times the smoothed RTT) and a packet-count threshold (three later packets acknowledged), with reordering tolerance to avoid spurious retransmissions
•Real-time media combines FEC, adaptive jitter buffers of 50 to 200 ms, and deadline-based retransmission; drop retransmissions that would miss playout time and prioritize new frames
📌 Examples
QUIC uses monotonically increasing packet numbers, per-stream byte offsets, and SACK-style ACK ranges with packet-threshold and time-threshold loss detection, so a loss affecting one stream does not cause head-of-line blocking on the others
WebRTC uses Opus audio with in-band FEC (10 to 15% redundancy) and an adaptive jitter buffer targeting 20 to 150 ms based on network jitter; when a retransmission would arrive too late, it falls back to packet loss concealment
Riot Games' League of Legends uses delta compression and selective reliability: movement packets are sent unreliably, while ability casts are delivered reliably with sequence numbers, avoiding the cost of full TCP reliability
A video conferencing system detects burst loss averaging three packets, raises FEC to 20% for the next second, then decays back to the 10% baseline as loss subsides
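The last example can be sketched as a small controller that jumps to the ceiling when the observed mean burst length reaches a trigger and otherwise decays the excess toward the baseline. The `AdaptiveFec` class, the halving decay factor, and the once-per-interval update cadence are assumptions made for illustration; only the 10% and 20% bounds and the three-packet trigger come from the text.

```python
import math


class AdaptiveFec:
    """Raise parity overhead on burst loss, then decay toward the baseline."""

    def __init__(self, baseline: float = 0.10, ceiling: float = 0.20,
                 burst_trigger: float = 3.0, decay: float = 0.5):
        self.baseline = baseline
        self.ceiling = ceiling
        self.burst_trigger = burst_trigger
        self.decay = decay  # fraction of the excess kept per quiet interval (assumed)
        self.redundancy = baseline

    def update(self, mean_burst_len: float) -> float:
        """Call once per measurement interval with the observed mean burst length."""
        if mean_burst_len >= self.burst_trigger:
            self.redundancy = self.ceiling
        else:
            # Shrink only the part above the baseline, so we never undershoot it.
            excess = self.redundancy - self.baseline
            self.redundancy = self.baseline + excess * self.decay
        return self.redundancy


def parity_packets(block_size: int, redundancy: float) -> int:
    """Parity packets to add to a block of data packets, rounded up."""
    # Round first so floating-point noise cannot push ceil() up by one.
    return math.ceil(round(block_size * redundancy, 9))
```

For a block of 20 data packets this yields 4 parity packets at the 20% ceiling and 2 at the 10% baseline.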