
Observability and Capacity Planning for Custom UDP Transports

Operating UDP-based transports at scale requires building observability that replaces the rich metrics the kernel TCP stack provides. Kernel TCP exposes retransmit counts, congestion window evolution, SYN backlog, and per-connection state via netstat, ss, and eBPF tracing. User-space UDP stacks must instrument equivalent metrics: track RTT at the p50, p95, and p99 percentiles to detect latency degradation; measure loss rate split by random versus bursty loss to tune forward error correction; calculate reordering percentage and maximum reordering distance to set loss-detection thresholds; and quantify per-stream head-of-line blocking time to validate that stream isolation is effective. Instrument handshake success rate and handshake time, including the 0-RTT hit rate for resumed connections, to measure session establishment efficiency.

Congestion control and pacing metrics are critical for diagnosing self-inflicted loss and unfairness. Export send rate versus achieved goodput to detect when congestion control is throttling you; track pacing queue depth and underruns or overruns to tune your pacer; and measure standing queue delay to ensure you are not persistently filling buffers. For real-time applications, log forward error correction overhead and recovery rate (how often FEC alone recovered a loss without retransmission), the retransmission cause breakdown (timeout versus fast retransmit), and the deadline miss rate for time-sensitive packets. On mobile clients, instrument battery impact by tracking wakeups per second from timers and keepalives.

Capacity planning must account for packet rate, not just throughput. A service sending 200-byte UDP packets at 1 Gbps generates roughly 600,000 packets per second; the same 1 Gbps of bulk TCP traffic with 1460-byte payloads is only about 85,000 packets per second. CPU cost scales with packet rate due to NIC interrupts, system calls, and per-packet crypto. Measure CPU per million packets under realistic loss and jitter; load test with packet loss injection at 0.5%, 1%, and 3% to observe retransmission CPU overhead and validate pacing under congestion. Plan for 20 to 50% higher CPU for user-space UDP transports compared to kernel TCP at equivalent goodput, and validate with production traffic patterns before scaling.
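For a concrete starting point, the sketch below shows how a user-space transport might accumulate the per-connection signals from the first paragraph: RTT percentiles, loss split into random versus bursty, and reordering distance. It assumes the transport reports per-packet send, ack, and loss events; the type and method names (TransportMetrics, RecordAck, RecordLoss) are illustrative, not from any particular library, and a production implementation would use a bounded histogram rather than retaining raw RTT samples.

```go
package transportmetrics

import (
	"sort"
	"time"
)

// TransportMetrics accumulates per-connection signals for a user-space
// UDP transport. All names here are illustrative, not a real library API.
type TransportMetrics struct {
	rttSamples    []time.Duration // production code would use a bounded histogram
	packetsSent   uint64
	packetsLost   uint64
	burstLosses   uint64 // losses immediately preceded by another loss
	prevWasLoss   bool
	reordered     uint64
	maxReorderGap int64
	highestSeq    int64
}

// RecordSent counts a transmitted packet so loss rate has a denominator.
func (m *TransportMetrics) RecordSent() { m.packetsSent++ }

// RecordAck records an acknowledged packet and its sampled RTT, and
// detects reordering relative to the highest sequence number seen so far.
func (m *TransportMetrics) RecordAck(seq int64, rtt time.Duration) {
	m.rttSamples = append(m.rttSamples, rtt)
	if seq < m.highestSeq {
		m.reordered++
		if gap := m.highestSeq - seq; gap > m.maxReorderGap {
			m.maxReorderGap = gap
		}
	} else {
		m.highestSeq = seq
	}
	m.prevWasLoss = false
}

// RecordLoss counts a lost packet and classifies it as random or bursty.
func (m *TransportMetrics) RecordLoss() {
	m.packetsLost++
	if m.prevWasLoss {
		m.burstLosses++
	}
	m.prevWasLoss = true
}

// RTTPercentile returns the p-th percentile RTT, e.g. p = 50, 95, or 99.
func (m *TransportMetrics) RTTPercentile(p float64) time.Duration {
	if len(m.rttSamples) == 0 {
		return 0
	}
	s := append([]time.Duration(nil), m.rttSamples...)
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	return s[int(float64(len(s)-1)*p/100)]
}

// BurstLossShare is the fraction of losses that occurred back to back,
// the signal used to decide how much forward error correction to apply.
func (m *TransportMetrics) BurstLossShare() float64 {
	if m.packetsLost == 0 {
		return 0
	}
	return float64(m.burstLosses) / float64(m.packetsLost)
}
```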
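The pacing and queueing signals can be derived from values most congestion controllers already maintain. A minimal sketch, assuming the transport tracks min RTT, smoothed RTT, and bytes sent and acknowledged per measurement window (the field names are illustrative):

```go
package transportmetrics

import "time"

// PacingHealth holds inputs the congestion controller typically already has.
type PacingHealth struct {
	MinRTT      time.Duration // lowest RTT observed on the path
	SmoothedRTT time.Duration // EWMA of recent RTT samples
	BytesSent   uint64        // bytes handed to the pacer in the window
	BytesAcked  uint64        // bytes acknowledged in the same window
	Window      time.Duration // measurement window length
}

// StandingQueueDelay estimates self-inflicted queueing delay as smoothed RTT
// minus the path's base RTT. A value persistently above one to two RTTs
// suggests the sender is keeping buffers full.
func (p PacingHealth) StandingQueueDelay() time.Duration {
	if p.SmoothedRTT <= p.MinRTT {
		return 0
	}
	return p.SmoothedRTT - p.MinRTT
}

// GoodputRatio is achieved goodput divided by send rate. Values well below
// 1.0 mean congestion control is throttling or the path is dropping data.
func (p PacingHealth) GoodputRatio() float64 {
	if p.BytesSent == 0 {
		return 0
	}
	return float64(p.BytesAcked) / float64(p.BytesSent)
}
```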
💡 Key Takeaways
User-space UDP transports need custom metrics replacing kernel TCP stats: RTT at p50, p95, and p99; loss rate split by random versus burst; reordering percentage; and per-stream head-of-line blocking time
Congestion control observability requires send rate versus goodput, pacing queue depth, pacing underruns and overruns, and standing queue delay targeting one to two RTTs
Real-time applications must track forward error correction overhead and recovery rate, retransmission cause breakdown, and deadline miss rate for time-sensitive packets
Capacity planning for UDP must account for packet rate: 200-byte packets at 1 Gbps is roughly 600,000 packets per second versus 85,000 packets per second for 1460-byte TCP payloads at the same throughput (see the worked calculation after this list)
Load testing should inject packet loss at 0.5%, 1%, and 3% to measure retransmission CPU overhead and validate that pacing prevents congestion collapse
Plan for a 20 to 50% higher CPU budget for user-space UDP transports compared to kernel TCP plus TLS offload at equivalent goodput, validated with realistic traffic
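The packet-rate arithmetic behind the capacity-planning takeaway is simple enough to keep as a worked calculation. In the sketch below, packetsPerCore is a hypothetical placeholder to be replaced with a value measured under realistic loss and jitter (CPU per million packets):

```go
package main

import "fmt"

// packetsPerSecond converts a target throughput and payload size into
// the packet rate the NIC, syscall layer, and crypto path must sustain.
func packetsPerSecond(throughputBitsPerSec, payloadBytes float64) float64 {
	return throughputBitsPerSec / (payloadBytes * 8)
}

func main() {
	const gbps = 1e9

	udpPPS := packetsPerSecond(gbps, 200)  // ~625,000 pkts/s
	tcpPPS := packetsPerSecond(gbps, 1460) // ~85,600 pkts/s
	fmt.Printf("1 Gbps of 200-byte UDP payloads:  %.0f pkts/s\n", udpPPS)
	fmt.Printf("1 Gbps of 1460-byte TCP payloads: %.0f pkts/s\n", tcpPPS)

	// Hypothetical measured capacity: packets one core can process per
	// second under realistic loss and jitter.
	const packetsPerCore = 300_000.0
	coresForUDP := udpPPS / packetsPerCore
	fmt.Printf("Cores needed at %.0f pkts/s per core: %.1f\n", packetsPerCore, coresForUDP)

	// Apply the 20-50% user-space premium when sizing against a kernel TCP baseline.
	fmt.Printf("Budget with a 50%% premium: %.1f cores\n", coresForUDP*1.5)
}
```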
📌 Examples
Google QUIC deployments export 0-RTT connection hit rate (target above 80% for returning users), handshake latency at p99 (target under 150 ms), and per-stream HOL time (target under 50 ms at p99)
Microsoft Teams tracks UDP send rate versus received goodput; when goodput drops below 70% of send rate, congestion control has likely overreacted or path capacity decreased
A video conferencing service measures FEC recovery rate: if FEC alone recovers 60% of lost packets without retransmission, the 15% redundancy overhead is justified; if the recovery rate drops to 30%, burst loss has increased and the FEC percentage should increase to 20% (this rule is sketched as code after this list)
Netflix load tests QUIC edge proxies with 1% injected loss and measures CPU per Gbps; finds 35% higher CPU than kernel TCP, then provisions accordingly before rollout
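The FEC tuning rule from the video-conferencing example above can be captured as a small decision function. The thresholds (60% and 30% recovery, 15% and 20% redundancy) come from that example; the package and function names are illustrative:

```go
package fectune

// NextRedundancy returns the FEC overhead to use for the next interval,
// given the fraction of lost packets that FEC alone recovered and the
// redundancy currently in effect (e.g. 0.15 for 15%).
func NextRedundancy(recoveryRate, currentRedundancy float64) float64 {
	switch {
	case recoveryRate >= 0.60:
		// FEC is recovering most losses without retransmission; the
		// current overhead is paying for itself, so keep it.
		return currentRedundancy
	case recoveryRate <= 0.30:
		// Burst loss has increased; step redundancy up toward 20%.
		return 0.20
	default:
		// In between: hold steady and keep observing.
		return currentRedundancy
	}
}
```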