Communication Efficiency and Compression
Communication is the bottleneck in federated learning, especially in cross-device deployments over mobile networks with asymmetric uplink speeds of 1 to 10 Mbps. A full model for mobile keyboard prediction is a few megabytes, and having thousands of clients upload it every round would consume gigabytes of bandwidth per round. Practical systems therefore target uplink payloads of 100 KB to 1 MB per client through aggressive compression of model updates.
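To make the scale concrete, a quick back-of-the-envelope calculation (the 5 MB model, 2,000 clients per round, and 2 Mbps uplink below are illustrative assumptions, not figures from any specific deployment):

# Rough arithmetic for per-round uplink cost in cross-device FL.
# All numbers are illustrative assumptions.
model_mb = 5                 # uncompressed update size per client
clients_per_round = 2_000    # clients selected in one round
uplink_mbps = 2              # a slow but realistic mobile uplink

aggregate_gb = model_mb * clients_per_round / 1_000
upload_seconds = model_mb * 8 / uplink_mbps   # megabytes -> megabits
print(f"~{aggregate_gb:.0f} GB of aggregate uplink per round, "
      f"~{upload_seconds:.0f} s per client upload")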
Quantization reduces precision from 32-bit floating point to 8-bit integers, cutting payload size by 4x with minimal accuracy loss (under 1 percent). Sparsification sends only the top k percent of gradient coordinates by magnitude, typically 1 to 10 percent, reducing size by 10x to 100x. Structured updates send only selected layers or parameter groups, for example updating only the final classification head while freezing the feature extractor. Google reports practical payloads of 0.1 to 2 MB per client for Gboard after combining quantization and sparsification.
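As a sketch of what these two compressors look like in code (a minimal NumPy illustration, not Gboard's actual pipeline; the function names and the uniform min-max quantization scheme are assumptions made for clarity):

import numpy as np

def quantize_8bit(update):
    # Uniform min-max quantization: float32 -> uint8 plus the (offset, scale)
    # the server needs to dequantize. One of several possible 8-bit schemes.
    lo, hi = float(update.min()), float(update.max())
    scale = (hi - lo) / 255.0 or 1.0   # guard against constant updates
    q = np.round((update - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize_8bit(q, lo, scale):
    return q.astype(np.float32) * scale + lo

def top_k_sparsify(update, fraction=0.1):
    # Keep only the largest-magnitude `fraction` of coordinates;
    # the client uploads the surviving values plus their indices.
    k = max(1, int(fraction * update.size))
    idx = np.argpartition(np.abs(update), -k)[-k:]
    return idx, update[idx]

# A 5M-parameter update is 20 MB at float32; after top-10% sparsification
# and 8-bit quantization only ~0.5 MB of values leave the device
# (index overhead adds to this and is often compressed separately).
update = np.random.randn(5_000_000).astype(np.float32)
idx, vals = top_k_sparsify(update, fraction=0.10)
q, lo, scale = quantize_8bit(vals)
print(f"{q.nbytes / 1e6:.1f} MB of quantized values to upload")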
The tradeoff is convergence speed versus bandwidth. Aggressive sparsification to 1 percent of coordinates can increase rounds to convergence by 50 to 100 percent because important gradient information is discarded. Error feedback accumulates the residual from quantization or sparsification and adds it to the next round's update, recovering most of the lost convergence speed at the cost of client-side memory for the error accumulator. Gradient compression schemes such as TernGrad and QSGD formalize these techniques with provable convergence bounds under certain smoothness assumptions.
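A minimal sketch of error feedback combined with top-k sparsification (the class and parameter names are illustrative, not any library's API; it builds on the top_k_sparsify idea above):

import numpy as np

class ErrorFeedbackCompressor:
    # Keeps the residual that compression discarded and re-injects it into
    # the next round's update, at the cost of one float32 buffer per client.
    def __init__(self, num_params, fraction=0.01):
        self.residual = np.zeros(num_params, dtype=np.float32)
        self.fraction = fraction

    def compress(self, update):
        corrected = update + self.residual             # add back what was dropped last round
        k = max(1, int(self.fraction * corrected.size))
        idx = np.argpartition(np.abs(corrected), -k)[-k:]
        sent = np.zeros_like(corrected)
        sent[idx] = corrected[idx]                     # the sparse update actually transmitted
        self.residual = corrected - sent               # remember everything that was dropped
        return idx, corrected[idx]

Because the residual buffer eventually transmits every coordinate's accumulated contribution, convergence largely recovers even under 1 percent sparsification.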
Cross-silo FL has less bandwidth pressure thanks to 1 to 10 Gbps datacenter links, so compression is optional. It still helps, however, when the model reaches hundreds of megabytes or when training spans continents with higher latency. Systems often prefer partial synchronization: send full updates every N rounds and compressed updates in between, balancing communication cost against convergence quality.
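One possible shape of such a schedule, reusing the quantization and sparsification helpers sketched above (the 10-round interval and compression settings are illustrative assumptions, not a prescription):

def payload_for_round(round_idx, update, full_sync_every=10):
    # Partial synchronization: full-precision update every N rounds,
    # sparsified + quantized update in between.
    if round_idx % full_sync_every == 0:
        return "full", update                          # e.g. tens of MB uncompressed
    idx, vals = top_k_sparsify(update, fraction=0.10)  # helpers defined in the sketch above
    q, lo, scale = quantize_8bit(vals)
    return "compressed", (idx, q, lo, scale)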
💡 Key Takeaways
• Cross-device FL targets uplink payloads of 100 KB to 1 MB on mobile networks with 1 to 10 Mbps uplink, compressing multi-megabyte models through quantization and sparsification
• Quantization to 8-bit reduces payload by 4x with under 1 percent accuracy loss, while sparsification to the top 1 to 10 percent of coordinates reduces it by 10x to 100x
• Aggressive sparsification to 1 percent of coordinates can increase rounds to convergence by 50 to 100 percent because important gradient information is discarded, slowing optimization
• Error feedback accumulates quantization residuals and adds them to the next update, recovering convergence speed at the cost of client-side memory for the error accumulator
• Google Gboard achieves 0.1 to 2 MB payloads per client through combined quantization and sparsification, enabling practical cross-device training over cellular networks
• Cross-silo FL with 1 to 10 Gbps links has less bandwidth pressure, but compression still helps for models in the hundreds of megabytes or for intercontinental training with high latency
📌 Examples
A keyboard model with 5 million parameters at 32-bit float is 20 MB uncompressed. After 8-bit quantization (4x) and top-10-percent sparsification (another 10x), the transmitted values shrink to roughly 500 KB, bringing the update down to the 100 KB to 1 MB uplink target
TernGrad quantizes gradients to ternary values {−1, 0, +1} with scaling factors, achieving 16x compression with provable convergence guarantees under smooth loss functions
Cross-silo federated learning for medical imaging sends full 50 MB model updates every 10 rounds and 5 MB compressed updates in between, balancing 1 Gbps link utilization with convergence quality