What is Federated Learning?
Federated learning trains a single global model across many clients while keeping raw data completely local. Instead of collecting datasets centrally, a coordinator distributes an initial model to eligible clients. Each client trains locally for a few iterations, computes model deltas or gradients, and sends only these updates back. The server aggregates updates into a new global model and repeats for many rounds until convergence.
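In code, one round of Federated Averaging (FedAvg) reduces to a weighted mean of locally trained weights. The sketch below is a toy illustration with NumPy, not a production system: `local_train` is a hypothetical stand-in for each client's on-device training, here a few gradient steps on a private least-squares problem.

```python
import numpy as np

def local_train(global_weights, client_data, lr=0.1, steps=5):
    """Hypothetical local update: a few SGD steps on this client's own data.
    Returns the locally trained weights and the client's example count."""
    w = global_weights.copy()
    X, y = client_data
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient as a stand-in
        w = w - lr * grad
    return w, len(y)

def fedavg_round(global_weights, clients):
    """One FedAvg round: average client weights, weighted by local data size."""
    results = [local_train(global_weights, data) for data in clients]
    total = sum(n for _, n in results)
    # Clients with more local examples move the global model proportionally more.
    return sum((n / total) * w for w, n in results)

# Toy run: three clients with unequal amounts of private regression data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 120, 80):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(20):  # repeat rounds until convergence
    w = fedavg_round(w, clients)
print(w)  # approaches true_w, yet no client ever shared raw data
```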
Two deployment patterns dominate production. Cross-device federated learning runs on millions of consumer devices such as smartphones, with intermittent connectivity, strict resource limits, and short training windows of 2 to 10 minutes per round. Cross-silo federated learning runs across a small number of data silos, such as 50 hospitals or regional banks, with reliable 1 to 10 Gbps networks, tens of GPUs per site, and longer rounds of 30 to 120 minutes.
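The two regimes differ enough that systems typically encode them as separate configuration profiles. A sketch of what those knobs might look like, using only the numbers above (the `FLProfile` dataclass and its field names are illustrative, not any particular framework's API):

```python
from dataclasses import dataclass

@dataclass
class FLProfile:
    """Illustrative round-level parameters for the two deployment regimes."""
    clients_per_round: int          # participants per round
    round_minutes: tuple            # (min, max) wall-clock budget per round
    reliable_network: bool          # dedicated links vs. consumer connectivity

# Cross-device: millions of candidate phones, short windows, flaky links.
CROSS_DEVICE = FLProfile(clients_per_round=1000,   # hundreds to low thousands
                         round_minutes=(2, 10),
                         reliable_network=False)

# Cross-silo: a handful of institutions on 1-10 Gbps links, longer rounds.
CROSS_SILO = FLProfile(clients_per_round=50,       # e.g. 50 hospitals
                       round_minutes=(30, 120),
                       reliable_network=True)
```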
Google reports Gboard keyboard improvements of 1 to 3 percent in top-1 recall using cross-device federated learning with hundreds to thousands of rounds, each involving hundreds to low thousands of participating clients. Round times average 10 to 30 minutes including client selection, training, and network transfer. Uplink payloads after compression are 0.1 to 2 MB per client. Apple uses federated learning for QuickType predictions and system telemetry under differential privacy constraints, with Wi-Fi-only, charging, and screen-off requirements.
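In cross-device deployments, this gating runs before a phone is allowed to check in for a round. A minimal sketch of eligibility checking and cohort sampling, assuming a hypothetical `DeviceState` snapshot (the field names are illustrative, not Apple's or Google's actual APIs):

```python
import random
from dataclasses import dataclass

@dataclass
class DeviceState:
    on_wifi: bool     # avoid metered cellular for the 0.1 to 2 MB uplink
    charging: bool    # never drain the user's battery to train
    screen_off: bool  # train only while the device is idle

def eligible(d: DeviceState) -> bool:
    """Cross-device gating: Wi-Fi only, charging, and screen off."""
    return d.on_wifi and d.charging and d.screen_off

def select_clients(checked_in, per_round=1000, seed=None):
    """Sample this round's cohort from devices that pass the gate."""
    pool = [d for d in checked_in if eligible(d)]
    rng = random.Random(seed)
    return rng.sample(pool, min(per_round, len(pool)))
```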
The fundamental shift is that compute happens at the edge, communication is sparse and structured, and privacy is a first-class constraint built into the system architecture rather than an afterthought.
💡 Key Takeaways
• Cross-device FL targets millions of mobile devices with training windows of 2 to 10 minutes, uplink payloads of 0.1 to 2 MB, and hundreds to low thousands of clients per round
• Cross-silo FL runs across tens to hundreds of institutions with rounds of 30 to 120 minutes, tens of GPUs per site, and reliable 1 to 10 Gbps networks
• Google Gboard achieves a 1 to 3 percent improvement in top-1 recall over hundreds to thousands of rounds, with round times of 10 to 30 minutes
• Federated Averaging (FedAvg) aggregates client updates weighted by local data size to produce the next global model each round
• Raw data never leaves client devices, reducing privacy risk, regulatory compliance burden, and uplink bandwidth compared to centralized collection
📌 Examples
Google uses cross-device FL for Gboard keyboard predictions, training on keystroke data that stays on the device, with rounds running nightly when devices are charging on Wi-Fi
Apple deploys federated learning for QuickType autocorrect and system telemetry, enforcing differential privacy with on-device resource gating for battery and network
Cross-silo example: 50 hospitals train a medical imaging model where each site runs local epochs on private patient scans, sends encrypted updates, and the aggregator produces a shared diagnostic model without centralizing health records
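The "sends encrypted updates" step in the hospital example is commonly realized with secure aggregation: clients add pairwise random masks that cancel when the server sums all updates, so no individual update is ever visible in the clear. A toy NumPy sketch of the masking idea (real protocols add key agreement, keyed PRGs, and dropout recovery, none of which is shown here):

```python
import numpy as np

def masked_updates(updates, seed=0):
    """For each client pair (i, j), client i adds a random mask and client j
    subtracts the same mask, so every mask cancels in the server-side sum."""
    rng = np.random.default_rng(seed)  # stands in for pairwise-agreed PRG seeds
    masked = [u.astype(float).copy() for u in updates]
    n = len(updates)
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.normal(size=updates[i].shape)
            masked[i] += m  # client i's vector alone now reveals nothing
            masked[j] -= m  # ...but the pair's masks cancel in aggregate
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
server_sum = sum(masked_updates(updates))     # server sees only masked vectors
assert np.allclose(server_sum, sum(updates))  # yet the aggregate is exact
```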