Secure Aggregation and Privacy Mechanisms
Secure aggregation prevents the coordinator from seeing any individual client update; it observes only their sum, which protects against gradient inversion attacks. Each client masks its update with random pairwise secrets shared with other clients. When enough clients complete the round, the masks cancel out mathematically, revealing only the aggregated update. Production systems set minimum thresholds of 50 to 200 clients per round so that privacy holds even if some clients collude.
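To make the mask-cancellation idea concrete, here is a minimal sketch in Python. Everything in it is illustrative rather than any production protocol: real deployments derive the pairwise secrets via key exchange and add secret-shared recovery so dropped clients do not break the cancellation.

```python
# Minimal sketch of pairwise-mask cancellation in secure aggregation.
# Names and structure are illustrative only; real systems derive masks
# from Diffie-Hellman key agreement and handle dropouts with secret sharing.
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim = 4, 8

# Each client's true model update (what the server should never see alone).
updates = [rng.normal(size=dim) for _ in range(num_clients)]

# Pairwise secrets: clients i < j agree on a random mask m_ij.
# Client i adds m_ij to its upload, client j subtracts it.
pair_masks = {(i, j): rng.normal(size=dim)
              for i in range(num_clients) for j in range(i + 1, num_clients)}

def masked_upload(i):
    upload = updates[i].copy()
    for (a, b), mask in pair_masks.items():
        if a == i:
            upload += mask
        elif b == i:
            upload -= mask
    return upload

# The server only ever sees masked uploads and their sum.
aggregate = sum(masked_upload(i) for i in range(num_clients))

# Every mask is added once and subtracted once, so the sum of the
# masked uploads equals the sum of the true updates.
assert np.allclose(aggregate, sum(updates))
```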
Differential privacy (DP) adds calibrated noise to bound privacy loss across all rounds. User-level DP clips each client's gradient to a fixed L2 norm, such as 1.0 or 10.0, then adds Gaussian noise scaled to the clipping bound and the privacy budget epsilon. Google reports using epsilon values between 2 and 10 for keyboard models. Clipping too aggressively stalls learning, while excessive noise can degrade model quality by 2 to 5 percent. The privacy budget accumulates over rounds, requiring careful accounting to stay within acceptable limits such as epsilon less than 10 over the model lifetime.
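A rough sketch of the clip-then-noise step, assuming a plain Gaussian mechanism on the aggregated update. The noise_multiplier value here is illustrative only; in practice it is chosen by a privacy accountant for the target epsilon and number of rounds.

```python
# Sketch of user-level DP: clip each client's update to a fixed L2 norm,
# sum the clipped updates, then add Gaussian noise scaled to the clip bound.
import numpy as np

rng = np.random.default_rng(1)

def clip_update(update, clip_norm=1.0):
    """Scale the update down so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def dp_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.1):
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    total = np.sum(clipped, axis=0)
    # Clipping bounds any single user's contribution (sensitivity = clip_norm),
    # so Gaussian noise with std = noise_multiplier * clip_norm yields a
    # bounded per-round privacy loss; an accountant tracks the total epsilon.
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)

updates = [rng.normal(size=16) for _ in range(200)]
averaged = dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1)
```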
Combining secure aggregation with differential privacy creates defense in depth. Secure aggregation stops the server from inverting a single client's update. Differential privacy bounds worst-case leakage even if the server colludes with other clients or observes multiple rounds. Apple enforces both mechanisms for QuickType, with strict device eligibility checks and per-user privacy budget tracking. Microsoft SwiftKey and Google Gboard layer compression, secure aggregation, and differential privacy, achieving uplink payloads under 1 MB while keeping user-level epsilon below 10.
The cost is coordination complexity. If too many clients drop out before the secure aggregation threshold is met, the round aborts and the spent privacy budget is wasted. You must oversample invitations, often 5,000 invites targeting 500 completions against a threshold of 200, to handle the 80 to 90 percent dropout rates typical on mobile. Secure aggregation adds cryptographic overhead, increasing round time by 10 to 30 percent, and differential privacy noise slows convergence, requiring 20 to 50 percent more rounds to reach target accuracy than non-private federated learning.
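The oversampling arithmetic can be sanity-checked in a few lines; the completion rates and safety factor below simply restate the figures quoted above, not measured values.

```python
# Back-of-the-envelope check: how many invitations are needed so a round
# clears the secure-aggregation threshold despite heavy mobile dropout.
def invites_needed(threshold, completion_rate, safety_factor=2.5):
    """Invitations required for `threshold` completions with headroom."""
    return int(threshold / completion_rate * safety_factor)

# With 80 to 90 percent dropout, only 10 to 20 percent of invited devices finish.
for rate in (0.10, 0.20):
    print(f"completion rate {rate:.0%}: invite {invites_needed(200, rate)}")
# At a 10 percent completion rate, ~5,000 invites yield ~500 completions,
# comfortably above the 200-client threshold.
```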
💡 Key Takeaways
• Secure aggregation requires minimum thresholds of 50 to 200 clients to ensure the server cannot recover individual updates even with collusion
• User-level differential privacy clips gradients to L2 norm 1.0 to 10.0 and adds Gaussian noise calibrated to epsilon 2 to 10, degrading accuracy by 2 to 5 percent but bounding worst-case leakage
• Oversampling is critical: invite 5,000 clients to get 500 completions when targeting a 200-client secure aggregation threshold, handling 80 to 90 percent dropout
• Cryptographic overhead increases round time by 10 to 30 percent, and differential privacy noise can require 20 to 50 percent more rounds to converge
• Privacy budget accumulates across rounds, requiring per-user tracking to stay within lifetime epsilon limits such as 10 over all training cycles
• Combining both mechanisms provides defense in depth: secure aggregation stops server inversion, differential privacy bounds leakage even under multi-round observation
📌 Examples
Google Gboard uses gradient clipping to L2 norm 1.0, adds noise for epsilon between 2 and 10, and applies secure aggregation with thresholds of 100 to 200 clients per round
Apple QuickType enforces strict device eligibility (Wi-Fi, charging, idle) and tracks per-user privacy budget with secure aggregation thresholds, achieving uplink payloads under 500 KB
A federated medical imaging system clips updates to norm 5.0, adds noise for epsilon 5, and sets a 50 client threshold across hospitals to prevent any single hospital update from being identified