Secure Aggregation and Privacy Mechanisms
Why Weight Updates Leak Information
Even though federated learning only shares model updates, those updates can still reveal sensitive information. If a client's update teaches the model to recognize a rare disease, an attacker can infer that the client has that disease. Gradient-inversion research has further shown that shared gradients can sometimes be inverted to reconstruct the very training images that produced them. Without protection, the server could extract private information from any individual update.
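To make the leakage concrete, here is a minimal sketch of the best-known special case: for a single dense layer, the gradient with respect to the weights is an outer product of the output-error and the input, so the private input can be recovered exactly from the shared gradients. The toy model and all variable names are illustrative, not from any particular system.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # the client's private input
W = rng.normal(size=(3, 4))       # dense layer weights
b = rng.normal(size=3)
target = rng.normal(size=3)

# Forward pass with a squared-error loss: L = ||Wx + b - target||^2
y = W @ x + b
g_y = 2 * (y - target)            # dL/dy

# Gradients the client would share with the server:
grad_W = np.outer(g_y, x)         # dL/dW = g_y x^T
grad_b = g_y                      # dL/db = g_y

# Attacker (the server) recovers x from the gradients alone:
# each row of grad_W is g_y[i] * x, and grad_b[i] = g_y[i].
x_recovered = grad_W[0] / grad_b[0]
assert np.allclose(x_recovered, x)
```

Deeper networks require iterative optimization attacks rather than this closed-form trick, but the single-layer case shows why raw gradients cannot be treated as anonymized data.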
How Secure Aggregation Works
The core idea is pairwise masking. Before sending updates, each pair of clients agrees on a shared random mask: one client adds the mask to its update, the other subtracts it. When the server sums all the masked updates, the masks cancel, revealing only the true aggregate. Coordinating this among, say, 10,000 clients is the hard part: if a client drops out mid-round, its unmatched masks would corrupt the sum. Production protocols therefore have each client secret-share its mask seeds, so that as long as a threshold of clients (say, 1,000 of 10,000) stays online, the server can reconstruct the missing masks and complete the round.
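The masking step can be simulated in a few lines. This sketch omits the cryptography entirely (real protocols derive each pairwise seed via a key exchange and secret-share the seeds for dropout recovery); the seed values and helper names here are invented for illustration.

```python
import numpy as np

def pairwise_mask(seed, dim):
    """Both clients in a pair derive the same mask from a shared seed."""
    return np.random.default_rng(seed).normal(size=dim)

n_clients, dim = 4, 5
rng = np.random.default_rng(42)
true_updates = [rng.normal(size=dim) for _ in range(n_clients)]

# Every unordered pair (i, j), i < j, shares a seed
# (in practice agreed via Diffie-Hellman key exchange, not hardcoded).
seeds = {(i, j): 1000 + 10 * i + j
         for i in range(n_clients) for j in range(i + 1, n_clients)}

masked = []
for i in range(n_clients):
    u = true_updates[i].copy()
    for j in range(n_clients):
        if j == i:
            continue
        lo, hi = min(i, j), max(i, j)
        m = pairwise_mask(seeds[(lo, hi)], dim)
        u += m if i == lo else -m   # lower index adds, higher subtracts
    masked.append(u)

# The server sees only masked updates, yet their sum is the true aggregate:
aggregate = sum(masked)
assert np.allclose(aggregate, sum(true_updates))
# No individual masked update matches its true update:
assert not np.allclose(masked[0], true_updates[0])
```

Because every mask appears exactly once with a plus sign and once with a minus sign across the set of clients, the server learns the sum and nothing else.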
Differential Privacy as Additional Protection
Secure aggregation hides individual updates, but the aggregate itself still leaks information: if the aggregate shifts sharply when one client joins, an observer can infer that the client had unusual data. Differential privacy counters this by adding calibrated noise. The privacy budget epsilon controls the noise level: epsilon = 1 provides strong privacy but typically degrades accuracy by 5-15%, while epsilon = 8 preserves accuracy but offers weaker guarantees. Production systems commonly target epsilon between 2 and 6.
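A minimal sketch of the standard recipe follows: clip each update to bound its sensitivity, sum, then add Gaussian noise scaled to the clip norm. The function name, parameter values, and the `noise_multiplier` knob (which, together with the number of rounds, determines the actual epsilon via an accounting procedure not shown here) are all illustrative.

```python
import numpy as np

def dp_aggregate(updates, clip_norm, noise_multiplier, rng):
    """Clip each client update to L2 norm <= clip_norm, sum,
    and add Gaussian noise proportional to the sensitivity."""
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / max(norm, 1e-12)))
    agg = np.sum(clipped, axis=0)
    # One client changing can move the sum by at most clip_norm,
    # so noise is scaled to noise_multiplier * clip_norm.
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=agg.shape)
    return agg + noise

rng = np.random.default_rng(0)
updates = [rng.normal(size=6) * s for s in (0.5, 1.0, 50.0)]
released = dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping is what makes the noise calibration meaningful: without a bound on any single client's contribution, no finite noise level yields a privacy guarantee.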