
Secure Aggregation and Privacy Mechanisms

Core Concept
Secure Aggregation is a cryptographic protocol allowing a server to compute the sum of client updates without seeing individual updates. The server learns only the final aggregate.

Why Weight Updates Leak Information

Even though federated learning shares only model updates rather than raw data, those updates can still reveal sensitive information. If a client's update noticeably improves the model's ability to recognize a rare disease, an observer can infer that the client's data contains examples of it. Research on gradient inversion attacks shows that gradient updates can be mathematically inverted to reconstruct training images. Without protection, the server could extract private information from individual updates.
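To make the leakage concrete, here is a toy sketch (not from the article) showing why gradients are invertible. For a single-example linear layer y = Wx + b, the gradients satisfy dL/dW = (dL/dy) x^T and dL/db = dL/dy, so anyone who sees the shared gradients can recover the private input x exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=4)          # private input (e.g. a pixel vector)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)

# Forward pass and gradient of a squared-error loss w.r.t. the output.
y = W @ x + b
dL_dy = y - np.ones(3)

# These are the "harmless" gradients a client would share.
dL_dW = np.outer(dL_dy, x)
dL_db = dL_dy

# Attacker's view: since dL_dW[i] = dL_db[i] * x, dividing any
# nonzero row recovers the private input exactly.
i = int(np.argmax(np.abs(dL_db)))
x_reconstructed = dL_dW[i] / dL_db[i]

assert np.allclose(x_reconstructed, x)
```

Deep networks make this harder than a single linear layer, but the same principle drives practical reconstruction attacks.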

How Secure Aggregation Works

The core idea is pairwise masking. Before sending updates, each pair of clients agrees on a random mask: one client adds it and the other subtracts it. When the server sums all masked updates, the masks cancel out, revealing only the true aggregate. With 10,000 clients, coordinating these pairwise secrets is complex, and any dropout would leave masks uncancelled. Production systems therefore use threshold cryptography: as long as a threshold of clients (say 1,000 of 10,000) completes the round, the server can still recover the sum, handling dropouts gracefully.
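The cancellation property can be sketched in a few lines. This is a minimal illustration with three clients; real protocols derive the masks from pairwise key agreement and secret-share the seeds to survive dropouts:

```python
import numpy as np

rng = np.random.default_rng(42)
n_clients, dim = 3, 5

updates = [rng.normal(size=dim) for _ in range(n_clients)]

# One shared random mask per client pair (i, j) with i < j.
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

masked = []
for k in range(n_clients):
    u = updates[k].copy()
    for (i, j), m in masks.items():
        if k == i:
            u += m  # lower-indexed client adds the shared mask
        elif k == j:
            u -= m  # higher-indexed client subtracts it
    masked.append(u)

# Each masked update looks random on its own, but every mask is
# added exactly once and subtracted exactly once, so the sum is exact.
assert np.allclose(sum(masked), sum(updates))
```

The server sees only the masked vectors, yet their sum equals the true aggregate.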

Differential Privacy as Additional Protection

Secure aggregation hides individual updates, but the aggregate itself still leaks information: if the aggregate shifts dramatically when one client joins, an observer can infer that client had unusual data. Differential privacy adds calibrated noise to bound this leakage. The privacy budget (epsilon) controls the noise level: epsilon 1 provides strong privacy but degrades accuracy by 5-15%; epsilon 8 preserves accuracy but offers weaker guarantees. Production systems typically target epsilon 2-6.
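A common way to add this noise is the Gaussian mechanism: clip each client's update so its contribution is bounded, then add Gaussian noise whose scale follows the classic calibration sigma = C * sqrt(2 ln(1.25/delta)) / epsilon. The clipping norm and delta below are illustrative values, not from the article:

```python
import numpy as np

def privatize(update, epsilon, delta=1e-5, clip_norm=1.0, rng=None):
    """Clip an update to L2 norm clip_norm, then add Gaussian noise
    calibrated to (epsilon, delta)-differential privacy."""
    rng = rng or np.random.default_rng()
    # Clipping bounds any single client's influence (sensitivity = clip_norm).
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    # Classic Gaussian-mechanism noise scale.
    sigma = clip_norm * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(scale=sigma, size=update.shape)

# Smaller epsilon -> larger sigma -> more noise, stronger privacy.
noisy = privatize(np.ones(10), epsilon=3.0)
```

In production the noise is usually added on top of secure aggregation, so the server never sees even the noisy individual updates.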

⚠️ Key Trade-off: Privacy mechanisms reduce model quality. Secure aggregation adds 2-5x communication overhead. Differential privacy with epsilon 3 typically reduces accuracy by 3-8%.
💡 Key Takeaways
Weight updates can leak sensitive information and be mathematically inverted to reconstruct training data
Secure aggregation uses pairwise masking where random masks cancel out when summed
Threshold cryptography allows aggregation when 10-30% of clients drop out mid-round
Differential privacy adds noise with epsilon controlling privacy-utility trade-off (2-6 typical)
Privacy has real costs: secure aggregation adds 2-5x overhead, differential privacy costs 3-8% accuracy
📌 Interview Tips
1. Explain that privacy has concrete costs: secure aggregation adds 2-5x communication overhead
2. When discussing epsilon, give examples: epsilon 1 is strong privacy with 5-15% accuracy loss