Secure Aggregation and Privacy Mechanisms
Secure aggregation prevents the coordinator from seeing any individual client update; it observes only their sum, which protects against gradient inversion attacks. Each client masks its update with random pairwise secrets shared with other clients. When enough clients complete the round, the masks cancel out mathematically, revealing only the aggregated update. Production systems set minimum thresholds of 50 to 200 clients per round so that privacy holds even if some clients collude.
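To make the mask-cancellation idea concrete, here is a minimal sketch in Python. Everything in it is illustrative rather than any production protocol: real deployments derive the pairwise secrets via key exchange and add secret-shared recovery so dropped clients do not break the cancellation.

```python
# Minimal sketch of pairwise-mask cancellation in secure aggregation.
# Names and structure are illustrative only; real systems derive masks
# from Diffie-Hellman key agreement and handle dropouts with secret sharing.
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim = 4, 8

# Each client's true model update (what the server should never see alone).
updates = [rng.normal(size=dim) for _ in range(num_clients)]

# Pairwise secrets: clients i < j agree on a random mask m_ij.
# Client i adds m_ij to its upload, client j subtracts it.
pair_masks = {(i, j): rng.normal(size=dim)
              for i in range(num_clients) for j in range(i + 1, num_clients)}

def masked_upload(i):
    upload = updates[i].copy()
    for (a, b), mask in pair_masks.items():
        if a == i:
            upload += mask
        elif b == i:
            upload -= mask
    return upload

# The server only ever sees masked uploads and their sum.
aggregate = sum(masked_upload(i) for i in range(num_clients))

# Every mask is added once and subtracted once, so the sum of the
# masked uploads equals the sum of the true updates.
assert np.allclose(aggregate, sum(updates))
```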
Differential privacy (DP) adds calibrated noise to bound privacy loss across all rounds. User-level DP clips each client's gradient to a fixed L2 norm, such as 1.0 or 10.0, then adds Gaussian noise scaled to the clipping bound and the privacy budget epsilon. Google reports using epsilon values between 2 and 10 for keyboard models. Clipping too aggressively stalls learning, while excessive noise can degrade model quality by 2 to 5 percent. The privacy budget accumulates over rounds, requiring careful accounting to stay within acceptable limits such as epsilon less than 10 over the model lifetime.
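A rough sketch of the clip-then-noise step, assuming a plain Gaussian mechanism on the aggregated update. The noise_multiplier value here is illustrative only; in practice it is chosen by a privacy accountant for the target epsilon and number of rounds.

```python
# Sketch of user-level DP: clip each client's update to a fixed L2 norm,
# sum the clipped updates, then add Gaussian noise scaled to the clip bound.
import numpy as np

rng = np.random.default_rng(1)

def clip_update(update, clip_norm=1.0):
    """Scale the update down so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def dp_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.1):
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    total = np.sum(clipped, axis=0)
    # Clipping bounds any single user's contribution (sensitivity = clip_norm),
    # so Gaussian noise with std = noise_multiplier * clip_norm yields a
    # bounded per-round privacy loss; an accountant tracks the total epsilon.
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)

updates = [rng.normal(size=16) for _ in range(200)]
averaged = dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1)
```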
Combining secure aggregation with differential privacy creates defense in depth. Secure aggregation stops the server from inverting a single client's update. Differential privacy bounds worst-case leakage even if the server colludes with other clients or observes multiple rounds. Apple enforces both mechanisms for QuickType, with strict device eligibility checks and per-user privacy budget tracking. Microsoft SwiftKey and Google Gboard layer compression, secure aggregation, and differential privacy, achieving uplink payloads under 1 MB while keeping user-level epsilon below 10.
The cost is coordination complexity. If too many clients drop out before the secure aggregation threshold is met, the round aborts and the spent privacy budget is wasted. You must oversample invitations, often 5,000 invites targeting 500 completions against a threshold of 200, to handle the 80 to 90 percent dropout rates typical on mobile. Secure aggregation adds cryptographic overhead, increasing round time by 10 to 30 percent, and differential privacy noise slows convergence, requiring 20 to 50 percent more rounds to reach target accuracy than non-private federated learning.
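The oversampling arithmetic can be sanity-checked in a few lines; the completion rates and safety factor below simply restate the figures quoted above, not measured values.

```python
# Back-of-the-envelope check: how many invitations are needed so a round
# clears the secure-aggregation threshold despite heavy mobile dropout.
def invites_needed(threshold, completion_rate, safety_factor=2.5):
    """Invitations required for `threshold` completions with headroom."""
    return int(threshold / completion_rate * safety_factor)

# With 80 to 90 percent dropout, only 10 to 20 percent of invited devices finish.
for rate in (0.10, 0.20):
    print(f"completion rate {rate:.0%}: invite {invites_needed(200, rate)}")
# At a 10 percent completion rate, ~5,000 invites yield ~500 completions,
# comfortably above the 200-client threshold.
```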
💡 Key Takeaways
• Secure aggregation requires minimum thresholds of 50 to 200 clients to ensure the server cannot recover individual updates even with collusion
• User-level differential privacy clips gradients to L2 norm 1.0 to 10.0 and adds Gaussian noise calibrated to epsilon 2 to 10, degrading accuracy by 2 to 5 percent but bounding worst-case leakage
• Oversampling is critical: invite 5,000 clients to get 500 completions when targeting a 200-client secure aggregation threshold, handling 80 to 90 percent dropout
• Cryptographic overhead increases round time by 10 to 30 percent, and differential privacy noise can require 20 to 50 percent more rounds to converge
• Privacy budget accumulates across rounds, requiring per-user tracking to stay within lifetime epsilon limits such as 10 over all training cycles
• Combining both mechanisms provides defense in depth: secure aggregation stops server inversion, differential privacy bounds leakage even under multi-round observation
📌 Examples
Google Gboard uses gradient clipping to L2 norm 1.0, adds noise for epsilon between 2 and 10, and applies secure aggregation with thresholds of 100 to 200 clients per round
Apple QuickType enforces strict device eligibility (Wi-Fi, charging, idle) and tracks per-user privacy budget with secure aggregation thresholds, achieving uplink payloads under 500 KB
A federated medical imaging system clips updates to norm 5.0, adds noise for epsilon 5, and sets a 50 client threshold across hospitals to prevent any single hospital update from being identified