Production Deployment and Failure Modes
Production federated learning systems face operational challenges absent in centralized training. Device dropout is the dominant failure mode: 80 to 90 percent of invited cross-device clients fail to complete a round due to network changes, battery drain, app backgrounding, or user interaction. Secure aggregation requires a minimum number of completed clients, say 200, before the aggregated update can be revealed. If too many clients drop out, the round aborts, wasting privacy budget and coordinator resources. Over-sampling invitations by 5x to 10x mitigates this: for example, invite 5,000 clients targeting 500 completions against a 200-client threshold.
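A minimal sketch of the sizing arithmetic, assuming the coordinator knows a target completion count, an expected completion rate, and the secure-aggregation threshold; the function names and the 1.2x safety factor are illustrative assumptions, not any production system's API:

```python
import math

def invitations_needed(target_completions: int,
                       expected_completion_rate: float,
                       safety_factor: float = 1.2) -> int:
    """Hypothetical sizing rule: invite enough clients that the expected
    number of completions comfortably exceeds the target."""
    return math.ceil(target_completions / expected_completion_rate * safety_factor)

def round_can_reveal(completed: int, secure_agg_threshold: int) -> bool:
    """Secure aggregation only reveals the summed update once at least
    `secure_agg_threshold` clients have contributed; otherwise the round aborts."""
    return completed >= secure_agg_threshold

# With ~10% completion (90% dropout), reaching 500 completions for a
# 200-client threshold means inviting on the order of thousands of devices.
print(invitations_needed(target_completions=500, expected_completion_rate=0.10))  # 6000
```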
Version skew corrupts aggregation when clients run different model architectures or optimizer configurations due to staggered app rollout. If 30 percent of clients submit updates for a 10-layer model while 70 percent submit updates for a 12-layer model, the result is garbage. Enforce strict version matching per round and reject mismatched clients at the coordinator. Canary rounds with small cohorts detect bad configurations before full rollout; Google and Apple use phased regional rollouts over several days to limit blast radius.
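A hedged sketch of coordinator-side version gating, assuming each client check-in carries a model-version string and an optimizer-config hash; the dataclass and field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClientCheckin:
    client_id: str
    model_version: str        # e.g. architecture tag + checkpoint hash
    optimizer_config_hash: str

def admit_clients(checkins, round_model_version, round_optimizer_hash):
    """Only clients whose model architecture and optimizer config exactly
    match this round's spec are admitted; everyone else is rejected
    before training starts."""
    admitted, rejected = [], []
    for c in checkins:
        if (c.model_version == round_model_version
                and c.optimizer_config_hash == round_optimizer_hash):
            admitted.append(c)
        else:
            rejected.append(c)
    return admitted, rejected
```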
Monitoring is blind because raw data never reaches the server: you cannot inspect training samples or compute detailed per-example metrics. Federated analytics with secure aggregation computes aggregate statistics such as the loss distribution, per-cohort accuracy, and slice metrics while preserving privacy. Shadow rounds run with a small opt-in cohort that reports detailed telemetry for debugging. Server-side holdout clients, maintained through privacy-preserving data collection, allow offline evaluation.
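One way a loss distribution can be surfaced without raw data, sketched below under the assumption that each client reports only a local histogram and the server sees just the element-wise sum (which is what secure aggregation would reveal); the bin edges and helper names are illustrative:

```python
import numpy as np

BIN_EDGES = np.linspace(0.0, 10.0, 21)   # 20 loss buckets, assumed range

def client_loss_histogram(local_losses: np.ndarray) -> np.ndarray:
    """Each client bins its own per-example losses locally."""
    counts, _ = np.histogram(local_losses, bins=BIN_EDGES)
    return counts

def aggregate(histograms: list[np.ndarray]) -> np.ndarray:
    """Stand-in for the secure-aggregation output: only the sum across
    clients is visible, never any individual client's histogram."""
    return np.sum(histograms, axis=0)

# Illustrative round: 300 clients, 128 examples each.
clients = [np.random.exponential(scale=2.0, size=128) for _ in range(300)]
loss_distribution = aggregate([client_loss_histogram(c) for c in clients])
```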
Poisoning attacks inject malicious updates that degrade model quality or insert backdoors. Under secure aggregation the server cannot see individual updates, which makes detection harder. Robust aggregation with coordinate-wise clipping or median aggregation reduces the impact but cannot eliminate it. Auditor cohorts forgo secure aggregation for canary purposes, detecting anomalies before they spread. Client attestation, per-device rate limits, and reputation scoring provide additional defenses; Microsoft and Meta report using device trust signals and rate limits to block repeated poisoning attempts.
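A rough sketch of norm clipping plus coordinate-wise median aggregation. Note that per-update operations like these need visibility into individual updates (for example in an auditor cohort, or applied on-device before submission); the sizes and values below are purely illustrative:

```python
import numpy as np

def clip_update(update: np.ndarray, max_norm: float) -> np.ndarray:
    """Norm-clip a client update so a single device cannot dominate the round."""
    norm = np.linalg.norm(update)
    return update * min(1.0, max_norm / (norm + 1e-12))

def coordinate_wise_median(updates: list[np.ndarray]) -> np.ndarray:
    """Robust alternative to the mean: the median of each coordinate bounds
    the influence of a minority of poisoned updates."""
    return np.median(np.stack(updates), axis=0)

# Illustrative round: 95 benign updates plus 5 poisoned ones pushing every coordinate.
benign = [np.random.normal(0.0, 0.01, size=1000) for _ in range(95)]
poisoned = [np.full(1000, 5.0) for _ in range(5)]
clipped = [clip_update(u, max_norm=1.0) for u in benign + poisoned]
aggregated = coordinate_wise_median(clipped)
```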
💡 Key Takeaways
•Device dropout of 80 to 90 percent is typical in cross-device FL, requiring 5x to 10x over-sampling to meet secure aggregation thresholds of 50 to 200 clients
•Version skew from staggered app rollout corrupts aggregation if clients train incompatible model architectures; enforce strict version matching and reject mismatched clients
•Monitoring is blind without raw data access; use federated analytics with secure aggregation for aggregate metrics, shadow rounds for debugging, and holdout clients for offline evaluation
•Poisoning attacks inject malicious updates under secure aggregation; mitigate with robust aggregation like coordinate-wise median, auditor cohorts, and device attestation with rate limits
•Canary rounds with small cohorts detect bad configurations before full rollout; Google and Apple use phased regional rollout over days to limit blast radius
•Training/serving skew occurs when on-device feature computation differs from training-time computation, causing a 10 to 20 percent accuracy drop; enforce feature parity and version alignment (see the sketch after this list)
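A small sketch of feature-parity enforcement for the training/serving-skew point above: fingerprint the feature specification on both sides and reject devices whose on-device featurization differs from what training assumed. The spec fields and function names are hypothetical:

```python
import hashlib
import json

def feature_spec_fingerprint(spec: dict) -> str:
    """Hash a canonical serialization of the feature computation spec
    (feature names, normalizer version, tokenizer version, etc.) so the
    training pipeline and on-device pipeline can be compared exactly."""
    canonical = json.dumps(spec, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Assumed training-side spec for illustration only.
TRAINING_SPEC = {"features": ["prev_token_ids", "time_of_day_bucket"],
                 "normalizer_version": "v7",
                 "tokenizer": "sp-32k-2024-01"}

def client_may_train(device_spec: dict) -> bool:
    """Reject devices whose featurization fingerprint differs from training."""
    return feature_spec_fingerprint(device_spec) == feature_spec_fingerprint(TRAINING_SPEC)
```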
📌 Examples
Google Gboard invites 10,000 clients per round expecting 1,000 completions with a 300-client secure aggregation threshold, handling 90 percent dropout from network changes and battery constraints
A federated keyboard model rollout detects version skew when 40 percent of clients send 10-layer updates and 60 percent send 12-layer updates; the coordinator rejects mismatched clients and retries with version-locked cohorts
Facebook content moderation uses auditor cohorts that forgo secure aggregation to detect poisoned updates attempting to bypass hate speech filters, blocking devices with repeated anomalies