
Central vs Local Differential Privacy Trade-offs

Central Differential Privacy (DP) assumes a trusted curator collects raw data, aggregates it, and adds calibrated noise before releasing results. The curator sees individual records but promises not to leak them. Local Differential Privacy (LDP) flips this model: each user adds noise to their own data on-device before sending it anywhere, so no central party ever has to be trusted with raw records. The fundamental trade-off is utility versus trust.

The utility gap is enormous. With epsilon = 1 and 10,000 users, a central DP count has error around 1 (Laplace noise with scale 1/epsilon = 1). For the same epsilon under local DP, using randomized response on a binary attribute, the standard deviation of the debiased count scales as sqrt(n)/epsilon, giving error around 100. That is two orders of magnitude worse for the same privacy parameter.

Apple uses local DP for on-device emoji and new-word collection, with daily budget resets and per-signal epsilons in the single digits, accepting higher noise to avoid collecting raw keystroke data. Google deployed local DP at Chrome scale with its RAPPOR telemetry protocol, collecting metrics from hundreds of millions of clients without ever seeing true values; the system used permanent and instantaneous randomized response to learn population statistics such as feature adoption rates. In contrast, central DP systems like the US Census or LinkedIn analytics achieve much tighter error bounds by trusting the aggregator, running DP mechanisms on exact aggregates inside a secure data center.
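To make the arithmetic concrete, here is a minimal, illustrative Python sketch (not any production system) that runs both mechanisms at the settings above: a Laplace-noised count in the central model versus a debiased randomized-response count in the local model, with n = 10,000 and epsilon = 1.

```python
# Illustrative sketch: central-DP vs local-DP error on a simple count.
import math
import random

n = 10_000
epsilon = 1.0
true_bits = [random.random() < 0.3 for _ in range(n)]  # synthetic binary attribute
true_count = sum(true_bits)

# --- Central DP: trusted curator adds Laplace noise to the exact count.
# A counting query has sensitivity 1, so the noise scale is 1/epsilon.
def laplace(scale: float) -> float:
    # Inverse-CDF sampling of Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

central_estimate = true_count + laplace(1.0 / epsilon)

# --- Local DP: each user flips their own bit via randomized response,
# reporting the true bit with probability p = e^eps / (e^eps + 1).
p = math.exp(epsilon) / (math.exp(epsilon) + 1)
reports = [b if random.random() < p else (not b) for b in true_bits]
# Debias: E[sum(reports)] = true*(2p-1) + n*(1-p); solve for the true count.
local_estimate = (sum(reports) - n * (1 - p)) / (2 * p - 1)

print(f"true count:       {true_count}")
print(f"central estimate: {central_estimate:.1f}  (error ~ 1/epsilon)")
print(f"local estimate:   {local_estimate:.1f}  (error ~ sqrt(n)/epsilon, ~100 here)")
```

Run it a few times: the central estimate typically lands within a few counts of the truth, while the local estimate is off by roughly a hundred, matching the sqrt(n)/epsilon scaling.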
💡 Key Takeaways
Utility difference is dramatic: central DP with epsilon = 1 on 10,000 users gives error around 1, while local DP gives error around 100 at the same privacy level, i.e., 100x more error.
Local DP eliminates the trusted curator: Apple uses it for on-device telemetry with daily privacy budget resets, collecting emoji usage and new words without seeing raw keystrokes. Google's RAPPOR collected Chrome metrics from hundreds of millions of clients.
Central DP scales better with complex queries: histograms with 1,000 bins, cross-tabulations, and ML training are practical centrally but require massive populations or very weak privacy locally.
Malicious clients are a local DP risk: adversaries can send crafted noisy reports to bias aggregates. Mitigate with secure aggregation, per-device rate limits, and robust estimation with outlier detection.
Hybrid approaches exist: secure aggregation with cryptographic guarantees lets the server see only aggregate sums, combining local trust with central utility. Used in federated learning at Google and Apple (sketched below).
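To make the hybrid point concrete, here is a minimal sketch of pairwise additive masking, the core trick behind secure aggregation. The names and toy modulus are illustrative; real protocols, as deployed in federated learning, add cryptographic key agreement for the masks, dropout recovery, and quantization of model updates.

```python
# Minimal sketch of pairwise additive masking (illustrative only).
import random

FIELD = 2**32  # work modulo a fixed modulus so masks wrap around

def masked_reports(values: list[int]) -> list[int]:
    """Each client i adds a mask shared with every other client j:
    +mask_ij if i < j, -mask_ij if i > j. The masks cancel in the sum,
    so the server learns only the aggregate."""
    n = len(values)
    masks = {(i, j): random.randrange(FIELD)
             for i in range(n) for j in range(i + 1, n)}
    reports = []
    for i in range(n):
        r = values[i]
        for j in range(n):
            if i < j:
                r += masks[(i, j)]
            elif i > j:
                r -= masks[(j, i)]
        reports.append(r % FIELD)
    return reports

clients = [3, 7, 1, 4]  # e.g., per-device counts or quantized updates
reports = masked_reports(clients)
# The server only ever sees `reports`; each one is uniformly random on
# its own, but the masks cancel when summed:
print(sum(reports) % FIELD)  # 15 == sum(clients)
print(reports)               # individually meaningless to the server
```

The design point: each report is uniformly random on its own, so a curious server learns nothing about any individual, yet the masks cancel exactly in the aggregate sum.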
📌 Examples
Apple on-device local DP: daily budget reset, per-signal epsilon in the single digits, collects emoji frequency and new-word suggestions from millions of devices without raw text
Google Chrome RAPPOR: local DP telemetry across hundreds of millions of clients, randomized response on feature flags and configuration adoption (see the sketch after this list)
US Census central DP: epsilon 19.61 for 300+ million people, error measured in tens or hundreds for small geography cells, which would be infeasible with local DP
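For the RAPPOR entry above, here is a simplified one-bit sketch of its two-level randomized response. The real protocol applies this per Bloom-filter bit; f, p, and q are the protocol's actual parameter names, but the values chosen here are just for illustration.

```python
# Simplified one-bit sketch of RAPPOR-style two-level randomized response.
import random

f, p, q = 0.5, 0.25, 0.75  # illustrative parameter choices

def permanent_rr(true_bit: int) -> int:
    """Permanent randomized response: in a real client this is computed
    once and cached, so repeated collection cannot average the noise away."""
    r = random.random()
    if r < f / 2:
        return 1
    if r < f:
        return 0
    return true_bit

def instantaneous_rr(perm_bit: int) -> int:
    """Instantaneous randomized response: fresh randomness on every
    report, protecting each individual upload."""
    return int(random.random() < (q if perm_bit else p))

# A client reports many times, but always from the same memoized bit:
perm = permanent_rr(true_bit=1)
reports = [instantaneous_rr(perm) for _ in range(10)]
print(reports)
```

The permanent layer protects against an adversary who averages a client's reports over time; the instantaneous layer protects each individual upload.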