What is Differential Privacy?
Differential Privacy: A mathematical framework guaranteeing that the output of a computation reveals almost nothing about whether any individual was in the input dataset. Adding or removing one person changes the output distribution only slightly, so an observer of the results cannot confidently determine whether someone participated.
The Privacy Problem
Publishing aggregate statistics can leak individual information. Suppose a database reports that the average salary in department X is 150,000 USD across 5 employees. If the average drops to 140,000 USD after one person leaves, you can infer that person earned 190,000 USD (the total fell from 750,000 to 560,000). Machine learning models leak in the same way: language models can reproduce verbatim text from their training data, and membership inference attacks on image classifiers can reveal whether a specific image was used in training. Differential privacy prevents these inference attacks mathematically, not through policy.
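The salary scenario above is a differencing attack, and it takes only two lines of arithmetic. A minimal sketch (the individual salary figures are hypothetical, chosen so the averages match the numbers in the text):

```python
# Differencing attack: two exact averages plus the headcounts
# are enough to recover one person's salary.
salaries_before = [150_000, 120_000, 160_000, 130_000, 190_000]  # 5 employees
avg_before = sum(salaries_before) / len(salaries_before)          # 150,000

salaries_after = salaries_before[:-1]   # the 190,000 earner leaves
avg_after = sum(salaries_after) / len(salaries_after)             # 140,000

# The attacker only ever sees the two averages and the two counts:
leaked = avg_before * 5 - avg_after * 4
print(leaked)  # 190000.0
```

Nothing here requires access to the raw records; the leak comes entirely from publishing exact aggregates twice.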
The Epsilon Parameter
The privacy guarantee is controlled by the parameter epsilon (ε). Lower epsilon means stronger privacy: as rough guidelines, ε=0.1 provides strong protection, ε=1 moderate protection, and ε=10 weak protection. Mathematically, epsilon bounds how much adding or removing one record can change the output probabilities: with ε=1, any output is at most e^1 ≈ 2.72 times more likely with a specific record present than without it. The trade-off is that lower epsilon requires more noise, reducing data utility. Choosing epsilon is therefore a policy decision that balances privacy risk against analytical value.
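The e^ε bound can be checked numerically. The sketch below assumes a counting query (sensitivity 1) answered with Laplace noise of scale sensitivity/ε, and compares the worst-case ratio of output densities for two neighboring datasets (true counts 3 and 4) against e^ε:

```python
import math

def laplace_pdf(x, mu, scale):
    # Density of Laplace noise centred at the true answer mu.
    return math.exp(-abs(x - mu) / scale) / (2 * scale)

sensitivity = 1.0          # one record changes a count by at most 1
eps = 1.0
scale = sensitivity / eps  # Laplace scale calibrated to sensitivity / epsilon

# Neighboring datasets: true count 3 with a given record, 4 without it.
# For every output x, the ratio of output densities stays within e^eps.
grid = [i / 10 for i in range(-100, 101)]
worst_ratio = max(laplace_pdf(x, 4, scale) / laplace_pdf(x, 3, scale)
                  for x in grid)
print(worst_ratio, math.exp(eps))  # worst-case ratio hits the e^1 bound
```

The ratio is largest for outputs on the far side of one of the true answers, where it equals e^ε exactly; everywhere else it is smaller, which is the ε=1 guarantee in action.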
Noise Mechanisms
Differential privacy is achieved by adding calibrated random noise to outputs. Laplace mechanism: adds noise scaled to the query's sensitivity (how much one record can change the answer) divided by ε; used for numerical queries such as counts, sums, and averages. Gaussian mechanism: adds Gaussian noise, satisfying a slightly relaxed (ε, δ) guarantee but composing better across many queries. Exponential mechanism: for categorical outputs, selects each option with probability exponentially weighted by a quality score. The noise makes outputs fuzzy enough that individual records cannot be reverse-engineered.
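A minimal sketch of the Laplace mechanism for a counting query, using inverse-CDF sampling from the standard library (the function names are illustrative, not from any particular DP library):

```python
import math
import random

def laplace_noise(scale, rng):
    # Sample Laplace(0, scale) via the inverse CDF of a uniform draw.
    u = rng.random() - 0.5            # uniform in [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon, rng):
    # A count has sensitivity 1: adding or removing one record
    # changes the answer by at most 1, so the noise scale is 1/epsilon.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

salaries = [150_000, 120_000, 160_000, 130_000, 190_000]
noisy = private_count(salaries, lambda s: s > 140_000, epsilon=1.0,
                      rng=random.Random(42))
print(noisy)  # true count is 3; prints 3 plus a small amount of Laplace noise
```

With a larger ε the scale 1/ε shrinks and the noisy count concentrates around the true answer; with a smaller ε the noise grows, which is the privacy-utility trade-off described above.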
Key Insight: Differential privacy provides provable guarantees regardless of the attacker's background knowledge or computational power, unlike cryptographic protections, which assume a computationally bounded adversary.