Allocating Privacy Budgets and Choosing Epsilon in Production

Choosing epsilon is a policy decision that balances privacy protection, utility, and regulatory or ethical considerations. There is no universally correct value: the US Census Bureau used epsilon 19.61 for its 2020 redistricting statistics, LinkedIn used epsilon 14.4 over three months for labor market insights, and interactive analytics systems often allocate epsilon 0.1 to 1 per query with quarterly budgets of 1 to 10. A smaller epsilon gives stronger privacy but requires more noise, which reduces utility. The key is to document choices, simulate utility loss, and review them with legal and privacy stakeholders.

For interactive systems, allocate a small epsilon per query (0.1 to 1) and enforce a total per-user budget per time window. This allows exploration while capping total privacy loss. For batch releases, you can spend a larger epsilon (1 to 20) if the release is infrequent and well audited. Within a fixed total budget, prioritize high-value metrics and use techniques like adaptive composition or private selection to allocate epsilon efficiently. For ML training with DP-SGD, typical production settings use epsilon 1 to 10 at delta 1e-6, tuned based on dataset size and acceptable accuracy loss.

Budget management requires a ledger that tracks spent epsilon and delta per user cohort and time period. Allocate budgets to teams or products and enforce limits. Consider budget resets: Apple resets local DP budgets daily per device and per signal, allowing continuous telemetry without unbounded accumulation. For central DP with long-lived populations, document a maximum lifetime budget and retire data or reset identifiers when it is exhausted. Always run utility experiments that measure the impact of epsilon on key metrics before deploying to production.
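A utility experiment of the kind described above can start very small. The sketch below assumes a single count query with sensitivity 1 released through the Laplace mechanism, which this section does not specify. The function names and trial count are illustrative only. It estimates the average absolute error for several epsilon values, which makes the privacy versus utility trade-off concrete before any real data is involved.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    # expovariate takes a rate, so rate = 1 / scale gives mean = scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean_abs_error(epsilon: float, sensitivity: float = 1.0,
                   trials: int = 20_000, seed: int = 0) -> float:
    """Estimate the average absolute error added to one noisy count."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    # Laplace mechanism (assumed here): noise scale b = sensitivity / epsilon.
    scale = sensitivity / epsilon
    errors = [abs(laplace_noise(scale, rng)) for _ in range(trials)]
    return statistics.fmean(errors)


if __name__ == "__main__":
    # Epsilon values chosen to span the per-query and batch ranges in the text.
    for eps in (0.1, 0.5, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error(eps):.2f}")
```

For a real release, the same loop would be run against the actual metric (for example, relative error on a key dashboard number) rather than raw noise magnitude, and the results shared with the stakeholders who sign off on epsilon.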
💡 Key Takeaways
No universal epsilon value: US Census used 19.61, LinkedIn 14.4 over three months, interactive systems 0.1 to 1 per query with total quarterly budget 1 to 10. Smaller epsilon means stronger privacy but worse utility.
Interactive vs batch allocation: interactive systems spread a small epsilon (0.1 to 1) across many queries, while batch systems spend a larger epsilon (1 to 20) on infrequent, well-audited releases. Prioritize metrics with high business value.
DP-SGD for ML typically uses epsilon 1 to 10 at delta 1e-6, tuned based on dataset size and acceptable accuracy drop (2 to 5 percentage points). Larger datasets support smaller epsilon for the same utility.
Budget resets enable continuous use: Apple resets local DP budgets daily per device and per signal. For central DP, consider annual or quarterly resets with documented lifetime caps to avoid unbounded accumulation.
Governance requires a privacy ledger tracking spent epsilon per user cohort and time window. Allocate budgets to teams, enforce limits via API, and review spending regularly. Simulate utility impact before production deployment.
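One simple way to split a fixed total budget across metrics or teams is in proportion to priority weights. The sketch below assumes basic sequential composition, where the epsilons of individual releases add up to the total. The metric names and weights are hypothetical, and tighter accounting methods such as advanced composition could stretch the same budget further.

```python
from typing import Dict


def allocate_epsilon(total_epsilon: float,
                     weights: Dict[str, float]) -> Dict[str, float]:
    """Split total_epsilon across metrics in proportion to their weights.

    Assumes basic sequential composition: the returned values sum to
    total_epsilon, so releasing every metric once spends the whole budget.
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and positive")
    total_weight = sum(weights.values())
    return {name: total_epsilon * w / total_weight
            for name, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical metrics and priorities, for illustration only.
    plan = allocate_epsilon(10.0, {"revenue": 3, "retention": 1, "latency": 1})
    for metric, eps in plan.items():
        print(f"{metric}: epsilon {eps:.2f}")
```

Higher-weight metrics receive more epsilon and therefore less noise, which is the intended effect of prioritizing them.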
📌 Examples
US Census 2020: epsilon 19.61 total for redistricting, allocated hierarchically across geographic levels and demographic breakdowns, with utility tested on 2010 data before deployment.
LinkedIn labor insights: epsilon 14.4 over three months (4.8 per month), allocated to salary, hiring, and skills metrics, with each query spending epsilon in proportion to business priority.
Interactive DP dashboard: allocate epsilon 0.5 per query with a total budget of 5 per user per quarter, reset quarterly, and enforced by a ledger service that rejects queries once the budget is exhausted.
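The interactive dashboard example can be sketched as a small in-memory ledger. This is a minimal illustration, not a production design: the class and method names are hypothetical, only epsilon is tracked (delta is omitted), and a real service would need durable storage and concurrency control. Budgets are keyed by user and calendar quarter, so a new quarter starts with a fresh budget.

```python
from collections import defaultdict
from datetime import date
from typing import Tuple


class BudgetExhausted(Exception):
    """Raised when a query would exceed the budget for the current window."""


class PrivacyLedger:
    """Tracks epsilon spent per (user, calendar quarter). Hypothetical sketch."""

    def __init__(self, budget_per_quarter: float = 5.0) -> None:
        self.budget_per_quarter = budget_per_quarter
        # Maps (user_id, (year, quarter)) to the epsilon spent so far.
        self._spent = defaultdict(float)

    @staticmethod
    def _quarter(day: date) -> Tuple[int, int]:
        return (day.year, (day.month - 1) // 3 + 1)

    def remaining(self, user_id: str, day: date) -> float:
        key = (user_id, self._quarter(day))
        return self.budget_per_quarter - self._spent.get(key, 0.0)

    def charge(self, user_id: str, epsilon: float, day: date) -> float:
        """Record a query's epsilon, or reject it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance guards against floating-point rounding at the limit.
        if epsilon > self.remaining(user_id, day) + 1e-12:
            raise BudgetExhausted(f"{user_id} has no budget left this quarter")
        self._spent[(user_id, self._quarter(day))] += epsilon
        return self.remaining(user_id, day)


if __name__ == "__main__":
    ledger = PrivacyLedger(budget_per_quarter=5.0)
    today = date(2024, 1, 15)
    for _ in range(10):
        ledger.charge("analyst-1", 0.5, today)  # ten queries at 0.5 each
    try:
        ledger.charge("analyst-1", 0.5, today)  # eleventh query in the quarter
    except BudgetExhausted as exc:
        print("rejected:", exc)
```

Keying on the quarter makes the reset implicit: no cleanup job is needed, and old entries remain available for audit.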