Dangerous Failure Modes in Privacy Compliance
Derived data left behind is one of the most common compliance failures. You delete the source rows but forget to purge derived features, embeddings, or precomputed aggregations, so personal data keeps feeding training and inference. A concrete example: a user requests deletion and the raw events are deleted within 24 hours, but the user's session embeddings remain in a feature store snapshot that stays in use for 30 days. Any decision based on those features during that window violates the deletion request.
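One way to avoid leaving derived data behind is to register every store that holds raw or derived personal data and fan each deletion out to all of them, followed by a verification pass. The sketch below is illustrative only: `Store`, `DeletionRegistry`, and the store names are hypothetical in-memory stand-ins, not a real feature store API.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for one data system (raw events, feature store, ...)."""
    name: str
    rows: dict = field(default_factory=dict)  # user_id -> payload

    def delete_user(self, user_id: str) -> bool:
        # Returns True only if something was actually removed.
        if user_id in self.rows:
            del self.rows[user_id]
            return True
        return False


@dataclass
class DeletionRegistry:
    """Tracks every store that holds raw or derived personal data."""
    stores: list = field(default_factory=list)

    def register(self, store: Store) -> None:
        self.stores.append(store)

    def delete_everywhere(self, user_id: str) -> dict:
        # Fan the request out to every registered store, not just the source table.
        return {s.name: s.delete_user(user_id) for s in self.stores}

    def residual_copies(self, user_id: str) -> list:
        # Verification pass: names of stores that still hold the user.
        return [s.name for s in self.stores if user_id in s.rows]


raw_events = Store("raw_events", {"u1": ["click"], "u2": ["view"]})
embeddings = Store("session_embeddings_snapshot", {"u1": [0.1, 0.9], "u2": [0.4, 0.2]})
aggregates = Store("daily_aggregates", {"u2": {"sessions": 3}})

registry = DeletionRegistry()
for store in (raw_events, embeddings, aggregates):
    registry.register(store)

report = registry.delete_everywhere("u1")
print(report)
print(registry.residual_copies("u1"))  # []
```

The per-store report doubles as audit evidence. In a real system the hard part is keeping the registry complete, since a snapshot or aggregate that was never registered is exactly the copy that gets left behind.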
Backups and cold archives present another subtle trap. Data copied into snapshots and backups can persist for months. GDPR tolerates reasonable backup practices, but it still expects deletion without undue delay. Without tombstones (durable records that a given user's data was deleted), a backup restore can reanimate deleted data.
Third-party processors add external dependencies. Data sent to vendors for fraud scoring or enrichment must be covered by processor agreements that support deletion and access. Vendor turnaround still counts against your own deadline: GDPR generally requires a response to a data subject access request (DSAR) within one month, and CCPA within 45 days, so a vendor with a 7 to 30 day service level agreement (SLA) can push your overall DSAR past the statutory limit even when your internal systems finish on time.
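The tombstone idea for backup restores can be sketched as a filter applied to every restored row. This is a minimal illustration under stated assumptions: `tombstones`, `record_deletion`, and `restore_from_backup` are hypothetical names, and the tombstone log is assumed to live outside the backups it protects, since a restore would otherwise roll it back as well.

```python
from datetime import datetime, timezone

# Hypothetical tombstone log: user_id -> time the deletion was honored.
# Assumed to be stored separately from the backups themselves.
tombstones = {}


def record_deletion(user_id, when=None):
    """Record that a user's data was deleted and must not be restored."""
    tombstones[user_id] = when or datetime.now(timezone.utc)


def restore_from_backup(backup_rows):
    """Split backup rows into restorable rows and rows suppressed by a tombstone."""
    restored, suppressed = [], []
    for row in backup_rows:
        if row["user_id"] in tombstones:
            suppressed.append(row)
        else:
            restored.append(row)
    return restored, suppressed


backup = [
    {"user_id": "u1", "email": "a@example.com"},
    {"user_id": "u2", "email": "b@example.com"},
]

record_deletion("u1")  # u1 asked for deletion after the backup was taken
restored, suppressed = restore_from_backup(backup)

print([r["user_id"] for r in restored])    # ['u2']
print([r["user_id"] for r in suppressed])  # ['u1']
```

The tombstone stores only an identifier and a timestamp, so keeping it does not retain the deleted content. Whether the identifier itself may be kept, and for how long, is a question for your own legal review.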
Automated decision-making restrictions create operational challenges. GDPR restricts solely automated decisions that have legal or similarly significant effects. A model that auto-denies loans or suspends accounts without human review can be noncompliant, and the fix, human-in-the-loop review, adds latency and operational cost.
Preference drift and stale policy occur when long-running jobs miss consent changes. A nightly Extract, Transform, Load (ETL) job can materialize features from yesterday's opt-in list, and the system then serves them for a user who opted out this morning. Without real-time policy invalidation, the system keeps using personal data after consent has been withdrawn.
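The stale-policy problem can be illustrated with a consent check at read time, so that features materialized by a batch job are never served for a user who has since opted out. This is a sketch under stated assumptions: `ConsentStore`, `serve_features`, and the feature name `ctr_7d` are hypothetical, and a production consent lookup would be a low-latency service or cache, not an in-memory set.

```python
from dataclasses import dataclass


@dataclass
class ConsentStore:
    """Hypothetical live consent lookup."""
    opted_in: set

    def revoke(self, user_id):
        self.opted_in.discard(user_id)

    def allows(self, user_id):
        return user_id in self.opted_in


def serve_features(user_id, materialized, consent, default=None):
    # Check current consent at read time, not the snapshot the batch job used.
    if not consent.allows(user_id):
        return default
    return materialized.get(user_id, default)


consent = ConsentStore(opted_in={"u1", "u2"})

# The nightly job materializes features for everyone opted in at run time.
materialized = {uid: {"ctr_7d": 0.12} for uid in consent.opted_in}

# u2 opts out the next morning, after the job has finished.
consent.revoke("u2")

print(serve_features("u1", materialized, consent))  # {'ctr_7d': 0.12}
print(serve_features("u2", materialized, consent))  # None
```

The read-time check only stops serving. The stale rows for `u2` still exist in `materialized` and still need to be purged by the next job run or by an explicit invalidation step.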
💡 Key Takeaways
•Derived data left behind is one of the most common failures: deleting raw events but leaving session embeddings in a feature store for 30 days violates the deletion request
•Backups and cold archives can retain deleted data for months, so restores need tombstones that prevent deleted records from being brought back
•Third-party vendors with 7 to 30 day SLAs for deletion can cause your DSAR to breach statutory limits, even if your internal systems comply
•GDPR restricts solely automated decisions with legal or similarly significant effects, so loan denials or account suspensions need human-in-the-loop review, which adds latency and cost
•Without real-time invalidation, a nightly ETL job can serve features built from yesterday's opt-in list to users who opted out this morning
•Reidentification through linkage becomes possible when you publish model explanations or dashboards whose cohorts fall below a k-anonymity threshold
📌 Examples
A European fintech deleted user transaction records within 24 hours but kept derived credit score features for 90 days, which surfaced as a GDPR violation on audit
Cross-border transfers from the EU to US training clusters without Standard Contractual Clauses and technical safeguards led to GDPR enforcement actions
Publishing aggregate dashboards with cohorts of 5 users in rare geographic regions allowed reidentification through linkage attacks
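The small-cohort risk in the last example can be reduced with a minimum cohort size rule applied before anything is published. The sketch below is a simplification, not full k-anonymity, which concerns combinations of quasi-identifiers across a whole dataset. `K_THRESHOLD`, `cohort_counts`, and the region names are assumptions for illustration.

```python
from collections import Counter

K_THRESHOLD = 10  # assumed minimum cohort size; set this from your own risk assessment


def cohort_counts(records, key, k=K_THRESHOLD):
    """Count records per cohort and fold cohorts smaller than k into 'OTHER'."""
    counts = Counter(r[key] for r in records)
    published = {}
    other = 0
    for cohort, n in counts.items():
        if n >= k:
            published[cohort] = n
        else:
            other += n
    # Publish the merged bucket only if it is itself large enough.
    if other >= k:
        published["OTHER"] = other
    return published


records = (
    [{"region": "Berlin"}] * 40
    + [{"region": "Lyon"}] * 12
    + [{"region": "Svalbard"}] * 5
)

print(cohort_counts(records, "region"))  # {'Berlin': 40, 'Lyon': 12}
```

The 5-user cohort is dropped instead of being shown on its own. Note that publishing an exact grand total alongside these counts would let a reader recover the suppressed cohort by subtraction, so totals need the same care.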