
Dangerous Failure Modes in Privacy Compliance

Definition
Privacy compliance failures occur when systems believe they are compliant but actually retain or expose personal data—often discovered during audits with severe consequences.

SHADOW COPIES IN UNEXPECTED PLACES

Data exists in more places than tracked: logs, snapshots, backups, CDN caches, search indexes, queues. Deletion from primary database leaves copies in many other locations. Audit all data flows—not just obvious ones.
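The fan-out can be made explicit in code: a deletion request should produce a per-store report so a silently skipped location is visible instead of implied. A minimal sketch, where the store names and the `delete_everywhere` helper are illustrative assumptions, not a real API:

```python
# Hypothetical list of every location where user data may live --
# deletion must cover all of them, not just the primary database.
DATA_STORES = ["primary_db", "app_logs", "db_snapshots", "backups",
               "cdn_cache", "search_index", "message_queue"]

def delete_everywhere(user_id, stores=DATA_STORES):
    """Fan a deletion out to every tracked store and return a report.

    Returning a per-store result makes partial failures auditable;
    a real implementation would call each store's own deletion API.
    """
    report = {}
    for store in stores:
        report[store] = f"deleted:{user_id}"  # placeholder for real call
    return report

report = delete_everywhere("u-123")
```

Auditing then reduces to checking that the report covers every known store, rather than trusting that "delete" reached everywhere.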

MODEL MEMORIZATION

Large models can memorize training examples verbatim. A language model may output user emails when prompted. Even after deleting source data, models retain it. Detection: membership inference attacks. Mitigation: differential privacy, output filtering.
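The simplest membership inference test is a loss threshold: memorized training examples get unusually low loss, so a record whose loss falls below a calibrated threshold is flagged as a likely training member. A minimal sketch, where the losses and threshold are illustrative numbers and `membership_score` is a hypothetical helper:

```python
# Loss-threshold membership inference (sketch): low per-example loss
# suggests the example was in the training set.
def membership_score(loss, threshold):
    """Flag a record as a likely training-set member if its loss is low."""
    return loss < threshold

# Illustrative losses: memorized training examples sit near zero,
# held-out examples do not.
train_losses = [0.01, 0.03, 0.02]
heldout_losses = [0.9, 1.2, 0.7]
threshold = 0.5  # would be calibrated on a reference dataset in practice

flagged_train = [membership_score(l, threshold) for l in train_losses]
flagged_held = [membership_score(l, threshold) for l in heldout_losses]
```

If deleted users' records still score as members after retraining, the model has retained their data and is itself a compliance liability.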

💡 Key Insight: Model memorization is real. Researchers extracted credit cards and phone numbers from production models. Treat models as potential data stores requiring compliance.

IDENTIFIER MISMATCH

User requests deletion by email, but the ML pipeline keys records on an internal user_id. The DSAR (Data Subject Access Request) orchestrator cannot map between the two, so deletion fails silently: the system reports success while the data remains. Solution: a universal identity graph linking all identifiers.
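An identity graph can be sketched as a union-find structure that links every identifier (email, internal user_id, device id) into one cluster, so a DSAR keyed on any identifier resolves to all of them. The class and identifier strings below are illustrative assumptions:

```python
# Minimal identity graph as union-find: linked identifiers share a
# root, so resolving any one of them returns the whole cluster.
class IdentityGraph:
    def __init__(self):
        self.parent = {}

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a, b):
        """Record that identifiers a and b belong to the same person."""
        self.parent[self._find(a)] = self._find(b)

    def resolve(self, identifier):
        """Return every identifier linked to the given one."""
        root = self._find(identifier)
        return {k for k in self.parent if self._find(k) == root}

graph = IdentityGraph()
graph.link("alice@example.com", "user_id:42")
graph.link("user_id:42", "device:abc")

# A deletion request by email now reaches the ML system's user_id too.
ids = graph.resolve("alice@example.com")
```

With this mapping in place, the orchestrator can fan the deletion out under every identifier instead of reporting success on the email alone.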

THIRD-PARTY VENDOR GAPS

Data shared with vendors does not get deleted. GDPR holds you responsible for processor compliance. Require contractual deletion SLAs and track lineage across organizational boundaries.
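Tracking lineage across organizational boundaries can start as a simple ledger: record each share with its contractual deletion SLA, then flag shares whose SLA has lapsed without a deletion confirmation. A sketch under assumed field names and illustrative vendor data:

```python
from datetime import date, timedelta

def overdue_deletions(shares, today):
    """Return vendor shares whose deletion SLA lapsed unconfirmed."""
    return [s for s in shares
            if not s["deletion_confirmed"]
            and today > s["shared_on"] + timedelta(days=s["sla_days"])]

# Illustrative lineage records for data shared with processors.
shares = [
    {"vendor": "analytics_co", "shared_on": date(2024, 1, 1),
     "sla_days": 30, "deletion_confirmed": False},
    {"vendor": "label_vendor", "shared_on": date(2024, 1, 1),
     "sla_days": 30, "deletion_confirmed": True},
]

late = overdue_deletions(shares, today=date(2024, 6, 1))
```

Surfacing overdue shares turns "we told the vendor to delete it" into an auditable obligation, which is what GDPR processor liability requires of the controller.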

⚠️ Key Trade-off: Comprehensive compliance requires auditing every touchpoint—expensive and complex. Prioritize high-risk areas: models, databases, major vendors.
💡 Key Takeaways
Shadow copies exist in logs, backups, caches, queues—audit all flows
Models memorize training data; run membership inference to detect
Identifier mismatches cause silent deletion failures
📌 Interview Tips
1. Audit all locations: logs, backups, CDN, indexes, queues
2. Mention memorization—researchers extracted PII from production models