Failure Modes: Attacks and Operational Risks in Anonymization
LINKAGE ATTACKS
Attackers join anonymized data with external datasets to re-identify individuals. Even after direct identifiers are removed, a combination of quasi-identifiers (ZIP code, birth date, and gender) can be distinctive enough to match a single person in public records. The risk grows over time as more datasets become publicly available.
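The linkage risk described above can be illustrated with a small simulation. This is a minimal sketch, not a complete attack model: the field names (zip, birth_date, gender, diagnosis, name), the sample records, and the helper names qi_key and link_records are all hypothetical. It reports only unambiguous matches, meaning quasi-identifier combinations that occur exactly once in both datasets.

```python
from collections import Counter

# Hypothetical quasi-identifier columns; adjust to the dataset at hand.
QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")


def qi_key(record, quasi_identifiers=QUASI_IDENTIFIERS):
    """Build the tuple of quasi-identifier values used as the join key."""
    return tuple(record[q] for q in quasi_identifiers)


def link_records(anonymized, public, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return (anonymized_record, public_record) pairs whose quasi-identifier
    combination is unique in both datasets, i.e. unambiguous matches."""
    anon_counts = Counter(qi_key(r, quasi_identifiers) for r in anonymized)
    pub_counts = Counter(qi_key(r, quasi_identifiers) for r in public)
    pub_index = {qi_key(r, quasi_identifiers): r for r in public}
    matches = []
    for record in anonymized:
        key = qi_key(record, quasi_identifiers)
        if anon_counts[key] == 1 and pub_counts[key] == 1:
            matches.append((record, pub_index[key]))
    return matches


# Made-up sample data for illustration only.
anonymized = [
    {"zip": "02138", "birth_date": "1945-07-31", "gender": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1980-01-15", "gender": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_date": "1980-01-15", "gender": "M", "diagnosis": "flu"},
]
public = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1945-07-31", "gender": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1980-01-15", "gender": "M"},
]

matches = link_records(anonymized, public)
for anon_record, public_record in matches:
    print(public_record["name"], "->", anon_record["diagnosis"])
```

In this sample the first record is re-identified because its quasi-identifier combination is unique, while the two records that share a combination are not unambiguously linked. The fraction of records matched this way is one rough indicator of release risk.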
INFERENCE ATTACKS
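Statistical inference can reveal sensitive attributes without re-identifying anyone. If 95% of the people in an anonymized group share a disease, an attacker who knows only that a target belongs to that group can infer the diagnosis with high confidence. ML models trained on the data may also leak information through membership inference, in which an attacker determines whether a specific record was part of the training set.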
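One way to spot the group-level inference risk described above is to measure how dominant the most common sensitive value is within each group. The sketch below is an illustrative check under assumptions: the grouping columns (zip_prefix, age_band), the sensitive column (condition), the sample data, and the 0.95 threshold are all hypothetical choices, not a standard.

```python
from collections import Counter, defaultdict


def dominant_share_by_group(records, group_keys, sensitive_key):
    """Map each group to (most common sensitive value, its share of the group)."""
    groups = defaultdict(Counter)
    for record in records:
        group = tuple(record[k] for k in group_keys)
        groups[group][record[sensitive_key]] += 1
    result = {}
    for group, counts in groups.items():
        value, count = counts.most_common(1)[0]
        result[group] = (value, count / sum(counts.values()))
    return result


def flag_homogeneous_groups(records, group_keys, sensitive_key, threshold=0.95):
    """Return groups where one sensitive value reaches the given share."""
    shares = dominant_share_by_group(records, group_keys, sensitive_key)
    return {g: v for g, v in shares.items() if v[1] >= threshold}


# Made-up sample: one group where 19 of 20 members share a condition,
# and one group split evenly between two values.
records = (
    [{"zip_prefix": "021**", "age_band": "40-49", "condition": "condition_x"}] * 19
    + [{"zip_prefix": "021**", "age_band": "40-49", "condition": "none"}]
    + [{"zip_prefix": "946**", "age_band": "30-39", "condition": "condition_x"}] * 5
    + [{"zip_prefix": "946**", "age_band": "30-39", "condition": "none"}] * 5
)

flagged = flag_homogeneous_groups(records, ("zip_prefix", "age_band"), "condition")
for group, (value, share) in flagged.items():
    print(group, value, f"{share:.0%}")
```

Flagged groups are candidates for further generalization or suppression before release. This check does not address membership inference against trained models, which needs separate testing.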
OPERATIONAL FAILURES
Common risks include incomplete PII detection that leaves identifiers in free text or metadata, version mismatches in which non-anonymized data persists in backups, and logging systems that capture raw PII. Audit trails must themselves be anonymized.
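For the logging risk in particular, one possible mitigation is to redact messages before they are written. The following is a minimal sketch using Python's standard logging module. The two regular expressions (email addresses and US-style SSNs), the placeholder strings, and the names redact and RedactingFilter are illustrative assumptions; pattern matching alone will miss many identifiers and is not a complete PII detector.

```python
import logging
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]


def redact(text):
    """Replace every match of the known patterns with its placeholder."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


class RedactingFilter(logging.Filter):
    """Logging filter that redacts the fully formatted message."""

    def filter(self, record):
        # Render the message with its arguments first, then redact the result,
        # so PII passed as a format argument is also covered.
        record.msg = redact(record.getMessage())
        record.args = None
        return True


logger = logging.getLogger("redaction_demo")
logger.addFilter(RedactingFilter())
logger.warning("login failed for %s", "jane.doe@example.com")
```

Attaching the filter to a logger covers records created through that logger only. Records from other loggers that propagate to shared handlers would need the filter attached to those handlers instead.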
DEFENSE STRATEGIES
Test with simulated re-identification attacks before release. Use differential privacy where quantifiable guarantees are needed. Apply data minimization so that only the fields a use case requires are collected and released. Monitor for newly published auxiliary datasets that could enable future attacks.
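To make the differential privacy point concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. The function names (laplace_noise, dp_count), the sample data, and the epsilon value are assumptions chosen for illustration. The sketch uses plain floating-point sampling and does no privacy-budget accounting across queries, so it should be read as a teaching example rather than a hardened implementation.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng=random):
    """Return a noisy count of the records that satisfy `predicate`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # A counting query has sensitivity 1: adding or removing one person
    # changes the count by at most 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up data: ages 0..99, counting how many are under 40.
ages = list(range(100))
noisy = dp_count(ages, lambda age: age < 40, epsilon=1.0)
print(f"noisy count: {noisy:.2f}")
```

Smaller epsilon values add more noise and give stronger privacy; each additional query answered from the same data consumes more of the overall budget.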