Pseudonymization vs Anonymization vs Differential Privacy
PSEUDONYMIZATION: REVERSIBLE TRANSFORMATION
Replaces identifiers with consistent tokens while maintaining a secure mapping table. The same person always gets the same token, enabling longitudinal analysis. Under GDPR, pseudonymized data is still personal data and remains subject to compliance obligations, but the technique provides real security: if the dataset is breached, identities stay protected as long as the mapping table remains secure.
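The mapping-table approach above can be sketched as follows. This is a minimal illustration, not a production design: the class name and token format are assumptions, and in practice the mapping table would live in a secured store, not in process memory.

```python
import secrets

class Pseudonymizer:
    """Illustrative sketch: consistent tokenization backed by a
    reversible mapping table (assumed to be held in secure storage)."""

    def __init__(self):
        self._forward = {}   # identifier -> token
        self._reverse = {}   # token -> identifier (the sensitive mapping)

    def tokenize(self, identifier: str) -> str:
        # Same person always gets the same token (longitudinal analysis)
        if identifier not in self._forward:
            token = secrets.token_hex(8)  # random token, unlinkable on its own
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def reidentify(self, token: str) -> str:
        # Reversal is possible only with access to the mapping table,
        # which is why breaching the dataset alone reveals no identities
        return self._reverse[token]
```

Because the tokens are random rather than derived from the identifier, an attacker holding only the tokenized dataset cannot brute-force identities; the mapping table is the single secret to protect.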
ANONYMIZATION: IRREVERSIBLE REMOVAL
Permanently removes the ability to re-identify individuals, using techniques such as k-anonymity, generalization, or suppression. Under GDPR, properly anonymized data is not personal data and falls outside regulatory scope. The challenge: proving data is truly anonymous is difficult, and auxiliary data can enable re-identification attacks years later.
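Generalization and suppression in the service of k-anonymity can be sketched as below. The bin widths (decade age ranges) and ZIP truncation are illustrative choices, not a standard; real anonymization pipelines tune these against utility loss.

```python
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers: exact age -> decade range,
    5-digit ZIP -> 3-digit prefix (illustrative granularity)."""
    age, zip_code = record
    decade = (age // 10) * 10
    return (f"{decade}-{decade + 9}", zip_code[:3] + "**")

def k_anonymize(records, k=2):
    """Keep only records whose generalized quasi-identifier group
    contains at least k members; suppress the rest."""
    generalized = [generalize(r) for r in records]
    group_sizes = Counter(generalized)
    return [g for g in generalized if group_sizes[g] >= k]
```

For example, with `k=2` the records `(23, "90210")` and `(25, "90211")` both generalize to `("20-29", "902**")` and survive, while a lone `(47, "10001")` is suppressed because no other record shares its group.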
DIFFERENTIAL PRIVACY: MATHEMATICAL GUARANTEES
Adds calibrated noise to query results or model training, with a mathematical bound on privacy loss (the parameter epsilon). Unlike k-anonymity, it protects against adversaries with arbitrary auxiliary information. For ML, DP-SGD clips each example's gradient and adds Gaussian noise to the clipped sum at every training step, preventing the model from memorizing individual training examples.
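The query side of this can be shown with the Laplace mechanism, the classic way to make a count query epsilon-differentially private. This is a minimal sketch (function name and epsilon value are illustrative): a count has sensitivity 1, so adding Laplace noise with scale 1/epsilon yields epsilon-DP.

```python
import math
import random

def dp_count(values, predicate, epsilon=1.0):
    """Epsilon-DP count query via the Laplace mechanism.
    A count changes by at most 1 when one record is added or
    removed (sensitivity 1), so noise scale 1/epsilon suffices."""
    true_count = sum(1 for v in values if predicate(v))
    # Sample Laplace(0, 1/epsilon) by inverse-CDF transform of a uniform
    u = random.random() - 0.5
    scale = 1.0 / epsilon
    sign = 1.0 if u >= 0 else -1.0
    noise = -scale * sign * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

Smaller epsilon means larger noise: stronger privacy, lower accuracy. The noise is unbiased, so repeated queries average toward the true count, which is exactly why DP accounting must track cumulative privacy loss across queries.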