
What is GDPR & Data Privacy Compliance?

Definition
The General Data Protection Regulation (GDPR) is a European Union law that requires organizations to protect personal data and gives individuals control over their information, including the rights to access, correct, and delete their data.
The Core Problem: Modern data platforms collect millions of events per second: clickstreams, payments, location data, behavioral logs. Without constraints, this personal data is easy to copy, hard to delete, and trivial to misuse. A user's email might exist in production databases, log files, analytics warehouses, machine learning models, cache systems, backup archives, and analysts' laptops. When that user requests deletion, how do you find and remove every copy? In practice, GDPR forces teams to treat personal data like toxic material: tightly controlled, tracked through its lifecycle, and eventually destroyed. That turns what was mostly a product and legal problem into a core data engineering challenge.

Key Definitions:

Personal data means any information relating to a person who can be identified directly or indirectly. This includes obvious identifiers like email addresses and phone numbers, but also combinations such as IP address plus timestamp plus device identifier that together can single someone out.

Data controller is the organization that decides why and how personal data is processed, while a data processor handles data on the controller's behalf (for example, a cloud provider). Strictly, both roles belong to organizations rather than teams, but inside one company the product teams that set the purpose play the controller-like role and the infrastructure teams that run the pipelines play the processor-like role.

Data subject rights give individuals specific powers: the right to access their data, correct inaccuracies, request deletion (the right to be forgotten), restrict processing, port their data to another service, and object to certain uses.

Privacy by design requires building privacy protections into system architecture from the start, not bolting them on later. This means data minimization (collect only what you need), isolation of personally identifiable information (PII), pseudonymization (replacing identifiers with tokens; pseudonymized data still counts as personal data), and strict retention limits.

From a data engineering perspective, GDPR is primarily about lifecycle control and purpose limitation across distributed systems.
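The PII isolation and pseudonymization ideas above can be sketched as a small token vault: raw identifiers live in one tightly controlled store, and every other system sees only random tokens. This is a minimal in-memory sketch under stated assumptions; `TokenVault`, `tokenize`, `detokenize`, and `forget` are hypothetical names, and a real vault would sit behind its own database, access controls, and audit logging.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field


@dataclass
class TokenVault:
    """Hypothetical PII vault: the only component that can link tokens back to raw identifiers."""

    _token_by_pii: dict[str, str] = field(default_factory=dict)
    _pii_by_token: dict[str, str] = field(default_factory=dict)

    def tokenize(self, pii: str) -> str:
        # Reuse an existing token so the same person joins consistently across datasets.
        existing = self._token_by_pii.get(pii)
        if existing is not None:
            return existing
        # Random token: nothing about the identifier can be recovered from it.
        token = "tok_" + secrets.token_hex(16)
        self._token_by_pii[pii] = token
        self._pii_by_token[token] = pii
        return token

    def detokenize(self, token: str) -> str | None:
        # Restricted path for the few jobs that truly need the raw value.
        return self._pii_by_token.get(token)

    def forget(self, pii: str) -> bool:
        # Erasure: drop both directions of the mapping; downstream tokens become unresolvable.
        token = self._token_by_pii.pop(pii, None)
        if token is None:
            return False
        del self._pii_by_token[token]
        return True


vault = TokenVault()
# Analytics events carry only the token, never the raw email.
event = {"user": vault.tokenize("alice@example.com"), "action": "checkout"}
```

Forgetting the vault entry does not by itself make old events anonymous: if other fields in those events still single the person out, they remain personal data and still need deletion or retention limits.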
Every piece of personal data must have a lawful basis for processing (such as consent, performance of a contract, or legitimate interests), and you must be able to demonstrate that you handle it correctly throughout its entire lifecycle.
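The lawful-basis and accountability requirement can be sketched as a gate that every processing job consults before touching a personal-data field: the field carries its lawful basis, the purposes it was collected for, and a retention period, and each decision is logged. This is a hypothetical illustration, not a compliance framework; `ProcessingRecord`, `may_process`, and `audit_log` are assumed names. The six `LawfulBasis` values mirror GDPR Article 6(1), but here the basis is only recorded, not validated (checking, for instance, that consent has not been withdrawn is left out).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class LawfulBasis(Enum):
    # The six bases listed in GDPR Article 6(1).
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal_obligation"
    VITAL_INTERESTS = "vital_interests"
    PUBLIC_TASK = "public_task"
    LEGITIMATE_INTERESTS = "legitimate_interests"


@dataclass(frozen=True)
class ProcessingRecord:
    # Hypothetical metadata attached to each personal-data field or dataset.
    field_name: str
    basis: LawfulBasis
    purposes: frozenset[str]
    collected_at: datetime
    retention: timedelta


audit_log: list[str] = []


def may_process(rec: ProcessingRecord, purpose: str, now: datetime) -> bool:
    """Allow processing only for a declared purpose and within the retention window."""
    expired = now >= rec.collected_at + rec.retention
    allowed = purpose in rec.purposes and not expired
    # Log every decision so the organization can later show what it did and why.
    audit_log.append(
        f"{now.isoformat()} field={rec.field_name} purpose={purpose} "
        f"basis={rec.basis.value} allowed={allowed}"
    )
    return allowed


rec = ProcessingRecord(
    field_name="location",
    basis=LawfulBasis.CONSENT,
    purposes=frozenset({"restaurant_recommendations"}),
    collected_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
    retention=timedelta(days=90),
)
now = datetime(2024, 2, 1, tzinfo=timezone.utc)
print(may_process(rec, "restaurant_recommendations", now))  # True
print(may_process(rec, "ad_targeting", now))  # False: purpose never declared
```

Keeping purposes and retention next to the data, rather than only in a policy document, is what lets a pipeline enforce purpose limitation mechanically.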
💡 Key Takeaways
GDPR requires companies to protect personal data and give users rights to access, correct, and delete their information across all systems
Personal data includes not just obvious identifiers like email, but also combinations of fields (IP plus timestamp plus device) that together can identify someone
The data controller decides why and how data is used (internally, roughly the product teams), while the data processor handles it on the controller's behalf (cloud providers; internally, infrastructure teams)
Data subject rights translate into technical requirements: locate all of a user's data across distributed systems and delete or export it within GDPR's deadline (one month, extendable by two further months for complex or numerous requests)
Privacy by design means architectural choices like data minimization, PII isolation, pseudonymization, and retention limits from the start
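Handling a deletion request across distributed systems can be sketched as a fan-out: each system that may hold personal data registers an eraser, a request is dispatched to all of them, and per-system results are kept as evidence. The due date follows GDPR's one-month response window (Article 12(3)), computed here as the same day of the following month, clamped to that month's last day. This is a minimal in-memory sketch; `ERASERS`, `register`, `DeletionRequest`, and the two example stores are hypothetical, and backups, caches, and ML models would each need their own eraser strategy.

```python
from __future__ import annotations

import calendar
from dataclasses import dataclass, field
from datetime import date
from typing import Callable


def one_month_after(received: date) -> date:
    """Same day next month, clamped to month end (e.g. 2024-01-31 -> 2024-02-29)."""
    year = received.year + received.month // 12
    month = received.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(received.day, last_day))


@dataclass
class DeletionRequest:
    user_id: str
    received: date
    results: dict[str, bool] = field(default_factory=dict)  # evidence per system

    @property
    def due(self) -> date:
        return one_month_after(self.received)


# Hypothetical registry: every system that may hold personal data registers an eraser.
ERASERS: dict[str, Callable[[str], bool]] = {}


def register(system: str):
    def wrap(fn: Callable[[str], bool]) -> Callable[[str], bool]:
        ERASERS[system] = fn
        return fn
    return wrap


def process(req: DeletionRequest) -> bool:
    """Fan the request out to every registered system; True only if all succeed."""
    for system, erase in ERASERS.items():
        try:
            req.results[system] = erase(req.user_id)
        except Exception:
            # A failing system stays False so the request is retried, not silently closed.
            req.results[system] = False
    return bool(req.results) and all(req.results.values())


# Two toy stores standing in for a production database and an analytics warehouse.
users_db = {"u42": {"email": "alice@example.com"}}
warehouse = [{"user_id": "u42", "page": "/menu"}, {"user_id": "u7", "page": "/"}]


@register("users_db")
def erase_users_db(user_id: str) -> bool:
    users_db.pop(user_id, None)
    return user_id not in users_db


@register("warehouse")
def erase_warehouse(user_id: str) -> bool:
    warehouse[:] = [row for row in warehouse if row["user_id"] != user_id]
    return True


req = DeletionRequest("u42", received=date(2024, 1, 31))
print(process(req), req.due)  # True 2024-02-29
```

Returning `False` when any eraser fails (or when nothing is registered) keeps the request open instead of reporting a deletion that did not fully happen.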
📌 Examples
1. A user requests deletion. You must find and remove their email from production databases, analytics warehouses, log archives, ML models, cache systems, and backup tapes within one month
2. An IP address (203.0.113.42) plus a timestamp (2024-01-15 14:32) plus a device identifier (such as a mobile advertising ID) can together uniquely identify a person, making all three fields personal data under GDPR
3. A product team (controller) decides to collect location data for restaurant recommendations. The infrastructure team (processor) stores and processes this data following the controller's instructions
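Example 2's point, that harmless-looking fields become identifying in combination, can be measured on a dataset by counting how many rows have a unique combination of quasi-identifiers. This sketch is a simplified uniqueness check in the spirit of k-anonymity (a technique the text does not prescribe); the `uniqueness` function and the sample events are hypothetical.

```python
from collections import Counter


def uniqueness(rows: list[dict[str, str]], columns: list[str]) -> float:
    """Fraction of rows whose combination of the given columns appears exactly once."""
    keys = [tuple(row[col] for col in columns) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys) if keys else 0.0


# Hypothetical event log: IP, minute-level timestamp, and a pseudonymous device ID.
events = [
    {"ip": "203.0.113.42", "minute": "2024-01-15 14:32", "device": "a1f3"},
    {"ip": "203.0.113.42", "minute": "2024-01-15 14:32", "device": "5d8b"},
    {"ip": "203.0.113.42", "minute": "2024-01-15 14:33", "device": "a1f3"},
    {"ip": "198.51.100.7", "minute": "2024-01-15 14:32", "device": "77b2"},
]

for cols in (["ip"], ["ip", "minute"], ["ip", "minute", "device"]):
    print(cols, uniqueness(events, cols))
# ['ip'] 0.25
# ['ip', 'minute'] 0.5
# ['ip', 'minute', 'device'] 1.0
```

Each added column narrows the groups until every row is unique; a row that is alone in its group can be singled out, which is why such combinations are treated as personal data. A real assessment would also consider external data an attacker could join against.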
What is GDPR & Data Privacy Compliance? | GDPR & Data Privacy Compliance - System Overflow