Privacy & Fairness in ML • Regulatory Compliance (GDPR, CCPA)
Four Planes of Compliant ML Architecture
A compliant ML system requires four interconnected planes. The consent and policy plane captures the lawful basis for each data subject and processing purpose, storing consent records and purpose tags in a low-latency consent store. This store must serve reads in under 10 milliseconds at p99 because inference services check it to gate every request. The data inventory and lineage plane maps where personal data flows across your entire infrastructure.
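The per-request gating check against the consent store can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not a production store: the `ConsentStore` and `ConsentRecord` names, the purpose strings, and the default-deny behavior are all hypothetical choices, and a real deployment would back this with a replicated key-value store to meet the latency target.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ConsentRecord:
    lawful_basis: str  # e.g. "consent" or "contract" (illustrative labels)
    granted: bool      # False once the subject withdraws or opts out


class ConsentStore:
    """In-memory stand-in for a low-latency consent store (hypothetical)."""

    def __init__(self) -> None:
        # Keyed by (subject_id, purpose) so one lookup answers the gating question.
        self._records: Dict[Tuple[str, str], ConsentRecord] = {}

    def put(self, subject_id: str, purpose: str, record: ConsentRecord) -> None:
        self._records[(subject_id, purpose)] = record

    def is_allowed(self, subject_id: str, purpose: str) -> bool:
        # Default-deny: a missing record means no processing for that purpose.
        record = self._records.get((subject_id, purpose))
        return record is not None and record.granted


store = ConsentStore()
store.put("user-1", "recommendations", ConsentRecord("consent", True))
store.put("user-1", "ads_personalization", ConsentRecord("consent", False))

print(store.is_allowed("user-1", "recommendations"))      # granted purpose
print(store.is_allowed("user-1", "ads_personalization"))  # withdrawn purpose
print(store.is_allowed("user-2", "recommendations"))      # unknown subject
```

Keying on the (subject, purpose) pair keeps the hot path to a single lookup, which is one plausible way to stay inside a tight p99 budget.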
Use automated scanners and metadata harvesting to catalog tens of thousands of tables and streams. For a typical large-scale system, expect to track 5,000 to 20,000 datasets across a 10 petabyte data lake, tagging columns and feature definitions as personal, sensitive, or derived. The training and unlearning plane enforces purpose limitation and deletion rights. When users ask to delete their data, a Data Subject Access Request (DSAR) orchestrator, which in practice handles erasure requests as well as access requests, fans out delete commands to online stores, cold storage, training corpora, and model artifacts.
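The fan-out step can be sketched as below, assuming each target system exposes a simple delete callable. The function and system names (`fan_out_delete`, `delete_online`, and so on) are hypothetical, the cold-storage failure is simulated, and model artifacts are handled here by queueing the subject for exclusion at the next retraining run, which is only one possible approach.

```python
from typing import Callable, Dict, List

# Each target system exposes a delete callable; these names are hypothetical.
DeleteFn = Callable[[str], None]


def fan_out_delete(subject_id: str, systems: Dict[str, DeleteFn]) -> Dict[str, str]:
    """Send a delete command to every system and record per-system status."""
    status: Dict[str, str] = {}
    for name, delete in systems.items():
        try:
            delete(subject_id)
            status[name] = "done"
        except Exception as exc:
            # Keep going so one failing system does not block the others.
            status[name] = f"failed: {exc}"
    return status


online_store = {"user-1": {"clicks": 3}, "user-2": {"clicks": 7}}
retrain_queue: List[str] = []  # subjects to exclude at the next retraining


def delete_online(subject_id: str) -> None:
    online_store.pop(subject_id, None)


def delete_cold_storage(subject_id: str) -> None:
    raise TimeoutError("archive unavailable")  # simulated outage


def mark_for_retraining(subject_id: str) -> None:
    retrain_queue.append(subject_id)


result = fan_out_delete(
    "user-1",
    {
        "online_store": delete_online,
        "cold_storage": delete_cold_storage,
        "model_artifacts": mark_for_retraining,
    },
)
print(result)
```

Recording a status per system is what lets the orchestrator retry only the failed targets and report completion for each one separately.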
In large consumer applications, expect 1,000 to 10,000 DSARs per day. Design your orchestrator to sustain 10 to 50 requests per second, burst to 200 per second, and track completion per system. Even 10,000 DSARs per day averages only about 0.12 requests per second at intake, so the per-second targets presumably cover the fanned-out per-system commands and bursts, not raw intake. Realistic propagation times are minutes for online stores, 24 hours for warehouse tables, and weekly cycles for model retraining. The inference and audit plane enforces runtime privacy and produces evidence. At inference time, fetch consent and policy for the subject and purpose, filter out features that violate the purpose or the subject's opt-out status, and log every decision path in immutable audit records retained for 12 to 24 months.
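The inference-time steps (filter features by purpose and opt-out status, then log the decision) can be sketched as follows. The feature tags in `FEATURE_PURPOSES`, the function names, and the log layout are assumptions for illustration. The hash chain makes tampering detectable but does not by itself make the log immutable; that would still need write-once storage.

```python
import hashlib
import json
from typing import Dict, List, Set

# Hypothetical tags: the purposes each feature may be used for.
FEATURE_PURPOSES: Dict[str, Set[str]] = {
    "age_bucket": {"recommendations", "ads"},
    "purchase_history": {"recommendations"},
    "precise_location": {"ads"},
}

audit_log: List[dict] = []


def append_audit(entry: dict) -> None:
    """Append an entry whose hash also covers the previous entry's hash."""
    prev_hash = audit_log[-1]["hash"] if audit_log else ""
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    digest = hashlib.sha256(payload.encode()).hexdigest()
    audit_log.append({**entry, "prev": prev_hash, "hash": digest})


def filter_features(subject_id: str, purpose: str, features: dict,
                    opted_out: Set[str]) -> dict:
    """Keep only features tagged for this purpose; keep none if opted out."""
    if purpose in opted_out:
        allowed: dict = {}
    else:
        allowed = {name: value for name, value in features.items()
                   if purpose in FEATURE_PURPOSES.get(name, set())}
    # Log feature names only, never values, to keep personal data out of the log.
    append_audit({
        "subject": subject_id,
        "purpose": purpose,
        "used": sorted(allowed),
        "dropped": sorted(set(features) - set(allowed)),
    })
    return allowed


features = {
    "age_bucket": "25-34",
    "purchase_history": [101, 202],
    "precise_location": (52.5, 13.4),
}
kept = filter_features("user-1", "recommendations", features, opted_out={"ads"})
blocked = filter_features("user-1", "ads", features, opted_out={"ads"})
print(sorted(kept), blocked, len(audit_log))
```

Untagged features are dropped by default in this sketch, so a feature missing from the catalog cannot leak into a request.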
💡 Key Takeaways
• The consent store must serve reads in under 10 milliseconds at p99 because inference services check it on every request
• Automated scanners catalog 5,000 to 20,000 datasets across a 10 petabyte data lake, tagging columns as personal, sensitive, or derived
• Large consumer apps handle 1,000 to 10,000 DSARs per day; orchestrators are sized to sustain 10 to 50 requests per second and burst to 200
• Deletion propagation times vary widely: minutes for online stores, 24 hours for warehouse tables, weekly cycles for model retraining
• Immutable audit logs must be retained for 12 to 24 months under strict access controls to provide evidence during regulatory investigations
📌 Examples
Google built federated learning for on-device model updates that avoid centralizing raw data, keeping personal information on user devices
Microsoft and Amazon operate centralized DSAR systems that orchestrate deletion and access across hundreds of data systems in parallel
Apple applies differential privacy for telemetry to reduce reidentification risk while still collecting aggregate usage statistics