Data Quality & Validation • Data Contracts & SLAsHard⏱️ ~3 min
Trade-offs: Strict vs Flexible Contracts
The Central Question: Data contracts trade flexibility and speed of change for reliability and predictability. The decision is not whether to have contracts, but how strict to make them and who controls them.
Strong Contracts: When Data is Critical
Strict enforcement with validation that rejects incompatible schemas at ingestion shifts failures left. Producers discover problems in CI pipelines rather than production. This matters for high value use cases: billing systems, financial reporting, or core recommendation models where silent corruption causes revenue loss or regulatory issues.
The trade off is slower producer velocity. A feature team wanting to add multi currency support might need approval from 15 downstream consumers, schema review, and a 90 day deprecation plan. For fast moving products or A/B testing frameworks, this feels heavy. Feature velocity drops from weekly to monthly releases.
Weak Contracts: When Exploration Matters
Best effort SLAs and documentation only contracts give producers autonomy. This works for early stage products, experimental analytics, or low risk reporting where occasional data quality issues are acceptable. Data scientists exploring new features can iterate quickly without coordination overhead.
The cost shows up in incidents and on call burden. Data engineering teams spend more time firefighting. Downstream jobs need defensive logic: checking for nulls, handling schema mismatches, and backfilling when completeness drops. At one company, weak contracts for an exploratory product meant 3 incidents per week. When the product became critical, retrofitting strict contracts took 6 months.
Central Governance vs Data Mesh
Central governance with a data platform team defining standard templates, naming conventions, and SLA tiers enforces consistency. With 500+ producers, common guarantees prevent chaos. However, local decisions slow down. Every schema change needs central approval.
Data mesh style federated ownership lets domain teams own contracts with platform enforcing only minimal requirements: schema registration and observability hooks. This improves autonomy but creates inconsistent SLA quality. One team might offer 99.99 percent availability while another offers best effort.
SLA Cost Reality
Achieving 99.99 percent freshness for real time data requires multi region pipelines, extra replicas, and 24/7 on call coverage. Costs might be 5x higher than 99 percent daily batch SLAs. Tie SLA levels to business value. A fraud detection system justifies 99.99 percent uptime. An internal analytics dashboard might not.
Decision Framework: Choose strong contracts with strict SLAs when data drives revenue, compliance, or customer facing features. Use weak contracts for exploration, internal tools, or non critical analytics. Start weak for new products, tighten as they mature. Define SLA levels based on business impact, not technical perfection.
Strong Contracts
Reject incompatible schemas at ingestion, fail CI on breaking changes, strict SLAs
vs
Weak Contracts
Schemas as documentation, best effort SLAs, producer autonomy
"The decision is not strict versus flexible across the board. It is mapping contract strength to business criticality."
💡 Key Takeaways
✓Strong contracts reduce silent corruption but slow producer velocity: feature releases drop from weekly to monthly with 90 day deprecation requirements
✓Weak contracts enable fast iteration for exploration but create 3+ incidents per week and require defensive downstream logic
✓Central governance ensures consistency across 500+ producers but requires approval for every schema change; data mesh improves autonomy but creates inconsistent SLA quality
✓Achieving 99.99% freshness costs 5x more than 99% daily SLAs due to multi region infrastructure, extra replicas, and 24/7 on call coverage
✓Decision framework: Map contract strength to business criticality. Start weak for new products, tighten as they mature and impact revenue or compliance
📌 Examples
1Strong contract use case: Billing system at ecommerce company processes 5,000 writes/sec with zero tolerance for data loss, requires strict schema validation and 99.99% availability
2Weak contract scenario: Early stage product used best effort SLAs, caused 3 incidents per week; retrofitting strict contracts after product became critical took 6 months
3Cost comparison: Real time fraud detection with 99.99% SLA needs multi region setup costing 5x more than batch analytics with 99% daily SLA