Data Contracts and Expectation-Based Monitoring
Data contracts shift monitoring from reactive firefighting to proactive agreement enforcement. A contract is a published specification of what a data producer guarantees about shape, freshness, and semantics, coupled with what consumers require to maintain their Service Level Objectives (SLOs). Producers commit to constraints like schema stability, non-null requirements on primary keys, referential integrity coverage above 99.9 percent when joining to entity tables, and arrival windows such as hourly partitions being complete by minute 10. Consumers codify their needs: PSI thresholds below 0.1 on critical features, required join key coverage, allowed late-arrival tolerance, and semantic rules like "if order status is shipped then shipped timestamp must be populated."
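As a concrete illustration, here is a minimal sketch of how such a contract might be expressed in code, assuming a homegrown dataclass representation; the field names, thresholds, and example rule below are illustrative, not any particular contract framework's API.

```python
from dataclasses import dataclass, field

@dataclass
class ProducerGuarantees:
    """What the producer promises about the dataset it publishes (illustrative)."""
    schema_version: str = "1.4.0"             # schema held stable within this version
    non_null_columns: list = field(default_factory=lambda: ["user_id"])
    min_referential_integrity: float = 0.999  # join coverage against entity tables
    arrival_deadline_minute: int = 10         # hourly partition complete by minute 10

@dataclass
class ConsumerRequirements:
    """What the consumer needs in order to hold its own SLOs (illustrative)."""
    max_psi: float = 0.1                      # drift budget on critical features
    critical_features: list = field(default_factory=lambda: ["price_per_night"])
    min_join_key_coverage: float = 0.999
    late_arrival_tolerance_s: int = 300
    semantic_rules: list = field(default_factory=lambda: [
        "order_status = 'shipped' IMPLIES shipped_timestamp IS NOT NULL",
    ])

@dataclass
class DataContract:
    """The published agreement a monitor validates deterministically."""
    dataset: str
    producer: ProducerGuarantees
    consumer: ConsumerRequirements

contract = DataContract("features.listing_daily", ProducerGuarantees(), ConsumerRequirements())
```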
This contractual approach makes monitoring deterministic rather than heuristic. Instead of tuning anomaly detectors that flag volume spikes during legitimate traffic surges, you validate explicit promises. A batch feature store processing 100 million to 500 million rows nightly runs column-level rules including non-null checks on required columns, a deduplication rate below 0.1 percent, monotonic increase constraints for cumulative features, and referential integrity validations. These checks run as pushdown aggregations directly in the warehouse and, by avoiding data movement, complete in 10 to 20 minutes per 1 billion rows on commodity compute.
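A sketch of what one such validation pass could look like, assuming a Spark SQL warehouse; the feature table features.daily_partition, the entity table entities.users, and the column names are hypothetical, and the thresholds mirror the contract terms above.

```python
# A single pushdown aggregation: the warehouse scans the partition once and
# returns only summary counts, which are then compared against the contract.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

CHECK_SQL = """
SELECT
  COUNT(*)                                                 AS row_count,
  COUNT(*) - COUNT(f.user_id)                              AS null_user_id,
  COUNT(*) - COUNT(DISTINCT f.user_id, f.content_id, f.ts) AS duplicate_rows,
  SUM(CASE WHEN f.cumulative_watch_minutes < f.prev_cum
           THEN 1 ELSE 0 END)                              AS monotonicity_violations,
  AVG(CASE WHEN e.user_id IS NULL THEN 0.0 ELSE 1.0 END)   AS referential_coverage
FROM (
  SELECT d.*,
         LAG(cumulative_watch_minutes) OVER (
           PARTITION BY user_id, content_id ORDER BY ts
         ) AS prev_cum
  FROM features.daily_partition d
) f
LEFT JOIN entities.users e ON f.user_id = e.user_id
"""

r = spark.sql(CHECK_SQL).first()
assert r.null_user_id == 0,                    "non-null contract breached"
assert r.duplicate_rows / r.row_count < 0.001, "deduplication rate above 0.1 percent"
assert r.monotonicity_violations == 0,         "cumulative feature decreased"
assert r.referential_coverage >= 0.999,        "referential integrity below 99.9 percent"
```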
The trade-off is the upfront investment in defining and maintaining contracts. Teams must document semantics, negotiate thresholds between producers and consumers, and version contracts as requirements evolve. However, this cost pays dividends during incidents. When a contract violation occurs, ownership is unambiguous and impact is pre-calculated through lineage. In Meta-scale systems, per-domain data stewards own contract definitions and respond to violations, cutting the mean time to assign responsibility from 30 to 45 minutes of detective work to under 60 seconds of automatic routing.
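A minimal sketch of that lineage-based routing, assuming the lineage graph and owner registry are available as simple lookups; every dataset name and address here is made up.

```python
# Routing a contract violation: walk a (hypothetical) lineage graph to collect
# the downstream blast radius, then attach the owning steward for the dataset.
LINEAGE = {  # dataset -> direct downstream consumers
    "features.listing_daily": ["models.pricing", "models.availability"],
    "models.pricing": [],
    "models.availability": [],
}
OWNERS = {  # per-domain steward on call for each dataset
    "features.listing_daily": "listings-data-steward@example.com",
}

def affected_consumers(dataset: str) -> list:
    """Transitively collect everything downstream of the violated dataset."""
    seen, stack = [], list(LINEAGE.get(dataset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(LINEAGE.get(node, []))
    return seen

def route_violation(dataset: str, rule: str) -> dict:
    """Build the alert payload: owner plus the pre-calculated impact set."""
    return {
        "owner": OWNERS.get(dataset, "unowned"),
        "violated_rule": rule,
        "impacted": affected_consumers(dataset),
    }

print(route_violation("features.listing_daily", "referential_integrity >= 0.999"))
```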
Contracts also enable safe evolution. A producer wanting to add an optional field or change a data type can preview which downstream contracts would break, run canary deployments on a subset of consumers, and coordinate migrations. Without contracts, schema changes propagate silently until a downstream model sees unexpected nulls and recall drops by 15 to 30 percent.
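A sketch of how that breakage preview could work, assuming each consumer contract exposes the columns and types it requires; the contract entries and the proposed schema are hypothetical.

```python
# Preview which consumer contracts a proposed producer schema would break,
# before the change ships.
CONTRACTS = {
    "models.pricing":      {"user_id": "BIGINT", "price_per_night": "DOUBLE"},
    "models.feed_ranking": {"user_id": "BIGINT", "engagement_score": "FLOAT"},
}

def preview_breakage(proposed_schema: dict) -> dict:
    """Per consumer, list required columns that would go missing or change type."""
    report = {}
    for consumer, required in CONTRACTS.items():
        broken = {
            col: (typ, proposed_schema.get(col))
            for col, typ in required.items()
            if proposed_schema.get(col) != typ
        }
        if broken:
            report[consumer] = broken
    return report

# Proposed change: add an optional column and retype engagement_score FLOAT -> DOUBLE.
proposed = {
    "user_id": "BIGINT",
    "price_per_night": "DOUBLE",
    "engagement_score": "DOUBLE",
    "new_optional_field": "STRING",
}
print(preview_breakage(proposed))
# -> {'models.feed_ranking': {'engagement_score': ('FLOAT', 'DOUBLE')}}
```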
💡 Key Takeaways
• Producers publish guarantees on schema stability, non-null constraints, referential integrity above 99.9 percent, and arrival windows like hourly partitions complete by minute 10
• Consumers codify requirements including PSI thresholds below 0.1 on critical features, join key coverage expectations, late-arrival tolerance windows, and conditional semantic rules
• Batch feature stores run column-level contract validation in 10 to 20 minutes per 1 billion rows using warehouse pushdown aggregations on checks like deduplication rate below 0.1 percent
• Contract violations trigger automatic owner routing via lineage mapping, reducing mean time to assign from 30 to 45 minutes of investigation to under 60 seconds at Meta scale
• Contracts enable safe schema evolution by previewing downstream breakage, running canary deployments on subsets, and coordinating migrations instead of silent propagation
• Upfront investment in defining and versioning contracts trades initial cost for deterministic monitoring that avoids false positives from heuristic anomaly detection during legitimate traffic changes
📌 Examples
Airbnb feature store contract: producer guarantees user_id non-null and unique, location_id referential integrity above 99.9 percent, and daily partition arrival by 06:00 UTC; consumer pricing model requires PSI below 0.1 on the price_per_night and availability_30d features (a PSI computation sketch follows these examples)
Meta streaming feature pipeline contract: producer commits to event lag p95 under 120 seconds and schema version stability for 7 days; consumer feed ranking model requires missingness below 1 percent on engagement_score and allows a 5-minute late-arrival tolerance
Netflix batch feature computation: validates monotonic increase on cumulative_watch_minutes and a deduplication rate below 0.1 percent on the (user_id, content_id, timestamp) composite key, running full validation in 15 minutes on 500 million rows using Spark pushdown
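Several of these contracts gate on PSI below 0.1, so here is a small sketch of the PSI computation itself, binning the current sample against the reference distribution's quantiles; the simulated feature values stand in for something like price_per_night.

```python
# Population Stability Index: sum over bins of
# (actual_frac - expected_frac) * ln(actual_frac / expected_frac).
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI of a current sample against a reference sample, using quantile bins."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0] = min(edges[0], actual.min()) - 1e-9   # widen outer bins so nothing falls out
    edges[-1] = max(edges[-1], actual.max()) + 1e-9
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)            # avoid log(0) and divide-by-zero
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(100, 20, 50_000)   # training-time distribution of the feature
current = rng.normal(105, 22, 50_000)     # today's serving distribution, mildly shifted
score = psi(reference, current)
print(f"PSI = {score:.3f}")               # a mild shift like this lands well below 0.1
assert score < 0.1, "PSI contract breach"
```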