Trade-offs: When to Enforce Strict vs Eventual Quality
The Core Trade-off:
The fundamental question is: how strict and how early do you enforce each dimension, and what are you willing to pay in latency, availability, and complexity? There is no universal answer. The decision depends on use case criticality, read versus write ratio, and tolerance for temporary inconsistency.
Accuracy: Front Door vs Downstream Filtering
Strict accuracy checks at ingestion prevent bad data from polluting downstream systems. The cost is higher latency and potential availability impact: if your validator rejects events or blocks waiting on external lookups such as currency validation, you risk dropped events whenever the validators are overloaded. Some companies therefore choose lighter validation at the edge: check basic types and ranges, and defer semantic validation to downstream processing, where bad data can be quarantined without blocking ingestion.
Decision criteria: Use strict ingestion validation when bad data has immediate user facing impact (payment processing, inventory updates) or when downstream correction is very expensive (retraining machine learning models). Use eventual validation when you can tolerate temporary bad data and when ingestion throughput is critical (logging, analytics events at over 100,000 events per second).
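A minimal sketch of that split, assuming a simple Event record, an in-memory quarantine list, and a hard-coded currency set (all illustrative, not a specific library's API): cheap structural checks run synchronously at ingestion, while the semantic currency check runs downstream and quarantines instead of blocking.

```python
from dataclasses import dataclass

VALID_CURRENCIES = {"USD", "EUR", "GBP"}  # assumed reference data for the semantic check

@dataclass
class Event:
    user_id: str
    amount: float
    currency: str

def validate_at_ingestion(event: Event) -> bool:
    """Cheap structural checks only (presence, ranges); runs synchronously on the write path."""
    return bool(event.user_id) and event.amount >= 0

def validate_downstream(event: Event, quarantine: list) -> bool:
    """Semantic checks deferred to batch or stream processing.
    Invalid events are quarantined for audit instead of blocking ingestion."""
    if event.currency not in VALID_CURRENCIES:
        quarantine.append(event)
        return False
    return True

# Example: an unknown currency passes the fast front-door check and is caught downstream.
quarantine: list = []
event = Event(user_id="u1", amount=12.50, currency="XXX")
assert validate_at_ingestion(event)
assert not validate_downstream(event, quarantine)
```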
Completeness: Speed vs Integrity
You can run a report on partial data quickly, or wait until you are confident all data has arrived. The trade-off is explicit: a dashboard that refreshes every 2 minutes shows 95 to 98 percent of the data, while waiting 30 minutes gets you to 99.95 percent. For operational dashboards where directional trends matter more than exact counts, fast and incomplete wins. For financial reporting or compliance, where every transaction must be accounted for, slow and complete is mandatory.
Decision criteria: Real time operational metrics tolerate 2 to 5 percent incompleteness for speed. Financial aggregates, Service Level Agreement (SLA) reporting, and compliance audits require 99.9+ percent completeness even if that means a 30 to 60 minute lag.
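One way to operationalize these thresholds is a completeness gate: before publishing a report, compare arrived row counts against an expected count (for example, from source-side counters) and hold the report until the target for that use case is met. A small sketch under those assumptions, with illustrative names and the percentages from above:

```python
# Thresholds encode the trade-off: publish operational dashboards early,
# hold financial or compliance reports until nearly everything has arrived.
THRESHOLDS = {"operational": 0.95, "financial": 0.999}

def completeness_ratio(expected_rows: int, arrived_rows: int) -> float:
    return arrived_rows / expected_rows if expected_rows else 1.0

def ready_to_publish(expected_rows: int, arrived_rows: int, use_case: str) -> bool:
    return completeness_ratio(expected_rows, arrived_rows) >= THRESHOLDS[use_case]

# 97 percent arrived: good enough for a dashboard, not for a financial aggregate.
assert ready_to_publish(100_000, 97_000, "operational")
assert not ready_to_publish(100_000, 97_000, "financial")
```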
Consistency: Synchronous Coordination vs Eventual Reconciliation
Strong cross system consistency requires coordination, possibly distributed transactions. This limits throughput and increases latency: at write rates above 50,000 per second, synchronous consistency becomes a bottleneck. Most high scale systems accept eventual consistency and rely on periodic reconciliation jobs to catch drift. The trade-off is that for some window (often minutes to hours), downstream analytics may observe contradictory states.
Decision criteria: Use synchronous consistency when correctness is non negotiable and write throughput is moderate (banking transactions, inventory reservations). Use eventual consistency with reconciliation when you can tolerate temporary drift and need high write throughput (user activity logging, metrics collection, social media feeds).
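A periodic reconciliation job can be as simple as the sketch below, assuming both systems can export per-key aggregates (represented here as plain dictionaries): it flags keys whose downstream value has drifted beyond a tolerance so they can be re-derived from the source of truth, instead of coordinating synchronously at write time.

```python
def reconcile(source_totals: dict, downstream_totals: dict, tolerance: float = 0.0) -> list:
    """Return keys whose downstream aggregate has drifted from the source of truth."""
    drifted = []
    for key, expected in source_totals.items():
        actual = downstream_totals.get(key, 0.0)
        if abs(actual - expected) > tolerance:
            drifted.append(key)
    return drifted

# Example run (a scheduler would invoke this hourly or daily): account "b" has drifted
# and would be repaired by re-deriving it from the source system.
source = {"a": 100.0, "b": 250.0}
downstream = {"a": 100.0, "b": 240.0}
assert reconcile(source, downstream) == ["b"]
```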
Strict Validation at Ingestion
Zero bad data enters, but p99 latency increases from 50ms to 150ms. Risk of blocking the pipeline if validators are overloaded.
vs
Light Validation + Downstream Audits
Fast ingestion (50ms p99), but some bad data exists temporarily; it is quarantined later by audits.
"The decision is not whether to enforce quality, but where in the pipeline and at what cost. Every choice is a trade off between catching errors early versus maintaining throughput and availability."
💡 Key Takeaways
✓ Strict ingestion validation increases p99 latency by 3x (50ms to 150ms) but prevents bad data from entering the system entirely
✓ Fast dashboards trade completeness for speed: a 2 minute refresh shows 95 to 98 percent of data versus a 30 minute wait for 99.95 percent
✓ Synchronous consistency limits write throughput to 10,000 to 20,000 per second versus 100,000+ per second with eventual consistency
✓ Use strict validation when bad data has immediate user impact or expensive downstream correction costs like model retraining
✓ Eventual consistency with reconciliation works when temporary contradictions are tolerable and write throughput requirements exceed 50,000 per second
📌 Examples
1. Payment processing uses strict ingestion validation despite 150ms p99 latency because incorrect amounts have immediate financial and legal consequences.
2. Social media analytics accepts 2 to 5 percent data incompleteness in real time dashboards, running reconciliation jobs hourly to catch up to 99.9 percent for billing.
3. High volume logging systems at over 100,000 events per second use light ingestion checks and quarantine invalid data in downstream batch processing to avoid blocking ingestion.