What is Data Reconciliation?

Definition
Data reconciliation is the process of systematically checking that multiple copies or representations of the same business data remain consistent, accurate, and complete across different systems.
The Core Problem: In distributed systems, the same business entity lives in multiple places. A payment record exists in your payment processor, your ledger service, your data warehouse, and your machine learning feature store. Once data leaves the primary system of record, you can no longer rely on ACID (Atomicity, Consistency, Isolation, Durability) transactions to keep these views synchronized.

This creates a trust problem. Without reconciliation, you have no systematic way to verify that all these copies are actually telling the same story.

The Three Pillars: Every reconciliation system addresses three core challenges.

First is matching, which is deciding that a record in system A corresponds to a record in system B. You might use a primary key like order_id, a composite business key like (user_id, transaction_date), or more advanced fuzzy matching.
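As a minimal sketch of key-based matching, the snippet below pairs records from two systems by a caller-supplied key function. The `match_records` helper, the `ledger` and `warehouse` sample rows, and the field names other than `order_id`, `user_id`, and `transaction_date` are assumptions made for illustration. Fuzzy matching is not covered here.

```python
from typing import Callable, Hashable, Iterable

Record = dict
KeyFn = Callable[[Record], Hashable]


def match_records(source: Iterable[Record], target: Iterable[Record], key: KeyFn):
    """Pair records from two systems by a business key.

    Returns (matched, only_in_source, only_in_target). Assumes the key is
    unique within each system; duplicate keys would overwrite each other.
    """
    source_by_key = {key(r): r for r in source}
    target_by_key = {key(r): r for r in target}

    matched = {
        k: (source_by_key[k], target_by_key[k])
        for k in source_by_key.keys() & target_by_key.keys()
    }
    only_in_source = sorted(source_by_key.keys() - target_by_key.keys())
    only_in_target = sorted(target_by_key.keys() - source_by_key.keys())
    return matched, only_in_source, only_in_target


# Hypothetical sample data, for illustration only.
ledger = [
    {"order_id": "A1", "user_id": 7, "transaction_date": "2024-01-05", "amount": 1000},
    {"order_id": "A2", "user_id": 8, "transaction_date": "2024-01-05", "amount": 2500},
]
warehouse = [
    {"order_id": "A1", "user_id": 7, "transaction_date": "2024-01-05", "amount": 1000},
    {"order_id": "A3", "user_id": 9, "transaction_date": "2024-01-06", "amount": 400},
]

# Match on a primary key.
matched, missing_in_warehouse, missing_in_ledger = match_records(
    ledger, warehouse, key=lambda r: r["order_id"]
)

# Match on a composite business key.
matched_composite, _, _ = match_records(
    ledger, warehouse, key=lambda r: (r["user_id"], r["transaction_date"])
)
```

Records that appear on only one side are a reconciliation finding in their own right, which is why the helper returns them rather than dropping them.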

Second is comparison, which is deciding what to check once records are matched. Lightweight approaches compare just counts and aggregate checksums. Thorough approaches compare every field value, business rules, and derived metrics.
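The two ends of that spectrum might look like the sketch below: a cheap dataset-level fingerprint (row count plus an order-independent checksum) and a field-by-field diff for a matched pair. The function names, the choice of SHA-256 combined with XOR, and the sample rows are assumptions for illustration, not a prescribed method.

```python
import hashlib


def dataset_fingerprint(records, fields):
    """Lightweight check: row count plus an order-independent checksum."""
    digest = 0
    count = 0
    for r in records:
        row = "|".join(str(r[f]) for f in fields).encode("utf-8")
        # XOR of per-row hashes does not depend on row order. Caveat: two
        # identical duplicate rows cancel out, so always compare the count too.
        digest ^= int.from_bytes(hashlib.sha256(row).digest()[:8], "big")
        count += 1
    return count, digest


def field_differences(source_row, target_row, fields):
    """Thorough check: report every field whose value differs."""
    return {
        f: (source_row[f], target_row[f])
        for f in fields
        if source_row[f] != target_row[f]
    }


FIELDS = ("order_id", "amount", "currency", "status")

# Hypothetical sample data: same rows in a different order, one status drifted.
ledger = [
    {"order_id": "A1", "amount": 1000, "currency": "USD", "status": "settled"},
    {"order_id": "A2", "amount": 2500, "currency": "USD", "status": "settled"},
]
warehouse = [
    {"order_id": "A2", "amount": 2500, "currency": "USD", "status": "settled"},
    {"order_id": "A1", "amount": 1000, "currency": "USD", "status": "pending"},
]

ledger_fp = dataset_fingerprint(ledger, FIELDS)
warehouse_fp = dataset_fingerprint(warehouse, FIELDS)

# Counts agree but checksums differ, so drill down to the field level.
warehouse_by_id = {r["order_id"]: r for r in warehouse}
diffs = {
    r["order_id"]: field_differences(r, warehouse_by_id[r["order_id"]], FIELDS)
    for r in ledger
}
```

A common pattern is to run the fingerprint on every batch and pay for the field-level diff only when the fingerprints disagree.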

Third is resolution, which is what you do when differences are found. This ranges from automatic correction (like rerunning an ETL job) to raising critical incidents when money or compliance is involved.
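One way to picture resolution is as a routing policy over discovered differences. The sketch below is hypothetical: the `Action` levels, the `Difference` type, and the split between critical and repairable fields are assumptions chosen to mirror the range described above, and a real policy would depend on your domain.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTO_REPAIR = "auto_repair"  # e.g. rerun the ETL job for the affected batch
    TICKET = "ticket"            # queue for manual investigation
    INCIDENT = "incident"        # page someone: money or compliance is involved


@dataclass(frozen=True)
class Difference:
    key: str
    field: str
    source_value: object
    target_value: object


# Hypothetical policy: which fields are money-related and which are safe to
# repair automatically. These sets are assumptions for illustration.
CRITICAL_FIELDS = {"amount", "currency"}
REPAIRABLE_FIELDS = {"status"}


def resolve(diff: Difference) -> Action:
    """Pick a resolution path based on the field that differs."""
    if diff.field in CRITICAL_FIELDS:
        return Action.INCIDENT
    if diff.field in REPAIRABLE_FIELDS:
        return Action.AUTO_REPAIR
    return Action.TICKET


differences = [
    Difference("A1", "status", "settled", "pending"),
    Difference("A2", "amount", 2500, 2400),
    Difference("A3", "memo", "gift", "Gift"),
]
actions = {d.key: resolve(d) for d in differences}
```

Keeping the policy in data (the two sets) rather than in branching logic makes it easier to review which fields can trigger an incident.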

✓ In Practice: Think of reconciliation as a continuous monitoring layer around your data movement processes, not a one-time data migration checklist. Companies like Uber reconcile millions of trip records hourly to ensure their transactional database matches their analytics warehouse.

💡 Key Takeaways

✓ Data reconciliation solves the trust problem that emerges when business data is copied across multiple systems and you can no longer rely on transactions to maintain consistency.

✓ The three pillars are matching (how you identify corresponding records), comparison (what fields or metrics you check), and resolution (how you handle discovered differences).

✓ Reconciliation operates at different levels: lightweight checks like row counts detect gross issues cheaply, while cell-level comparison of every field is expensive but necessary for high-risk domains like billing.

✓ Modern reconciliation is a continuous monitoring layer, not a one-time activity, running hourly or daily to detect drift as data flows through pipelines.

📌 Interview Tips

1. At Uber, trip data flows from mobile apps to transactional stores, through Kafka into stream processors, and finally into data lakes. Reconciliation ensures the sum of completed trips in the OLTP system matches the warehouse total within a few basis points every hour.

2. A payment processor might reconcile 500 million records by joining on order_id and comparing amount, currency, and status fields, flagging any mismatches for investigation.
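To make "within a few basis points" from the first tip concrete, here is a minimal sketch of an aggregate tolerance check. One basis point is 0.01%, so the drift is the absolute difference divided by the source total, times 10,000. The function names, the 5 bps default threshold, and the trip counts are assumptions for illustration.

```python
def drift_in_basis_points(source_total: float, target_total: float) -> float:
    """Relative difference between two totals, in basis points (1 bp = 0.01%)."""
    if source_total == 0:
        return 0.0 if target_total == 0 else float("inf")
    return abs(source_total - target_total) * 10_000 / abs(source_total)


def within_tolerance(source_total: float, target_total: float, max_bps: float = 5.0) -> bool:
    """True when the target total is within max_bps of the source total."""
    return drift_in_basis_points(source_total, target_total) <= max_bps


# Hypothetical hourly totals: completed trips in OLTP versus the warehouse.
oltp_trips = 1_000_000
warehouse_trips = 999_700

drift = drift_in_basis_points(oltp_trips, warehouse_trips)  # 300 missing of 1M
ok = within_tolerance(oltp_trips, warehouse_trips)
```

A tolerance rather than strict equality accounts for records still in flight between systems at the moment the check runs. For money fields, a stricter per-record comparison like the one in the second tip is usually the safer choice.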
