Copy on Write vs Merge on Read Storage
The Storage Layout Choice: Hudi offers two fundamentally different ways to handle updates, each trading read performance against write efficiency. Understanding this trade-off is critical because it affects latency, cost, and operational complexity.
Copy on Write (COW): When an update arrives, Hudi reads the entire Parquet file containing that record, merges in the new version, and writes a completely new file. The old file is marked for deletion.
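For orientation, here is a minimal PySpark sketch of a COW upsert. The table name, path, and field names are illustrative assumptions, and it presumes a Spark session launched with the Hudi bundle on the classpath:

```python
from pyspark.sql import SparkSession

# Assumes Spark was started with the Hudi bundle jar and its
# recommended serializer settings already configured.
spark = SparkSession.builder.appName("hudi-cow-sketch").getOrCreate()

updates = spark.createDataFrame(
    [("user-42", "2024-01-15T10:00:00Z", "alice@example.com")],
    ["user_id", "updated_at", "email"],
)

cow_options = {
    "hoodie.table.name": "users_cow",  # illustrative name
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "user_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
}

# Each upsert rewrites every base Parquet file that holds an affected key.
(updates.write.format("hudi")
    .options(**cow_options)
    .mode("append")
    .save("/tmp/hudi/users_cow"))
```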
The advantage is simple, fast reads. Query engines just scan Parquet files with standard columnar optimizations; there are no log files to merge and no special readers needed. Read latency is predictable.
The cost is write amplification. Updating one record in a 128 MB Parquet file means rewriting all 128 MB. For a table receiving 100k updates per second spread across 1000 files, you might rewrite 12.8 GB every minute, consuming significant compute and temporary storage.
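To make the arithmetic explicit, here is a back-of-envelope sketch. The files-rewritten-per-minute count is my assumption, chosen because it reproduces the 12.8 GB figure above:

```python
# Back-of-envelope COW write amplification, using the figures from the text.
UPDATE_SIZE = 1e3        # 1 KB logical update
BASE_FILE_SIZE = 128e6   # 128 MB Parquet base file
FILES_REWRITTEN_PER_MIN = 100  # assumed: ~100 of the 1000 files per minute

amplification = BASE_FILE_SIZE / UPDATE_SIZE
rewritten_per_min_gb = FILES_REWRITTEN_PER_MIN * BASE_FILE_SIZE / 1e9

print(f"per-update amplification: {amplification:,.0f}x")      # 128,000x
print(f"rewritten per minute: {rewritten_per_min_gb:.1f} GB")  # 12.8 GB
```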
[Figure: Write Amplification Impact: a 1 KB update triggers a 128 MB file rewrite]

Merge on Read (MOR): Updates get appended to small log files stored alongside base Parquet files. A base file might be partition_x_file_1.parquet with logs like partition_x_file_1.log.1 and partition_x_file_1.log.2.

Writes become much cheaper because you only append deltas rather than rewriting entire files. That same 100k updates per second might generate only 100 MB of log files per minute instead of 12.8 GB of rewrites.

The trade-off hits at read time. Queries must merge the base Parquet with all of its delta logs to produce the current view. If compaction falls behind and 50 log files accumulate per base file, query latency can balloon from seconds to minutes as readers stitch everything together.

Background Compaction: MOR requires a separate process that periodically reads base files plus their logs, merges them, and writes new compacted base files. Scheduling is a balancing act: too infrequent and reads suffer; too aggressive and compaction starves ingestion of resources.
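A sketch of what the MOR write path might look like with the same DataSource API. The table name and the delta-commit threshold are illustrative, and it reuses `spark` and `updates` from the COW sketch above:

```python
# Reuses `spark` and `updates` from the COW sketch above.
mor_options = {
    "hoodie.table.name": "users_mor",  # illustrative name
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "user_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    # Compact after every 5 delta commits so log files cannot pile up
    # indefinitely; raise the threshold to favor ingestion throughput,
    # lower it to favor read latency.
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "5",
}

(updates.write.format("hudi")
    .options(**mor_options)
    .mode("append")
    .save("/tmp/hudi/users_mor"))
```

Note that inline compaction runs inside the writer and blocks it; high-throughput pipelines typically schedule compaction asynchronously instead, so ingestion is never starved.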
Copy on Write: fast reads, write heavy. Best for ~90% reads, 10% writes.
vs
Merge on Read: fast writes, read overhead. Needs aggressive compaction tuning.
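Hudi surfaces this trade-off directly at query time through the query type. A sketch against the hypothetical MOR table from above:

```python
# snapshot: merges base Parquet with delta logs -> freshest data, slower scans.
snapshot_df = (spark.read.format("hudi")
    .option("hoodie.datasource.query.type", "snapshot")
    .load("/tmp/hudi/users_mor"))

# read_optimized: scans only compacted base files -> COW-like read speed,
# at the cost of not seeing rows still waiting in the logs.
read_optimized_df = (spark.read.format("hudi")
    .option("hoodie.datasource.query.type", "read_optimized")
    .load("/tmp/hudi/users_mor"))
```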
💡 Key Takeaways
✓ Copy on Write rewrites entire Parquet files on updates, giving simple, fast reads but high write amplification: one 1 KB update rewrites a 128 MB file
✓ Merge on Read appends updates to log files, reducing write cost dramatically but requiring readers to merge base files plus logs at query time
✓ MOR needs aggressive compaction scheduling: if 50 log files accumulate per base file, query latency can climb from seconds to minutes
✓ Choose COW for read-heavy workloads (around 90% reads) with moderate update rates; choose MOR for write-intensive streaming ingestion at 100k+ updates per second
✓ Recent Hudi improvements achieved 17x better index lookup latency, critical for high-throughput upsert workloads
📌 Examples
1. Systems with 90% analytical reads and 10% updates: COW provides predictable query performance without complex compaction tuning
2. Streaming CDC ingestion at 100k updates/sec: MOR appends ~100 MB/min of logs, versus COW rewriting 12.8 GB/min of Parquet files
3. If compaction lags on a MOR table, a query that normally scans base Parquet in 5 seconds might take 3 minutes merging 50 delta logs