The Transaction Log: How Delta Lake Tracks Changes
The Mechanism: Delta Lake's transaction log lives in a dedicated _delta_log subdirectory inside the table folder. Each commit appends a new JSON file with a sequential version number: 00000000000000000010.json for version 10, 00000000000000000011.json for version 11, and so on.

Each JSON file contains a sequence of action records that describe what changed. The most important actions are Add (declares a new Parquet file as part of the table) and Remove (tombstones a file, marking it as deleted). Other actions include Metadata (schema and partitioning), Protocol (feature versions), and SetTransaction (for idempotent streaming writes).

Building a Snapshot: To read the table at version 105, you reconstruct the snapshot by replaying the log. Start from the beginning, applying each Add and Remove action in order, building up a set of active files. By the time you reach version 105, you know exactly which Parquet files are part of that snapshot.

This is where checkpoints become critical. Replaying thousands of JSON files would be slow. Delta Lake periodically writes a checkpoint file (a Parquet file containing all active Add actions at that version). If checkpoints happen every 10 commits, a reader can load the checkpoint at version 100, then apply only JSON logs 101 through 105. This keeps metadata read times under 500 milliseconds even for tables with millions of files.

Snapshot Reconstruction Performance: without a checkpoint, 10+ seconds → with a checkpoint, under 500 ms.

A Real Streaming Example: Consider a streaming job ingesting clickstream events at 200,000 events per second. Every 5 seconds, it writes a microbatch as a new Delta commit. Each commit adds roughly 50 to 100 Parquet files totaling 2 to 3 GB.

The job maintains a SetTransaction action in the log recording which Kafka offsets it has processed. If the job crashes and restarts, it reads the log to find the last committed offset, then resumes from there. This provides exactly-once semantics: no duplicate events, no missed events, even across restarts.
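The replay-plus-checkpoint logic above can be sketched in a few lines of Python. This is a toy model, not Delta Lake's actual implementation: the log is a plain dict keyed by commit version, and each action is a simplified dict with an "add" or "remove" entry.

```python
def replay_log(log_entries, target_version, checkpoint=None):
    """Return the set of active Parquet file paths at target_version.

    log_entries: {version: [action, ...]} where an action looks like
      {"add": {"path": "part-001.parquet"}} or {"remove": {"path": "..."}}.
    checkpoint: optional (version, set_of_paths) holding all active files
      at that version, so replay can start there instead of at version 0.
    """
    active = set()
    start = 0
    if checkpoint is not None:
        start = checkpoint[0] + 1      # resume just after the checkpoint
        active = set(checkpoint[1])    # checkpoint = full active file list
    for v in range(start, target_version + 1):
        for action in log_entries.get(v, []):
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:   # tombstone: file leaves the snapshot
                active.discard(action["remove"]["path"])
    return active

log = {
    0: [{"add": {"path": "a.parquet"}}],
    1: [{"add": {"path": "b.parquet"}}],
    2: [{"remove": {"path": "a.parquet"}}, {"add": {"path": "c.parquet"}}],
}
print(sorted(replay_log(log, 2)))  # ['b.parquet', 'c.parquet']
```

Starting from a checkpoint at version 1 (`replay_log(log, 2, checkpoint=(1, {"a.parquet", "b.parquet"}))`) yields the same snapshot while replaying only one JSON commit, which is exactly why checkpoints keep metadata reads fast.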
⚠️ Common Pitfall: Long-running jobs can hold onto stale snapshots. If a job starts reading at version 100 and runs for 2 hours while the table advances to version 2000, its commit may fail because files it read have since been deleted. Structure pipelines as smaller incremental batches to avoid this.
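The commit failure in this pitfall is Delta's optimistic concurrency check at work. A hypothetical sketch of the idea, using the same simplified dict-based log as above and an invented `try_commit` helper (not a real Delta API):

```python
def try_commit(log, read_version, files_read, new_actions):
    """Attempt an optimistic commit on top of read_version.

    Before claiming the next version number, validate that no commit made
    after read_version removed a file this job read; otherwise the snapshot
    it computed from is stale and the commit must fail.
    """
    latest = max(log)
    for v in range(read_version + 1, latest + 1):
        for action in log.get(v, []):
            path = action.get("remove", {}).get("path")
            if path in files_read:
                # a concurrent writer deleted a file this job depended on
                raise RuntimeError(f"conflict: {path} removed at version {v}")
    log[latest + 1] = new_actions  # claim the next sequential version
    return latest + 1
```

A job that read files untouched by later commits succeeds; a job whose input was removed in the meantime raises, mirroring the stale-snapshot failure described above.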
💡 Key Takeaways
✓ The transaction log is a sequence of JSON files numbered by version (00000000000000000010.json for v10), each containing Add, Remove, Metadata, and other action records
✓ Checkpoints are Parquet files written every N commits (typically 10) that contain the full active file list at that version, enabling fast snapshot reconstruction
✓ To read version 105, load the checkpoint at version 100, then apply JSON logs 101 through 105, keeping metadata reads under 500 ms even for petabyte-scale tables
✓ Streaming jobs use SetTransaction actions to record processed offsets, enabling exactly-once semantics across restarts
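The zero-padded filenames in the first takeaway follow a fixed 20-digit scheme, which makes lexicographic listing of _delta_log agree with numeric version order:

```python
def commit_filename(version: int) -> str:
    # Delta log commit files are the version number zero-padded to 20 digits.
    return f"{version:020d}.json"

print(commit_filename(10))   # 00000000000000000010.json
print(commit_filename(105))  # 00000000000000000105.json
```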
📌 Examples
1. A table at version 1000 with checkpoints every 10 commits: readers load the checkpoint at v1000 (which contains all active files); no additional JSON replay is needed. Total metadata read: 200 ms.
2. Streaming ingestion at 200k events/sec writes microbatches every 5 seconds. Each commit adds 50-100 Parquet files. A SetTransaction action records Kafka offset 9876543, so on restart the job resumes from that offset.
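The restart logic in example 2 amounts to scanning the log for the newest SetTransaction entry matching the job's application id. A sketch under the same simplified log layout (the `appId`/`version` field names follow Delta's txn action, but everything else here is a toy):

```python
def last_committed_offset(log_entries, app_id):
    """Find the most recent offset recorded by a given streaming job.

    Each commit may carry a txn action such as
      {"txn": {"appId": "clickstream", "version": 2000}}
    where "version" is the job's own watermark (e.g. a Kafka offset).
    """
    latest = None
    for version in sorted(log_entries):        # replay commits in order
        for action in log_entries[version]:
            txn = action.get("txn")
            if txn and txn["appId"] == app_id:
                latest = txn["version"]        # newer commits overwrite older
    return latest

log = {
    0: [{"txn": {"appId": "clickstream", "version": 1000}}],
    1: [{"txn": {"appId": "clickstream", "version": 2000}}],
}
print(last_committed_offset(log, "clickstream"))  # 2000
```

On restart, the job reads this value and resumes consuming from the next offset; if a microbatch was written but the offset matches, it skips the write, which is what makes the pipeline idempotent.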