Big Data Systems • Data Lakes & Lakehouses
What Are Lakehouses and How Do They Provide ACID Guarantees?
Lakehouses add a table abstraction layer with transactional guarantees on top of data lake object storage. They introduce open table formats (such as Apache Iceberg, Delta Lake, and Apache Hudi) that manage metadata, snapshots, and Atomicity, Consistency, Isolation, Durability (ACID) semantics over files. Unlike pure lakes, where files are just objects in directories, lakehouses use append-only transaction logs and snapshot pointers to commit table changes atomically, enabling multiple readers and writers to operate concurrently with snapshot isolation.
Under the hood, each table version is a snapshot pointing to a manifest that lists all data files comprising that version. Writers produce new data files, compute a new manifest, and atomically publish a pointer to the new snapshot. Readers pick a snapshot at query start and see a consistent view even as writers commit new versions. This enables upserts and deletes through Change Data Capture (CDC) patterns, time travel to any historical snapshot, and scalable metadata operations without listing millions of files.
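To make the commit protocol concrete, here is a minimal, self-contained Python sketch of the idea, using a hypothetical ToyTable class on a local filesystem rather than the actual layout of Iceberg, Delta Lake, or Hudi: data files and a manifest are written first, and only the final atomic swap of the snapshot pointer publishes the new version, so a reader that already resolved the old pointer keeps a consistent view, and any earlier snapshot remains readable (which is all time travel needs).

```python
import json, os, tempfile, uuid

class ToyTable:
    """Toy snapshot-based table on a local filesystem; illustrative only,
    not the real layout of Iceberg, Delta Lake, or Hudi."""

    def __init__(self, root):
        self.root = root
        os.makedirs(os.path.join(root, "data"), exist_ok=True)
        os.makedirs(os.path.join(root, "manifests"), exist_ok=True)
        self.pointer = os.path.join(root, "current_snapshot.json")

    def current_snapshot(self):
        """Readers resolve the pointer once at query start."""
        if not os.path.exists(self.pointer):
            return None
        with open(self.pointer) as f:
            return json.load(f)

    def commit_append(self, rows):
        """Append rows as a new immutable data file and publish a new snapshot."""
        # 1. Write an immutable data file; existing files are never modified.
        data_file = os.path.join(self.root, "data", f"{uuid.uuid4()}.json")
        with open(data_file, "w") as f:
            json.dump(rows, f)

        # 2. Write a manifest listing every data file in the new table version.
        parent = self.current_snapshot()
        prev_files = self.read_manifest(parent)["files"] if parent else []
        manifest_file = os.path.join(self.root, "manifests", f"{uuid.uuid4()}.json")
        with open(manifest_file, "w") as f:
            json.dump({"files": prev_files + [data_file]}, f)

        # 3. Atomically swap the snapshot pointer (os.replace is atomic on
        #    POSIX): readers see either the old or the new version, never a mix.
        snapshot = {"id": str(uuid.uuid4()), "manifest": manifest_file,
                    "parent": parent["id"] if parent else None}
        fd, tmp = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "w") as f:
            json.dump(snapshot, f)
        os.replace(tmp, self.pointer)
        return snapshot

    @staticmethod
    def read_manifest(snapshot):
        with open(snapshot["manifest"]) as f:
            return json.load(f)

    def scan(self, snapshot):
        """Read a consistent view of the table as of one snapshot
        (time travel works the same way: pass any historical snapshot)."""
        rows = []
        for path in self.read_manifest(snapshot)["files"]:
            with open(path) as f:
                rows.extend(json.load(f))
        return rows

table = ToyTable("/tmp/toy_table")
s1 = table.commit_append([{"id": 1}])
s2 = table.commit_append([{"id": 2}])
print(table.scan(s1))  # on a fresh directory: [{'id': 1}]  (old snapshot still readable)
print(table.scan(s2))  # on a fresh directory: [{'id': 1}, {'id': 2}]
```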
Netflix adopted Apache Iceberg to manage petabyte-scale analytic tables with millions of data files per table. Before lakehouses, query planning required listing millions of partition directories and took minutes. After migration, compact metadata and statistics reduced planning to seconds by reading a small manifest hierarchy instead of directory listings. Thousands of tables are now read concurrently by batch-processing and interactive SQL engines, with snapshot isolation preventing conflicts.
The tradeoff is operational complexity. You gain ACID, schema enforcement, upserts, and time travel, but must manage transaction logs, schedule compaction to merge small files, and enforce retention policies to keep accumulated snapshots from growing storage without bound. Performance approaches that of warehouses for large scans, but point lookups and highly selective queries may still favor dedicated warehouse indexes.
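The maintenance work can be pictured with a small sketch, assuming a 512 MB compaction target and a 7-day retention window (both inside the ranges quoted in the takeaways below; the function names are hypothetical): one routine bin-packs small files into rewrite groups, the other selects snapshots old enough to expire.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TARGET_FILE_BYTES = 512 * 1024 * 1024   # assumed compaction target (~512 MB)
RETENTION = timedelta(days=7)           # assumed snapshot retention window

@dataclass
class DataFile:
    path: str
    size_bytes: int

@dataclass
class Snapshot:
    id: str
    committed_at: datetime

def plan_compaction(files, target=TARGET_FILE_BYTES):
    """Greedily bin-pack small files into groups, each to be rewritten
    as a single file of roughly the target size."""
    small = sorted((f for f in files if f.size_bytes < target),
                   key=lambda f: f.size_bytes)
    groups, current, current_size = [], [], 0
    for f in small:
        if current and current_size + f.size_bytes > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        groups.append(current)
    # Rewriting a lone file gains nothing, so only keep multi-file groups.
    return [g for g in groups if len(g) > 1]

def expired_snapshots(snapshots, now=None, retention=RETENTION):
    """Snapshots older than the retention window (always keeping the newest),
    whose files can be reclaimed once no live snapshot references them."""
    now = now or datetime.now(timezone.utc)
    ordered = sorted(snapshots, key=lambda s: s.committed_at)
    return [s for s in ordered[:-1] if now - s.committed_at > retention]

files = [DataFile(f"part-{i}.parquet", 40 * 1024 * 1024) for i in range(20)]
print([len(g) for g in plan_compaction(files)])   # [12, 8]: two rewrite groups
```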
💡 Key Takeaways
• Snapshot isolation via manifests and the transaction log: writers atomically publish new snapshot pointers, and readers see consistent table versions even during concurrent writes
• Netflix reduced query planning from minutes to seconds for petabyte-scale tables with millions of files by reading compact metadata instead of directory listings
• Optimistic concurrency control validates that no conflicting changes touched overlapping partitions before commit, triggering retries on conflict under high write contention (see the sketch after this list)
• Copy-on-write rewrites affected files on upserts (simple, fast reads; expensive writes), while merge-on-read appends deltas (fast writes; reads merge base plus deltas), illustrated in the sketch at the end of this section
• Operational complexity: manage transaction logs, schedule compaction for small files (target 256 MB to 1 GB), and enforce snapshot retention (7 to 30 days typical) to prevent storage growth
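A rough sketch of the optimistic concurrency control described above (the partition-overlap conflict rule and all names here are simplifying assumptions, not any specific format's algorithm): each writer remembers the table version it started from, the commit is validated against every snapshot published since then, and a backoff-and-retry loop handles conflicts.

```python
import random, time

class CommitConflict(Exception):
    pass

class TableState:
    """Toy commit log: each entry records which partitions a commit touched."""
    def __init__(self):
        self.version = 0
        self.log = []                    # list of (version, frozenset of partitions)

    def try_commit(self, base_version, partitions):
        # Validation: reject the commit if any snapshot published after our
        # base version touched an overlapping partition. (This conflict rule
        # is a simplification; real formats use operation-aware checks.)
        for version, touched in self.log:
            if version > base_version and touched & partitions:
                raise CommitConflict(f"overlaps commit {version}")
        self.version += 1
        self.log.append((self.version, frozenset(partitions)))
        return self.version

def commit_with_retries(table, partitions, max_retries=5):
    """Optimistic concurrency: note the version we started from, then
    validate-and-publish; on conflict, back off and retry from the newer state."""
    partitions = frozenset(partitions)
    for attempt in range(max_retries):
        base = table.version
        try:
            return table.try_commit(base, partitions)
        except CommitConflict:
            time.sleep(random.uniform(0.01, 0.05) * (attempt + 1))
    raise RuntimeError("gave up after repeated conflicts")

# Simulated race: writer A starts from version 0, writer B commits first, and
# A's commit to an overlapping partition is rejected, so A must retry.
table = TableState()
a_base = table.version
table.try_commit(table.version, frozenset({"date=2024-01-01"}))   # writer B
try:
    table.try_commit(a_base, frozenset({"date=2024-01-01"}))       # writer A
except CommitConflict as err:
    print("writer A retries:", err)
commit_with_retries(table, {"date=2024-01-02"})                    # no overlap, commits
```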
📌 Examples
Netflix Iceberg tables: petabyte-scale, millions of files per table, thousands of concurrent queries from batch and interactive engines, query planning reduced from minutes to seconds
Uber Apache Hudi: minute-level freshness (5 to 15 minutes end to end), petabyte-scale datasets, hundreds of TB per day of ingestion, merge-on-read for upserts and deletes with background compaction
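The merge-on-read pattern referenced in the Uber example and in the takeaways can be sketched in a few lines of Python (hypothetical record layout, not Hudi's actual file format or API): upserts and deletes are appended to a delta log, reads merge the base file with the log by key, and a background compaction folds the log into a new base so the read-side merge stays cheap.

```python
def read_merge_on_read(base_rows, delta_log):
    """Merge-on-read: reconcile a base file with appended deltas at read time.
    base_rows: records keyed by 'id'; delta_log: ordered (op, record) entries."""
    merged = {row["id"]: row for row in base_rows}
    for op, row in delta_log:                 # apply deltas in commit order
        if op == "upsert":
            merged[row["id"]] = row
        elif op == "delete":
            merged.pop(row["id"], None)
    return list(merged.values())

def compact(base_rows, delta_log):
    """Background compaction: fold the delta log into a new base file so future
    reads no longer pay the merge cost (copy-on-write does this eagerly at
    write time instead, trading cheaper reads for more expensive writes)."""
    return read_merge_on_read(base_rows, delta_log), []   # new base, empty log

base = [{"id": 1, "fare": 10.0}, {"id": 2, "fare": 7.5}]
deltas = [("upsert", {"id": 2, "fare": 9.0}), ("delete", {"id": 1})]
print(read_merge_on_read(base, deltas))       # [{'id': 2, 'fare': 9.0}]
new_base, new_deltas = compact(base, deltas)  # reads now scan new_base directly
```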