Data Storage Formats & Optimization • Row-based vs Columnar Formats
How Row and Columnar Formats Actually Work
Row Based Physical Layout:
A table is broken into pages or blocks, typically 4 KB to 16 KB in size. Each page holds many complete rows for a given range of primary keys. The system maintains indexes that point into these pages. When you query a user by user_id, the index lookup finds the relevant page, the engine reads it once from disk or cache, and the CPU scans within the page to locate the row.
Updates in place are straightforward. If the new row still fits on the page, the engine overwrites the old version or appends a new version and updates internal pointers. This makes row stores ideal for workloads with frequent small updates touching multiple fields in the same record.
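To make the page-and-index picture concrete, here is a minimal sketch of a row store in Python. The Page and RowStore names, the 8 KB page size, and the dict-based index are illustrative assumptions standing in for real pages and a B-tree; the point is that a point lookup touches one index entry and one page.

```python
# Toy model of a row store page layout: fixed-size pages hold complete rows,
# and a key -> page index lets a point lookup read exactly one page.
# Illustrative assumptions: the Page/RowStore names, the 8 KB page size, and
# the dict index standing in for a B-tree.
from dataclasses import dataclass, field

PAGE_SIZE = 8 * 1024  # bytes, within the 4 KB to 16 KB range discussed above


@dataclass
class Page:
    rows: list = field(default_factory=list)  # complete records stored together
    used_bytes: int = 0                       # rough occupancy estimate


class RowStore:
    def __init__(self):
        self.pages = [Page()]
        self.index = {}  # user_id -> page number (stand-in for a B-tree)

    def insert(self, row: dict):
        size = sum(len(str(v)) for v in row.values())
        if self.pages[-1].used_bytes + size > PAGE_SIZE:
            self.pages.append(Page())         # current page is full, start a new one
        page_no = len(self.pages) - 1
        self.pages[page_no].rows.append(row)
        self.pages[page_no].used_bytes += size
        self.index[row["user_id"]] = page_no

    def lookup(self, user_id):
        page_no = self.index[user_id]         # index probe finds the page
        page = self.pages[page_no]            # single "page read"
        # CPU scan within the page locates the exact row
        return next(r for r in page.rows if r["user_id"] == user_id)


store = RowStore()
for i in range(10_000):
    store.insert({"user_id": i, "name": f"user{i}", "country": "USA"})
print(store.lookup(4242))  # one index probe, one page read, one in-page scan
```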
Columnar Physical Layout:
Data is organized into row groups or stripes, where each group might cover 1 million rows. Within each group, the system stores separate column chunks. For a table with 50 columns and 1 billion rows divided into 1,000 row groups, you'd have 50,000 total column chunks. Each chunk stores an encoded array of values.
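A quick way to see row groups and column chunks is to write a Parquet file with an explicit row group size and inspect its metadata with PyArrow. This is a sketch, assuming pyarrow is installed; the file name, columns, and row counts are made up for illustration.

```python
# Sketch of the row group / column chunk layout, assuming pyarrow is installed
# (`pip install pyarrow`). File name, columns, and sizes are illustrative.
import pyarrow as pa
import pyarrow.parquet as pq

n_rows = 3_000_000
table = pa.table({
    "user_id": pa.array(range(n_rows), type=pa.int64()),
    "country": pa.array([("USA", "CAN", "MEX")[i % 3] for i in range(n_rows)]),
})

# Cut a new row group every 1 million rows.
pq.write_table(table, "events.parquet", row_group_size=1_000_000)

meta = pq.ParquetFile("events.parquet").metadata
print(meta.num_row_groups)   # 3 row groups of 1 million rows each
print(meta.num_columns)      # 2 columns
# Column chunks = row groups x columns: 3 x 2 = 6 here,
# 1,000 x 50 = 50,000 in the example above.
print(meta.num_row_groups * meta.num_columns)
```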
Within each chunk, a few techniques keep the data small and skippable (see the sketch after this list):
1. Dictionary encoding: Low cardinality strings like country codes get mapped to integers. "USA", "CAN", "MEX" become 0, 1, 2.
2. Run length encoding: Repeated values get compressed. 1,000 consecutive TRUE values become "TRUE x 1000".
3. Statistics storage: Each chunk stores min, max, and value counts, enabling predicate pushdown and data skipping.
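The three ideas above can be written out in a few lines of plain Python. These toy functions mirror the concepts only, not any real file format's byte layout, fallback rules, or statistics encoding.

```python
# Toy versions of dictionary encoding, run length encoding, and chunk
# statistics. Conceptual only; real formats define exact byte layouts.

def dictionary_encode(values):
    """Map low-cardinality values to small integer codes plus a dictionary."""
    dictionary, codes, positions = [], [], {}
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        codes.append(positions[v])
    return dictionary, codes

def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def chunk_statistics(values):
    """Per-chunk min/max/count, the basis for predicate pushdown and skipping."""
    return {"min": min(values), "max": max(values), "count": len(values)}

print(dictionary_encode(["USA", "CAN", "MEX", "USA"]))   # (['USA', 'CAN', 'MEX'], [0, 1, 2, 0])
print(run_length_encode([True] * 1000))                  # [[True, 1000]]
print(chunk_statistics([20240101, 20240102, 20240103]))  # min/max let whole chunks be skipped
```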
At query time, the execution engine reads only the chunks for the referenced columns. If your query filters on event_date between two dates, the engine checks chunk statistics and skips entire row groups where the date range doesn't overlap. This is data skipping.
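Here is a sketch of both effects with PyArrow, assuming pyarrow is installed: per-row-group min/max statistics on a date column, plus a read that requests only the referenced columns and passes a predicate so row groups outside the date range can be pruned. The file name, schema, and dates are illustrative assumptions.

```python
# Sketch of column pruning and row group skipping with PyArrow.
# File name, schema, and dates are illustrative.
from datetime import date

import pyarrow as pa
import pyarrow.parquet as pq

# Three row groups of 100,000 rows, one calendar day per row group.
days = [date(2024, 1, d) for d in (1, 2, 3)]
table = pa.table({
    "event_date": pa.array([d for d in days for _ in range(100_000)]),
    "user_id":    pa.array(range(300_000), type=pa.int64()),
    "amount":     pa.array([1.0] * 300_000),
})
pq.write_table(table, "daily_events.parquet", row_group_size=100_000)

# Each column chunk carries min/max statistics readable without touching data pages.
meta = pq.ParquetFile("daily_events.parquet").metadata
for g in range(meta.num_row_groups):
    stats = meta.row_group(g).column(0).statistics  # column 0 = event_date
    print(g, stats.min, stats.max)

# Read only the referenced columns; the predicate lets the reader prune row
# groups whose event_date min/max cannot overlap the requested range.
result = pq.read_table(
    "daily_events.parquet",
    columns=["event_date", "amount"],
    filters=[("event_date", ">=", date(2024, 1, 2)),
             ("event_date", "<=", date(2024, 1, 2))],
)
print(result.num_rows)  # 100,000 rows from the one matching row group
```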
Compression Impact
A dataset that occupies 400 GB in a row format typically compresses to about 40 GB in a columnar format, roughly a 10x reduction.
Write Handling Differences:
Columnar systems typically batch writes. Small writes accumulate in memory or a write optimized structure, then flush as new row groups. Background compaction periodically merges small row groups into larger ones, removes obsolete versions, and rewrites encodings. A system processing 100,000 events per second might buffer for 10 seconds (1 million events) before writing a row group, then compact every hour.
This is why columnar systems favor append heavy workloads with periodic batch updates rather than constant random updates hitting individual records.
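A minimal sketch of this batching pattern, assuming pyarrow: events accumulate in an in-memory buffer, and each flush is written as a row group into the same Parquet file via ParquetWriter. The buffer threshold, schema, and BufferedEventWriter class are illustrative; real ingestion paths add compaction, late-data handling, and crash recovery.

```python
# Sketch of write batching: events accumulate in memory and each flush
# becomes a row group in the same Parquet file. Buffer size and schema
# are illustrative stand-ins for the ~1 million event buffers described above.
import pyarrow as pa
import pyarrow.parquet as pq

SCHEMA = pa.schema([("user_id", pa.int64()), ("event", pa.string())])
FLUSH_EVERY = 100_000  # scaled-down threshold for a quick demo


class BufferedEventWriter:
    def __init__(self, path):
        self.writer = pq.ParquetWriter(path, SCHEMA)
        self.buffer = []

    def append(self, user_id, event):
        self.buffer.append((user_id, event))
        if len(self.buffer) >= FLUSH_EVERY:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        ids, events = zip(*self.buffer)
        batch = pa.table({"user_id": list(ids), "event": list(events)}, schema=SCHEMA)
        self.writer.write_table(batch)  # one flush -> one row group
        self.buffer.clear()

    def close(self):
        self.flush()                    # write any remaining buffered events
        self.writer.close()


w = BufferedEventWriter("stream.parquet")
for i in range(250_000):
    w.append(i, "click")
w.close()
print(pq.ParquetFile("stream.parquet").metadata.num_row_groups)  # 3 flushes -> 3 row groups
```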
💡 Key Takeaways
✓ Row stores use 4 KB to 16 KB pages holding complete records, enabling a single disk seek per record lookup
✓ Columnar row groups typically hold 1 million rows, with separate encoded chunks per column
✓ Dictionary and run length encoding in columnar formats achieve 5x to 10x compression by storing similar, same-typed values together
✓ Statistics per column chunk enable data skipping, where entire row groups are skipped based on predicate filters
✓ Columnar writes batch in memory and then flush as row groups, with periodic compaction to merge small segments
📌 Examples
1. PostgreSQL row store: an 8 KB page holds 50 complete user records, a B-tree index points to the page, and a single read fetches the entire user
2. Parquet file with 1 billion events: 1,000 row groups of 1 million rows each; the country column uses dictionary encoding, reducing 20 GB to 2 GB
3. BigQuery scanning 30 days of events: checks min/max date statistics, skips 29 of 30 daily partitions, and reads only 3 columns from the remaining partition