Database Design › Column-Oriented Databases (Redshift, BigQuery) · Hard · ⏱️ ~3 min

Write Patterns and Compaction: Why Columnar Stores Excel at Append but Struggle with Updates

Why Columnar Segments Are Immutable

Columnar segments are immutable by design. Once written, each column is tightly compressed (dictionary-encoded, run-length-packed) and indexed with min/max metadata for segment skipping. Any modification requires decompressing, editing, recompressing, and rewriting the entire segment, so in-place updates suffer severe write amplification: updating a 1KB row might trigger rewriting a 100MB segment, roughly 100,000x the bytes logically changed.
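The arithmetic above can be sketched directly (the sizes are the illustrative numbers from the text, not measurements from any particular engine):

```python
# Illustrative write-amplification arithmetic using the sizes quoted above.
row_size = 1 * 1024              # 1 KB logical update
segment_size = 100 * 1024 ** 2   # 100 MB immutable segment

# An in-place edit forces decompress + edit + rewrite of the whole segment.
amplification = segment_size / row_size
print(f"single update: {amplification:,.0f}x bytes written")   # ~100,000x

# Batching many updates into one rewrite amortizes the cost.
batched_rows = 1000
amortized = segment_size / (batched_rows * row_size)
print(f"batched x1000: {amortized:,.1f}x bytes written")       # ~100x
```

This is why columnar engines push writers toward large batches: the segment rewrite cost is fixed, so the more rows each rewrite carries, the lower the amplification per row.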

Append-Only With Compaction

The solution is append-only architecture with periodic compaction. New data lands in small mutable delta files or separate append partitions. Reads merge base segments with deltas at query time, adding some latency. Background compaction periodically rewrites base plus deltas into new optimized segments, then atomically swaps metadata. Deletions use tombstone markers (flags indicating "deleted") that filter results during reads until compaction physically removes rows.
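A minimal in-memory sketch of this merge-on-read pattern (a hypothetical data model, not any specific engine's format):

```python
# Immutable base segment: written once, never edited in place.
base = {1: {"id": 1, "status": "shipped"},
        2: {"id": 2, "status": "pending"}}

# Delta log: updates and deletes land here in arrival order.
deltas = [
    {"id": 2, "status": "shipped"},   # update arrives as a new delta row
    {"id": 1, "tombstone": True},     # delete arrives as a tombstone marker
]

def read_all(base, deltas):
    """Merge base with deltas at query time; tombstones filter rows out."""
    merged = dict(base)
    for d in deltas:
        if d.get("tombstone"):
            merged.pop(d["id"], None)   # row hidden until compaction removes it
        else:
            merged[d["id"]] = d
    return merged

def compact(base, deltas):
    """Background compaction: rewrite base + deltas into a new optimized base,
    physically dropping tombstoned rows, then start with an empty delta log."""
    return read_all(base, deltas), []
```

Queries pay the merge cost until `compact` runs; the atomic metadata swap mentioned above corresponds to replacing `(base, deltas)` with the compacted pair in one step.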

Bulk vs Micro-Batch Ingestion

This pattern works beautifully for bulk ingestion. Batch loads of millions of rows every hour write large, well-compressed segments (~1GB each), maximizing scan throughput. Micro-batches every 5 minutes streaming CDC (Change Data Capture, the process of tracking row-level changes from transactional databases) also work, provided they are buffered to produce segments of roughly 128MB.
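Size-based buffering for micro-batch CDC can be sketched as follows (the 128MB threshold follows the text; the class and event sizes are illustrative assumptions):

```python
TARGET_SEGMENT_BYTES = 128 * 1024 ** 2   # flush once the buffer reaches ~128 MB

class MicroBatchBuffer:
    """Accumulate CDC events in memory, flushing one segment per threshold."""

    def __init__(self, target=TARGET_SEGMENT_BYTES):
        self.target = target
        self.buffer = []
        self.size = 0
        self.flushed_segments = []   # stands in for immutable on-disk segments

    def append(self, event, size_bytes):
        self.buffer.append(event)
        self.size += size_bytes
        if self.size >= self.target:
            self.flush()

    def flush(self):
        """Write the buffer out as one large, well-compressed segment."""
        if self.buffer:
            self.flushed_segments.append(list(self.buffer))
            self.buffer.clear()
            self.size = 0
```

Without this buffering, each 5-minute micro-batch would become its own tiny segment, compressing poorly and multiplying the files every scan must open.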

The breaking point is high-frequency updates: incrementing counters or editing individual records hundreds of times per day. Each edit creates a new delta fragment, reads slow as they merge dozens of deltas, and compaction falls behind.

Real-Time Columnar Systems

Real-time columnar stores handle millions of events per second with sub-second query latency by accepting eventual consistency during a short window. They buffer micro-batches in memory before flushing to immutable segments, and run aggressive compaction on fresh data to keep read amplification low. Traditional column warehouses target minutes to hours of ingestion lag, making them unsuitable for operational dashboards requiring seconds-fresh data.
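The time-bounded in-memory buffer described above can be sketched like this (the class, the injectable clock, and the 5-second flush interval are illustrative assumptions, not any specific product's API):

```python
import time

class RealtimeBuffer:
    """Buffer events in memory; flush to an immutable segment every few seconds."""

    def __init__(self, flush_interval_s=5.0, clock=time.monotonic):
        self.flush_interval_s = flush_interval_s
        self.clock = clock
        self.events = []
        self.last_flush = clock()
        self.segments = []   # stands in for flushed immutable columnar segments

    def ingest(self, event):
        # Events are queryable from memory immediately (seconds-fresh reads),
        # but only become immutable columnar segments at flush time.
        self.events.append(event)
        if self.clock() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        if self.events:
            self.segments.append(tuple(self.events))
            self.events.clear()
        self.last_flush = self.clock()
```

The eventual-consistency window in the text corresponds to `flush_interval_s`: a query may see an event from memory before it exists in any compressed segment.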

💡 Key Takeaways
Immutable segments enable aggressive compression and indexing; any edit requires rewriting the entire segment, causing severe write amplification (~100,000x for a 1KB update in a 100MB segment)
Append-only architecture routes writes to delta files; reads merge base segments with deltas; compaction periodically consolidates
Deletions use tombstone markers filtered at read time; physical removal happens during compaction when base and deltas merge
Bulk ingestion (millions of rows hourly) produces optimal 1GB segments; micro-batch every 5 minutes works if buffered to 128MB
High-frequency updates (same row edited hundreds of times) create massive delta accumulation, degrading query performance
Real-time systems buffer in memory and run aggressive compaction, accepting seconds of lag to maintain columnar benefits
📌 Interview Tips
1. Calculate write amplification: a 1KB row update in a 100MB segment forces decompressing, editing, and recompressing the entire segment, roughly 100,000x the bytes logically changed. Batching 1,000 updates together amortizes this to ~100x.
2. Design for append-only: order_status is updated 5-10 times per order. Instead of UPDATE, create an order_events table with append-only rows and query the latest event per order.
3. Compare freshness: a traditional warehouse batches hourly (60-minute lag); a real-time OLAP store buffers 5-10 seconds in memory before flushing, enabling operational dashboards.
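The append-only pattern from tip 2 can be sketched in Python (hypothetical order_events rows; in a warehouse this "latest event per order" step would typically be a window-function query):

```python
# Hypothetical append-only event log replacing in-place UPDATEs of order_status:
# every status change is a new row, so segments are never rewritten.
order_events = [
    {"order_id": 42, "seq": 1, "status": "created"},
    {"order_id": 42, "seq": 2, "status": "paid"},
    {"order_id": 42, "seq": 3, "status": "shipped"},
    {"order_id": 7,  "seq": 1, "status": "created"},
]

def latest_status(events):
    """Resolve the current status of each order from its latest event."""
    latest = {}
    for e in events:
        cur = latest.get(e["order_id"])
        if cur is None or e["seq"] > cur["seq"]:
            latest[e["order_id"]] = e
    return {oid: e["status"] for oid, e in latest.items()}
```

Each of the 5-10 status changes per order is a cheap append rather than a segment rewrite; the read-side cost is the dedup-by-latest step, which compaction or a materialized view can absorb.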