Write Patterns and Compaction: Why Columnar Stores Excel at Append but Struggle with Updates
Why Columnar Segments Are Immutable
Columnar segments are immutable by design. Once written, each column is tightly compressed (dictionary encoded, run-length packed) and indexed with min/max metadata for skipping. Any modification requires decompressing, editing, recompressing, and rewriting entire segments. This causes heavy write amplification, 10-100x when many edits share one rewrite and far more in the worst case: updating a single 1KB row might trigger rewriting a 100MB segment, roughly 100,000x the bytes actually changed.
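The arithmetic behind that ratio is easy to check. The sketch below is only an illustration: the helper name write_amplification and the 10,000-row batch size are assumptions chosen for the example, not figures from any particular system.

```python
def write_amplification(bytes_changed: int, bytes_rewritten: int) -> float:
    """Ratio of bytes physically rewritten to bytes logically changed."""
    if bytes_changed <= 0:
        raise ValueError("bytes_changed must be positive")
    return bytes_rewritten / bytes_changed


KB = 1024
MB = 1024 * KB

# One 1KB row edited in place inside a 100MB segment.
single_row = write_amplification(1 * KB, 100 * MB)

# The same 100MB segment rewritten once for a batch of 10,000 changed 1KB rows
# (hypothetical batch size, picked only to show the effect of batching).
batched = write_amplification(10_000 * KB, 100 * MB)

print(f"single-row edit:  {single_row:,.0f}x")
print(f"10,000-row batch: {batched:,.2f}x")
```

Under these assumptions a lone 1KB edit costs about 102,400x, while sharing the rewrite across 10,000 rows brings the ratio down to about 10x. Batching is what makes segment rewrites affordable.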
Append-Only With Compaction
The solution is an append-only architecture with periodic compaction. New data lands in small mutable delta files or separate append partitions. Reads merge base segments with deltas at query time, which adds latency that grows with the number of outstanding deltas. Background compaction periodically rewrites base plus deltas into new optimized segments, then atomically swaps metadata so readers see either the old layout or the new one, never a mix. Deletions use tombstone markers (flags indicating "deleted") that filter results during reads until compaction physically removes the rows.
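The read and compaction paths described above can be sketched with plain dictionaries. This is a toy in-memory model, not any engine's actual storage format: the Table class, its method names, and the TOMBSTONE sentinel are all hypothetical.

```python
from dataclasses import dataclass, field

# Sentinel standing in for a tombstone marker (hypothetical representation).
TOMBSTONE = object()


@dataclass
class Table:
    base: dict = field(default_factory=dict)    # base segment: key -> row, never edited in place
    deltas: list = field(default_factory=list)  # append-only delta files, oldest first

    def upsert(self, key, row):
        # Writes never touch the base segment; they append a new delta.
        self.deltas.append({key: row})

    def delete(self, key):
        # A delete is just another delta carrying a tombstone.
        self.deltas.append({key: TOMBSTONE})

    def read(self):
        # Merge-on-read: later deltas override earlier ones and the base.
        merged = dict(self.base)
        for delta in self.deltas:
            merged.update(delta)
        # Tombstoned rows are filtered out of query results.
        return {k: v for k, v in merged.items() if v is not TOMBSTONE}

    def compact(self):
        # Build the new base off to the side, then swap both references
        # together. The single assignment stands in for the atomic
        # metadata swap; tombstoned rows are physically dropped here.
        new_base = self.read()
        self.base, self.deltas = new_base, []


table = Table(base={1: "a", 2: "b"})
table.upsert(2, "b2")
table.delete(1)
table.upsert(3, "c")
before = table.read()
table.compact()
after = table.read()
```

Query results are identical before and after compaction. What changes is the work a read has to do: three deltas to merge beforehand, none afterwards.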
Bulk vs Micro-Batch Ingestion
This pattern works well for bulk ingestion. Hourly batch loads of millions of rows write large, well-compressed segments (1GB each) that maximize scan throughput. Micro-batches also work: a CDC stream (Change Data Capture, the process of tracking row-level changes from transactional databases) arriving every 5 minutes is fine as long as the changes are buffered until they fill a segment of about 128MB.
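One way to picture the buffering step is a size-triggered flush. The sketch below tracks only byte counts. SegmentBuffer and its fields are hypothetical names, and a real ingester would also flush on a timer so quiet streams do not wait indefinitely.

```python
MB = 1024 * 1024


class SegmentBuffer:
    """Accumulates micro-batches until a target segment size is reached."""

    def __init__(self, target_bytes: int):
        self.target_bytes = target_bytes
        self.pending_bytes = 0   # bytes buffered but not yet written
        self.flushed = []        # sizes of the segments written so far

    def add(self, batch_bytes: int) -> None:
        self.pending_bytes += batch_bytes
        # Flush only once the buffer can fill a full-sized segment.
        if self.pending_bytes >= self.target_bytes:
            self.flush()

    def flush(self) -> None:
        if self.pending_bytes:
            self.flushed.append(self.pending_bytes)
            self.pending_bytes = 0


buf = SegmentBuffer(target_bytes=128 * MB)
for _ in range(20):          # twenty 16MB micro-batches (assumed size)
    buf.add(16 * MB)
```

With these assumed sizes, twenty 16MB micro-batches produce two 128MB segments and leave 64MB waiting in the buffer, instead of twenty small fragments.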
The breaking point is high-frequency updates: incrementing counters or editing individual records hundreds of times per day. Each edit creates a new delta fragment, reads slow down as they merge dozens of deltas, and compaction falls behind because fragments arrive faster than it can fold them into base segments.
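A toy queueing model shows why this tips over rather than degrading gradually. The function delta_backlog and its per-tick rates are assumptions made for illustration. Real compactors merge in size tiers and are considerably more complicated.

```python
def delta_backlog(edits_per_tick: int, merged_per_tick: int, ticks: int) -> list:
    """Number of unmerged delta fragments after each tick.

    Each tick adds `edits_per_tick` new fragments, and the compactor
    retires at most `merged_per_tick` of them. A read at any moment has
    to merge every fragment still in the backlog.
    """
    backlog = 0
    history = []
    for _ in range(ticks):
        backlog = max(0, backlog + edits_per_tick - merged_per_tick)
        history.append(backlog)
    return history


falling_behind = delta_backlog(edits_per_tick=5, merged_per_tick=3, ticks=4)
keeping_up = delta_backlog(edits_per_tick=3, merged_per_tick=5, ticks=4)
```

In this model the backlog stays at zero while compaction outpaces writes, and grows without bound once it does not. The deltas each read must merge grow along with it.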
Real-Time Columnar Systems
Real-time columnar stores handle millions of events per second with sub-second query latency by accepting eventual consistency during a short window. They buffer micro-batches in memory before flushing to immutable segments, and run aggressive compaction on fresh data to keep read amplification low. Traditional columnar warehouses, by contrast, target minutes to hours of ingestion lag, which makes them unsuitable for operational dashboards that need data only seconds old.