Big Data Systems • Columnar Storage & Compression
When Do Columnar Systems Fail? Critical Failure Modes and Edge Cases
Columnar storage excels at analytical scans but degrades severely under certain conditions. High-cardinality, uniformly distributed columns compress poorly: dictionary encoding adds overhead without savings, and run-length encoding (RLE) finds no runs. A universally unique identifier (UUID) column or a high-entropy hash may actually expand once dictionary structures are added, dropping the compression ratio from roughly 10x to 0.8x. Expect Central Processing Unit (CPU) overhead from codec operations without a corresponding Input/Output (I/O) reduction.
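To make the dictionary-overhead point concrete, here is a rough sizing sketch (illustrative only; real codecs add page headers, bit-packing, and fallback encodings). A low-cardinality column pays for a tiny dictionary plus small per-row codes, while a UUID column's dictionary is as large as the raw data itself.

```python
# Sketch (not any engine's real codec): estimate dictionary-encoding payoff
# for a low-cardinality column vs. a high-cardinality UUID column.
import math
import random
import uuid

def dict_encoded_size(values):
    """Approximate dictionary-encoded size in bytes:
    dictionary entries + fixed-width codes for each row."""
    distinct = set(values)
    dict_bytes = sum(len(v.encode()) for v in distinct)         # dictionary pages
    code_bits = max(1, math.ceil(math.log2(max(len(distinct), 2))))
    codes_bytes = math.ceil(len(values) * code_bits / 8)        # per-row codes
    return dict_bytes + codes_bytes

def raw_size(values):
    return sum(len(v.encode()) for v in values)

rows = 100_000
low_card = [random.choice(["US", "DE", "IN", "BR"]) for _ in range(rows)]
high_card = [str(uuid.uuid4()) for _ in range(rows)]             # ~unique per row

for name, col in [("country", low_card), ("uuid", high_card)]:
    ratio = raw_size(col) / dict_encoded_size(col)
    print(f"{name}: compression ratio ~{ratio:.2f}x")
# country compresses well (few dictionary entries, 2-bit codes);
# uuid's dictionary is as large as the data, so the ratio falls to ~1x or below.
```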
Data layout mismatches destroy pruning effectiveness. A table sorted by timestamp but queried primarily by userId will scan nearly all pages, because the zone maps on userId span almost the entire value range within each segment. Meta reports that queries on poorly partitioned tables can run 10 to 100 times slower than the same query on a table whose sort keys align with its predicates. The cost of maintaining multiple sort orders or materialized views therefore becomes a critical production consideration.
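A minimal zone-map pruning sketch, using hypothetical segment metadata, shows why the sort-key mismatch hurts: min/max statistics on the sorted column are tight and disjoint, while min/max on userId covers nearly the whole domain in every segment, so nothing can be skipped.

```python
# Zone-map pruning sketch (hypothetical segment metadata, not a real engine's
# catalog): each segment stores min/max per column; a segment can be skipped
# only if the predicate value falls outside its [min, max] range.
from dataclasses import dataclass

@dataclass
class Segment:
    stats: dict  # column name -> (min, max)

def prune(segments, column, value):
    """Return the segments that might contain `column == value`."""
    return [s for s in segments
            if s.stats[column][0] <= value <= s.stats[column][1]]

# Table sorted by timestamp: timestamp ranges are disjoint per segment,
# but userId spans nearly the full domain in every segment.
segments = [
    Segment({"timestamp": (1_000 * i, 1_000 * (i + 1)),
             "userId": (3, 999_998)})          # wide range everywhere
    for i in range(100)
]

print(len(prune(segments, "timestamp", 42_500)))  # 1 segment survives pruning
print(len(prune(segments, "userId", 123_456)))    # all 100 segments must be scanned
```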
Frequent updates and deletes are particularly painful. Most columnar systems are append-only; updates require reading segments, modifying rows, and rewriting entire column files (write amplification). Tombstones for deletes accumulate until compaction, inflating storage and scan costs. If ingestion spikes trigger compaction storms, background merges saturate disk and CPU, pushing query tail latencies from milliseconds to seconds and potentially throttling reads. Amazon Redshift users managing update-heavy workloads often resort to batch windows or separate staging tables to control compaction overhead.
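A back-of-the-envelope calculation (illustrative figures, not tied to any particular system) shows how large write amplification can get when each updated row forces a full segment rewrite across all column files:

```python
# Worst-case write amplification for an append-only columnar store:
# updating one row means rewriting every column file of its segment.
segment_rows = 1_000_000          # rows per segment
columns = 50                      # column files per segment
bytes_per_value = 8               # average encoded bytes per value
updated_rows = 100                # rows touched by one UPDATE, spread across segments

logical_write = updated_rows * columns * bytes_per_value
# Worst case: each updated row lands in a different segment, so each one
# triggers a full segment rewrite across all column files.
physical_write = updated_rows * segment_rows * columns * bytes_per_value

amplification = physical_write / logical_write
print(f"logical: {logical_write/1e6:.2f} MB, "
      f"physical: {physical_write/1e9:.1f} GB, "
      f"write amplification: {amplification:,.0f}x")
# ~1,000,000x here -- which is why systems buffer updates as deltas/tombstones
# and defer the rewrite to background compaction.
```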
The small-file problem plagues streaming ingestion. Writing many tiny segments (under 10 megabytes) defeats sequential I/O and inflates metadata overhead. Each file open incurs Remote Procedure Call (RPC) latency; a query touching 10,000 small files can spend seconds just opening them. Systems mitigate this by micro-batching (accumulating 1 to 15 minutes of data into 64 to 512 megabyte segments), but this delays data visibility. Wide SELECT queries that request many columns negate column pruning entirely; memory bandwidth and row-stitching overhead can make columnar slower than row stores for those access patterns.
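The usual mitigation is a size- or age-triggered buffer in the writer. The sketch below is a hypothetical micro-batcher, not any system's actual ingestion API; the write_segment callback and the thresholds are assumptions chosen to land segments in the 64 to 512 MB range.

```python
# Micro-batching sketch (hypothetical writer): buffer incoming records and
# flush one large segment when the buffer reaches a target size or a maximum
# age, trading visibility latency for fewer, larger files.
import time

TARGET_BYTES = 256 * 1024 * 1024   # aim for 64-512 MB segments; 256 MB here
MAX_AGE_SECONDS = 300              # bound how long data stays invisible

class MicroBatcher:
    def __init__(self, flush_fn):
        self.flush_fn = flush_fn
        self.buffer = []
        self.buffered_bytes = 0
        self.opened_at = time.monotonic()

    def append(self, record: bytes):
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        too_big = self.buffered_bytes >= TARGET_BYTES
        too_old = time.monotonic() - self.opened_at >= MAX_AGE_SECONDS
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)          # write one large segment
        self.buffer, self.buffered_bytes = [], 0
        self.opened_at = time.monotonic()

# Usage (write_segment is a hypothetical sink):
#   batcher = MicroBatcher(write_segment); batcher.append(row_bytes)
```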
💡 Key Takeaways
•High-cardinality, uniformly distributed columns (UUIDs, hashes) compress poorly or even expand, dropping ratios from roughly 10x to 0.8x as dictionary overhead exceeds savings
•Sort-key mismatches cause near-zero pruning: a table sorted by timestamp but queried by userId scans all segments, running 10 to 100 times slower than a correctly sorted layout
•Frequent updates require rewriting entire column segments (write amplification), and compaction storms from ingest spikes push read tail latencies from milliseconds to seconds
•Small-file problem from streaming: 10,000 tiny segments spend seconds in file-open RPCs versus milliseconds for batched large segments (64 to 512 MB)
•Wide SELECT queries requesting many columns negate column pruning; memory bandwidth and row-stitching overhead can make columnar slower than row stores for full-row access
📌 Examples
Meta Presto: queries on tables without aligned sort keys or with skewed partitioning see 10 to 100 times latency degradation as pruning fails and most stripes must be scanned despite available statistics
Amazon Redshift update workloads: high-frequency updates cause write amplification and compaction lag, forcing users to batch updates in staging tables and merge during maintenance windows to control overhead
ClickHouse small-file scenario: streaming ingestion creating thousands of 1 to 10 MB parts causes query latency to climb due to metadata overhead; mitigation requires micro-batching into 64 to 512 MB parts
Bloom filter false positives: oversubscribed bloom filters with a 5% false positive rate on high queries-per-second (QPS) workloads waste 5% of I/O scanning segments that contain no matching rows, degrading throughput and latency at scale
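The Bloom filter example follows from the standard false-positive-rate formula p ≈ (1 − e^(−kn/m))^k. The sketch below (illustrative key counts and sizing, not any engine's defaults) shows how the false-positive rate, and hence the wasted segment scans, grows as the filter is given fewer bits per key.

```python
# Bloom filter sizing sketch: p ≈ (1 - e^(-k*n/m))^k, showing how an
# undersized ("oversubscribed") filter drifts toward the few-percent FPR
# that wastes I/O on segments with no matching rows.
import math

def false_positive_rate(n_keys: int, m_bits: int, k_hashes: int) -> float:
    return (1 - math.exp(-k_hashes * n_keys / m_bits)) ** k_hashes

n = 10_000_000                                       # keys per segment
for bits_per_key in (4, 8, 10, 16):
    m = n * bits_per_key
    k = max(1, round(bits_per_key * math.log(2)))    # optimal k ≈ 0.693 * m/n
    p = false_positive_rate(n, m, k)
    print(f"{bits_per_key} bits/key, k={k}: FPR ≈ {p:.2%}")
# Roughly: 4 bits/key ≈ 15%, 8 ≈ 2%, 10 ≈ 1%, 16 ≈ 0.05%;
# at high QPS, every false positive is a wasted segment scan.
```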