Partition and Cluster Key Design: The Primary Lever for Query Performance and Cost
Partitioning and clustering physically organize columnar data to maximize pruning, the technique of eliminating irrelevant data before a scan ever reads it. A well-designed partition key such as ingestion date lets a query filtering on the last 7 days skip 95% of a 2-year dataset outright. Clustering within partitions by commonly filtered columns (user_id, device_id, status) enables min/max zone maps that skip entire data blocks. Together, these strategies reduce bytes scanned by 10x to 100x, directly cutting both latency and cost.
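As a minimal sketch of how this is declared in BigQuery (the analytics.events dataset, table, and column names are illustrative assumptions, not taken from the text), both keys are part of the table DDL:

```sql
-- Hypothetical BigQuery table: daily partitions on event time,
-- clustered by the columns most queries filter on.
CREATE TABLE analytics.events (
  event_time TIMESTAMP,
  user_id    INT64,
  device_id  STRING,
  status     STRING,
  payload    STRING
)
PARTITION BY DATE(event_time)   -- prunes whole days before the scan starts
CLUSTER BY user_id, status      -- zone maps skip blocks inside each partition
OPTIONS (require_partition_filter = TRUE);  -- reject queries with no date filter
```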
Partition granularity involves tradeoffs. Daily partitions work well for time-series data whose queries typically filter on recent weeks. Hourly partitions enable finer pruning for real-time dashboards but cause small-file proliferation when ingestion volumes are low. A table ingesting 10 GB per day benefits from daily partitions (10 GB files compress well and scan efficiently). A table ingesting 10 GB per hour needs hourly partitions to avoid 240 GB daily files that hurt query startup and pruning granularity.
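A sketch of the two granularities under those assumptions (table names are again illustrative; BigQuery shown, where hourly partitioning is expressed with TIMESTAMP_TRUNC):

```sql
-- Daily granularity: suits roughly 10 GB/day of ingest.
CREATE TABLE analytics.events_daily (event_time TIMESTAMP, user_id INT64, status STRING)
PARTITION BY DATE(event_time);

-- Hourly granularity: suits roughly 10 GB/hour, avoiding 240 GB daily partitions
-- that slow query startup and coarsen pruning.
CREATE TABLE analytics.events_hourly (event_time TIMESTAMP, user_id INT64, status STRING)
PARTITION BY TIMESTAMP_TRUNC(event_time, HOUR);
```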
Clustering orders data within partitions by one or more columns. BigQuery maintains clustering automatically through background compaction. If 80% of queries filter on user_id and status, clustering on (user_id, status) lets zone maps skip blocks whose value ranges cannot contain the filtered values. A query for user_id 12345 with status "active" might eliminate 99% of blocks through clustering metadata alone. The cost is write amplification during inserts and periodic re-clustering as data evolves.
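A sketch of the kind of query that benefits, run against the hypothetical table above; the clustering columns appear as plain equality predicates so block-level min/max metadata can be applied:

```sql
-- The partition filter bounds the scan to 30 days; the user_id and status
-- predicates let zone maps skip blocks whose value ranges cannot match.
SELECT event_time, device_id
FROM analytics.events
WHERE DATE(event_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  AND user_id = 12345
  AND status = 'active';
```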
Mistakes explode costs in serverless pricing models. A query that forgets the date filter on a 100 TB table scans the full dataset for $500 instead of $25 for a 5 TB weekly window. Clustering on high-cardinality random IDs provides no skipping benefit yet still adds re-clustering overhead. Monitor bytes scanned per query and enforce guardrails, such as required partition filters in WHERE clauses and per-user scan limits, to catch runaway queries before they drain budgets.
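Two guardrails sketched below for BigQuery, again with illustrative names: the table option rejects queries that omit a partition filter, and the INFORMATION_SCHEMA jobs view exposes bytes billed per user for monitoring (the US region is assumed here). A per-query maximum_bytes_billed limit can also be set in job configuration to fail runaway scans outright.

```sql
-- Require every query against the table to carry a partition filter.
ALTER TABLE analytics.events
SET OPTIONS (require_partition_filter = TRUE);

-- Daily bytes billed per user over the past 7 days, to spot runaway queries.
SELECT user_email,
       DATE(creation_time) AS day,
       SUM(total_bytes_billed) / POW(1024, 4) AS tib_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
GROUP BY user_email, day
ORDER BY tib_billed DESC;
```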
💡 Key Takeaways
• Partition on time for time-series workloads. Daily partitions on ingestion date let queries filtering the last 7 days skip 95% of a 2-year dataset, cutting the scan from 100 TB to 5 TB instantly.
• Cluster on commonly filtered dimensions. Clustering on user_id and status enables zone-map skipping that eliminates 90% to 99% of the remaining blocks when both filters appear in the WHERE clause.
• Balance partition granularity with file size. Hourly partitions at low volume create thousands of small files that inflate metadata overhead and hurt scan throughput. Target 128 MB to 1 GB per file after compression.
• High-cardinality clustering wastes resources. Clustering on a random UUID provides no skipping benefit but still incurs write amplification and re-clustering cost. Choose columns with repeated values that appear in common filters.
• Serverless cost scales with bytes scanned. A missing partition filter turns a $25 weekly query into a $500 full-table scan. Enforce partition-filter requirements and per-user scan-byte limits.
• Re-clustering maintenance is ongoing. As data evolves and new values arrive, clustering degrades. BigQuery re-clusters automatically in the background; where sort order is maintained manually (for example Redshift via VACUUM), schedule re-sorting during off-peak hours to preserve pruning effectiveness.
📌 Examples
A BigQuery events table holding 100 TB over 2 years, partitioned daily and clustered on (user_id, status). A query filters the last 30 days for user_id 12345 with status "active". Partition pruning eliminates 98.5 TB and cluster pruning eliminates another 1.4 TB, so the final scan reads 100 GB and completes in 5 seconds at $0.50, instead of 2 minutes and $500 for the full table.
A Redshift fact table distributed on user_id and sorted by (timestamp, device_id). A dashboard query filters the last 24 hours for a specific device. The compound sort key lets zone maps skip blocks outside the time range, and the user_id distribution key avoids redistribution when joining on user_id. The query returns in 3 seconds versus 45 seconds without the sort key.
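A rough Redshift sketch of that layout (table and column names are assumptions): distribution and sort keys are declared in the DDL, and VACUUM can restore sort order as unsorted rows accumulate.

```sql
-- Hypothetical Redshift fact table: co-locate rows by user_id for joins,
-- sort by time then device so zone maps skip blocks outside the filter range.
CREATE TABLE fact_events (
  event_ts   TIMESTAMP,
  user_id    BIGINT,
  device_id  VARCHAR(64),
  status     VARCHAR(16),
  metric     DOUBLE PRECISION
)
DISTSTYLE KEY
DISTKEY (user_id)
COMPOUND SORTKEY (event_ts, device_id);

-- Re-sort rows when the unsorted fraction grows and pruning degrades.
VACUUM SORT ONLY fact_events;
```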