
Partition and Cluster Key Design: The Primary Lever for Query Performance and Cost

Pruning Through Partitioning

Partitioning physically organizes data to maximize pruning (eliminating data before it is scanned). A partition key like ingestion_date groups data into separate storage units. A query filtering the last 7 days can skip 95% or more of a 2-year dataset instantly, reducing the scan from 100TB to 5TB or less.
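A minimal sketch of what the engine does during pruning, assuming daily partitions keyed on ingestion_date (partition sizes and dates are illustrative):

```python
from datetime import date, timedelta

# One metadata entry per daily partition; pruning reads only this metadata,
# never the data files themselves.
partitions = [{"date": date(2024, 1, 1) + timedelta(days=d), "bytes": 140 * 1024**3}
              for d in range(730)]  # ~100TB spread over 2 years

def prune(partitions, start, end):
    """Partition pruning: keep only partitions whose key falls inside the filter."""
    return [p for p in partitions if start <= p["date"] <= end]

last_day = partitions[-1]["date"]
survivors = prune(partitions, last_day - timedelta(days=6), last_day)
print(len(survivors), "of", len(partitions), "partitions scanned")  # 7 of 730
```

The filter never touches the other 723 partitions, so the bytes scanned drop from ~100TB to under 1TB before a single data block is read.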

Partition granularity involves trade-offs. Daily partitions work for time-series queries on recent weeks. Hourly partitions enable finer pruning for real-time dashboards but create small file proliferation if volume is low. Target 128MB-1GB per partition file after compression.
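One way to sanity-check a granularity choice against that file-size floor (a sketch; the helper name and thresholds simply mirror the text, and partitions above ~1GB are typically split into multiple files by the engine anyway):

```python
MB = 1024**2
GB = 1024**3

def choose_granularity(daily_compressed_bytes: int) -> str:
    """Pick the finest time granularity that keeps each partition above
    the ~128MB compressed-file floor from the text."""
    if daily_compressed_bytes // 24 >= 128 * MB:
        return "hourly"   # enough hourly volume to avoid tiny files
    if daily_compressed_bytes >= 128 * MB:
        return "daily"    # hourly partitions would proliferate small files
    return "coarser than daily"  # even daily files fall below the floor

print(choose_granularity(240 * GB))  # hourly (10GB per hour)
print(choose_granularity(500 * MB))  # daily (~21MB per hour is too small)
```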

Clustering Within Partitions

Clustering orders data within partitions by one or more columns. If 80% of queries filter on user_id and status, clustering on (user_id, status) lets zone maps skip every block whose min/max range cannot contain the filtered values. A query for user_id=12345 with status="active" might eliminate 99% of blocks through metadata alone.
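Zone maps can be simulated in a few lines to see why cluster order matters (a toy sketch; the block size and ID range are illustrative):

```python
import random

def make_blocks(values, rows_per_block=1000):
    """Build zone-map metadata (min/max per consecutive row block)."""
    return [(min(chunk), max(chunk))
            for chunk in (values[i:i + rows_per_block]
                          for i in range(0, len(values), rows_per_block))]

def blocks_read(zone_maps, target):
    """A block must be read only if target can fall within its [min, max] range."""
    return sum(1 for lo, hi in zone_maps if lo <= target <= hi)

random.seed(0)
user_ids = [random.randrange(100_000) for _ in range(100_000)]

unclustered = make_blocks(user_ids)         # insertion order: wide, overlapping ranges
clustered = make_blocks(sorted(user_ids))   # cluster-key order: tight, disjoint ranges

print(blocks_read(unclustered, 12345))  # nearly all 100 blocks must be read
print(blocks_read(clustered, 12345))    # 1-2 blocks read; the rest are skipped
```

With random row order, almost every block's min/max straddles the target value, so the metadata skips nothing; sorting by the cluster key shrinks each block's range until only the matching blocks survive.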

The cost is write amplification during inserts and periodic re-clustering as data evolves. Background compaction maintains cluster ordering, consuming CPU and I/O continuously.

Cost Explosion Prevention

In serverless pricing models, bytes scanned directly determine cost. Missing a date filter on a 100TB table scans the full table instead of a 5TB weekly window, a 20x cost difference. Clustering on high-cardinality random IDs (UUIDs) provides no skipping benefit, because every block's min/max range spans essentially the whole keyspace, yet it still adds re-clustering overhead.

Monitor bytes scanned per query, enforce partition filters in WHERE clauses for large tables, and set scan limits per user (e.g., 10TB/day) to catch runaway queries before they drain budgets.
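Engines expose built-in controls for the per-query side (BigQuery, for example, supports a maximum_bytes_billed cap that fails a query before it scans past the limit). A per-user daily budget can be sketched on top of a dry-run scan estimate; the helper below is a hypothetical illustration, not a real API:

```python
TB = 1024**4
DAILY_LIMIT = 10 * TB  # per-user scan budget from the text

usage_today = {}  # user -> bytes admitted so far today

def admit_query(user: str, estimated_scan_bytes: int) -> bool:
    """Reject a query whose estimated scan would blow the user's daily budget.
    (Hypothetical guard; the estimate would come from a dry run or EXPLAIN.)"""
    spent = usage_today.get(user, 0)
    if spent + estimated_scan_bytes > DAILY_LIMIT:
        return False
    usage_today[user] = spent + estimated_scan_bytes
    return True

print(admit_query("alice", 5 * TB))    # True: within budget
print(admit_query("alice", 100 * TB))  # False: runaway full-table scan blocked
```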

Key Insight: Partition on time (when data arrived), cluster on dimensions (how data is queried). Together they reduce bytes scanned by 10-100x, directly cutting both latency and cost.
💡 Key Takeaways
Partitioning on a time column lets queries filtering the last 7 days skip 95% or more of a 2-year dataset, reducing a 100TB scan to 5TB or less instantly
Balance partition granularity with file size: target 128MB-1GB per file after compression; hourly partitions need sufficient volume
Cluster on commonly filtered dimensions; clustering on (user_id, status) enables zone map skipping eliminating 90-99% of blocks
High-cardinality clustering (random UUIDs) provides no skipping benefit but incurs write amplification and re-clustering costs
Serverless cost scales with bytes scanned: missing partition filter turns query into full table scan
Enforce partition filter requirements on large tables and set per-user scan limits (10TB/day) to prevent cost explosions
📌 Interview Tips
1. Calculate pruning impact: 100TB table, 2 years of data. A query filtering the last 30 days keeps 30/730 of the partitions, so partition pruning eliminates ~95.9TB, leaving ~4.1TB. Cluster pruning on user_id eliminates ~4TB more. Final scan: ~100GB.
2. Design partition granularity: a 10GB/day table uses daily partitions (10GB files compress well). A 10GB/hour table uses hourly partitions to avoid 240GB daily files.
3. Monitor for cost explosions: a dashboard query with a date filter scans 1TB; the same dashboard without the filter scans the full 100TB at 100x the cost. Alert on queries >10TB.
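As a worked check of combining partition and cluster pruning on the 100TB example (the cluster-pruning survival ratio is an illustrative assumption):

```python
TB, GB = 1024**4, 1024**3
total_bytes, total_days = 100 * TB, 730  # 100TB over 2 years of daily partitions

# Step 1: partition pruning keeps only the last 30 daily partitions.
after_partition = total_bytes * 30 // total_days   # ~4.1TB survives

# Step 2: zone maps on the user_id cluster key skip blocks whose min/max
# range excludes the target user (the 2.5% survival ratio is illustrative).
after_cluster = after_partition * 25 // 1000       # ~100GB survives

print(round(after_partition / TB, 1), "TB ->", round(after_cluster / GB), "GB")
```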