Partition and Cluster Key Design: The Primary Lever for Query Performance and Cost
Pruning Through Partitioning
Partitioning physically organizes data to maximize pruning (eliminating data before scanning). A partition key like ingestion_date groups data into separate storage units. A query filtering the last 7 days can skip 95% of a 2-year dataset instantly, reducing scan from 100TB to 5TB.
Partition granularity involves trade-offs. Daily partitions work for time-series queries on recent weeks. Hourly partitions enable finer pruning for real-time dashboards but create small file proliferation if volume is low. Target 128MB-1GB per partition file after compression.
Clustering Within Partitions
Clustering orders data within partitions by one or more columns. If 80% of queries filter on user_id and status, clustering on (user_id, status) allows zone maps to skip blocks where neither value matches. A query for user_id=12345 with status="active" might eliminate 99% of blocks through metadata alone.
The cost is write amplification during inserts and periodic re-clustering as data evolves. Background compaction maintains cluster ordering, consuming CPU and I/O continuously.
Cost Explosion Prevention
In serverless models, bytes scanned directly determine cost. Missing a date filter on a 100TB table scans everything at versus for a 5TB weekly window. Clustering on high-cardinality random IDs (UUIDs) provides no skipping benefit but adds re-clustering overhead.
Monitor bytes scanned per query, enforce partition filters in WHERE clauses for large tables, and set scan limits per user (e.g., 10TB/day) to catch runaway queries before they drain budgets.