Data Warehousing Fundamentals • Partitioning & Clustering Strategies
Production Scale: Partitioning at 400 Billion Events Per Day
The Scale Challenge:
At companies handling massive event streams, raw intuition about partitioning breaks down. What works for 10 GB tables creates operational nightmares at 50 TB per day. The key is understanding the mathematics of partition count, file size, and metadata overhead.
The Netflix or Meta Scale Problem:
Imagine ingesting 5 million events per second globally. That is 432 billion events per day, landing at roughly 50 TB per day compressed in columnar format. Without partitioning, queries scanning even one day of data would take minutes and cost significant compute dollars. But naive partitioning creates worse problems.
Suppose you partition by hour. That is 24 partitions per day, 720 per month. If event volume is uneven and some hours hold only 100 GB while peak hours hold 3 TB, you have created hot partitions: queries during peak hours hit 3 TB partitions and run slowly, while off-peak queries are fast. The inconsistency makes capacity planning difficult.
Now suppose you partition too finely, by minute. That is 1,440 partitions per day, 43,200 per month. With typical Parquet or ORC file sizes of 100 to 500 MB, each minute partition might have 2 to 10 files. At 43,200 partitions with 5 files each, you have 216,000 files per month. Systems like Apache Hive, Spark, or Presto must list and open each file. Metadata operations dominate query time. A query that should finish in 10 seconds spends 2 to 3 minutes just reading file listings from object storage.
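To see why, it helps to run the arithmetic. The short Python sketch below reproduces the partition and file counts; the per-1,000-file listing latency and files-per-partition figures are illustrative assumptions, not benchmarks.
# Partition granularity vs. monthly file count and metadata overhead.
DAYS_PER_MONTH = 30
FILES_PER_PARTITION = 5            # small files per fine-grained partition (assumed)
LIST_SECONDS_PER_1K_FILES = 0.5    # object-store listing cost (assumed)

def monthly_files(partitions_per_day: int) -> int:
    return partitions_per_day * DAYS_PER_MONTH * FILES_PER_PARTITION

def planning_overhead_s(total_files: int) -> float:
    return total_files / 1000 * LIST_SECONDS_PER_1K_FILES

for label, per_day in [("hourly", 24), ("per-minute", 1440)]:
    files = monthly_files(per_day)
    print(f"{label:10s} {per_day * DAYS_PER_MONTH:6d} partitions/month "
          f"{files:7d} files  ~{planning_overhead_s(files):.0f}s planning overhead")
Under these assumptions, hourly partitioning lands at 720 partitions and a few thousand files per month, while per-minute partitioning produces 43,200 partitions and 216,000 files, pushing planning overhead into the roughly two-minute range described above.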
The Sweet Spot:
Daily partitions are the most common choice for large event tables. Each partition holds 2 to 5 TB compressed, split across hundreds of files of roughly 256 MB to 1 GB each. This keeps file count manageable while enabling parallelism. Queries filtering by date touch only the relevant days, and within each day, clustering by user_id or session_id enables data skipping.
[Figure: File count impact on query planning: roughly 1,000 files yields a 10-second query; 200,000 files stretches the same query to 180 seconds.]
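Concretely, this layout might be written with PySpark roughly as follows. The table name, path, date, and 400-file target are placeholders, and range partitioning by user_id stands in for clustering; this is a sketch of the pattern, not a definitive pipeline.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-partition-write").getOrCreate()

# Hypothetical source: one day of raw events with event_date and user_id columns.
events = spark.table("raw.events").where("event_date = '2024-06-01'")

(events
    .repartitionByRange(400, "user_id")   # each output file covers a disjoint user_id range
    .sortWithinPartitions("user_id")      # sort inside each file so min/max stats stay tight
    .write
    .partitionBy("event_date")            # directory-level daily partitions
    .mode("append")
    .parquet("s3://example-lake/events/"))  # target path is a placeholder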
For extremely high volume streams exceeding 100 TB per day, hourly partitions become necessary. But you must combine them with compaction jobs that merge small files into larger ones, typically running every few hours. This keeps file count under control.
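A compaction pass can be a simple rewrite of one partition's small files into a handful of large ones, followed by an atomic swap in the table catalog. A rough PySpark sketch, with an assumed path layout, bytes-per-row estimate, and 512 MB target file size:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hourly-compaction").getOrCreate()

TARGET_FILE_BYTES = 512 * 1024 * 1024      # ~512 MB output files (assumed target)
BYTES_PER_ROW = 200                        # rough compressed row size (assumed)
partition_path = "s3://example-lake/events/event_date=2024-06-01/hour=13/"  # hypothetical layout

df = spark.read.parquet(partition_path)

# Size the output: enough files to land near the target file size.
n_files = max(1, df.count() * BYTES_PER_ROW // TARGET_FILE_BYTES)

# Write compacted files to a staging path, then swap it in (e.g. update the
# catalog or manifest) so readers never see a half-rewritten partition.
(df.coalesce(int(n_files))
   .write
   .mode("overwrite")
   .parquet(partition_path.rstrip("/") + "_compacted/"))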
Composite Partitioning for Multi-Tenant Systems:
Some systems use composite keys like date + region or date + (hash(user_id) mod 10). This spreads data more evenly when certain dates or regions are hot. The trade-off is that queries filtering by date alone must read every sub-partition within that date: with 10 hash buckets per day, a date filter touches 10 partitions instead of 1, multiplying file and metadata overhead and weakening per-partition clustering, so time-only queries can run two to three times slower.
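In a Spark-managed lake, one way to express date + hash bucketing is to materialize the bucket as a column and include it in the partition spec. Column names, bucket count, and the output path below are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("composite-partitioning").getOrCreate()

events = spark.table("raw.events")   # hypothetical source table
N_BUCKETS = 10                       # 10 hash buckets per day, as above

(events
    .withColumn("user_bucket", F.abs(F.hash("user_id")) % N_BUCKETS)
    .write
    .partitionBy("event_date", "user_bucket")   # composite: date + hash bucket
    .mode("append")
    .parquet("s3://example-lake/events_bucketed/"))

# A date-only filter must now read all 10 user_bucket sub-partitions for that day,
# while a (date, user_id) filter can be pruned down to a single bucket.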
✓ In Practice: BigQuery and Snowflake handle much of this automatically through micro-partitions and automatic clustering. But in data lakes on S3 or GCS with Spark or Presto, you must explicitly design partition schemes and file compaction pipelines.
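For comparison, in BigQuery the equivalent intent is declared once on the table and the engine maintains the layout from then on. A sketch using the google-cloud-bigquery client, with placeholder project, dataset, and schema:
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.analytics.events",                     # placeholder table id
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_date"
)
table.clustering_fields = ["user_id"]                  # clustering maintained automatically

client.create_table(table)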
Operational Complexity:
At production scale, partitioning requires continuous monitoring. Track partition sizes to detect skew. Monitor file count per partition to catch over-fragmentation. Set up alerts when clustering depth degrades. Schedule compaction and reclustering during low-traffic windows. All of this is invisible at small scale but becomes critical infrastructure work at terabyte and petabyte scale.
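A first cut at this monitoring can be a scheduled listing job. The sketch below uses boto3 against a hypothetical bucket layout and flags partitions that exceed assumed thresholds for file count and size skew.
import boto3
from collections import defaultdict

s3 = boto3.client("s3")
BUCKET, PREFIX = "example-lake", "events/"     # hypothetical bucket and prefix
MAX_FILES_PER_PARTITION = 1000                 # alert thresholds (assumed)
SKEW_RATIO = 3.0

sizes, counts = defaultdict(int), defaultdict(int)
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        # Keys look like events/event_date=2024-06-01/part-00001.parquet
        partition = obj["Key"].rsplit("/", 1)[0]
        sizes[partition] += obj["Size"]
        counts[partition] += 1

mean_size = sum(sizes.values()) / max(len(sizes), 1)
for partition in sorted(sizes):
    if counts[partition] > MAX_FILES_PER_PARTITION:
        print(f"over-fragmented: {partition} has {counts[partition]} files")
    if sizes[partition] > SKEW_RATIO * mean_size:
        print(f"skewed: {partition} is {sizes[partition] / mean_size:.1f}x the mean partition size")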
💡 Key Takeaways
✓ Daily partitions are optimal for most large event tables, keeping partition sizes around 2 to 5 TB and file counts manageable, under roughly 1,000 files per partition
✓ Over-partitioning (by minute, or by hour without compaction) creates 200K+ files, so queries spend more time on metadata operations than on actual data scanning
✓ Under-partitioning (monthly or no partitioning) forces scans of 50+ TB even for single-day queries, causing minute-long latencies
✓ Composite partitioning like date + hash(user_id) spreads load evenly but adds overhead for queries that filter only by date, trading date-only query speed for balanced partitions
✓ Production systems require continuous monitoring of partition skew, file count, and clustering depth, plus scheduled compaction and reclustering jobs
📌 Examples
1. Event stream at 5 million events/second (432 billion/day, roughly 50 TB compressed): daily partitioning yields about 30 partitions per month, so a date-filtered query scans a single day instead of the whole table and finishes in tens of seconds rather than minutes
2. Minute-level partitioning at the same scale: 43,200 partitions per month holding 216,000 files, so query planning alone takes 2 to 3 minutes before any scanning starts
3. Composite partitioning with date + region: the North America partition gets 30 TB/day while smaller regions get 2 TB, requiring load balancing or further sub-partitioning