Distributed Execution Models: Massively Parallel Processing (MPP) Clusters vs Serverless Pooled Compute
MPP Cluster Architecture
MPP (Massively Parallel Processing) clusters distribute query execution across dedicated compute nodes coordinated by a leader node. You provision a fixed cluster (say, 10 nodes), control data distribution (how rows are spread across nodes via a distribution key), and define sort keys (the physical ordering of rows within each node). This yields deterministic performance: a well-tuned cluster delivers consistent 5-second response times for dashboard queries because you control the hardware and nothing else competes for it.
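A minimal sketch of the distribution-key idea: rows are assigned to nodes by hashing the key, so all rows sharing a key land on the same node. The hash function and node count here are illustrative assumptions, not any vendor's actual placement algorithm.

```python
import hashlib

NUM_NODES = 10  # fixed, provisioned cluster size (assumption for the sketch)

def node_for(distribution_key: str) -> int:
    # Stable hash of the distribution key, modulo the node count:
    # the same key always maps to the same node.
    digest = hashlib.md5(distribution_key.encode()).hexdigest()
    return int(digest, 16) % NUM_NODES

rows = [{"user_id": f"u{i}", "amount": i * 10} for i in range(6)]
placement = {r["user_id"]: node_for(r["user_id"]) for r in rows}
# A sort key would then order rows physically within each node's slice.
```

Because placement is deterministic, two tables distributed on the same key colocate their matching rows, which is what makes local joins possible later.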
The cost: you pay for uptime regardless of utilization, scale manually when workloads grow, and waste spend during idle periods. A 10-node cluster billed in the thousands of dollars per month runs continuously even if 60% of its capacity sits idle overnight. The economics favor steady, predictable workloads with high utilization.
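The idle-capacity point can be made concrete with back-of-the-envelope arithmetic. The per-node rate below is a hypothetical assumption, not a real quote:

```python
# Hypothetical pricing: assume each node costs $1,000/month.
node_cost = 1000.0
nodes = 10
monthly_bill = nodes * node_cost           # billed regardless of use
utilization = 0.40                         # 60% of capacity idle overnight
effective_cost_per_utilized_node = monthly_bill / (nodes * utilization)
# At 40% utilization, each *utilized* node-month costs 2.5x the list rate.
```

The higher the sustained utilization, the closer the effective unit cost gets to the list rate, which is why steady workloads favor this model.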
Serverless Pooled Compute
Serverless models decouple storage from compute. Data lives in distributed object storage, and queries dynamically schedule parallel readers across a shared pool of slots (execution units). You pay a flat rate per TB scanned plus a monthly rate per TB stored. A query scanning 10 TB costs the same whether it runs at 2 am or at peak hours.
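The pricing model reduces to two linear functions. The rates below are assumptions chosen for illustration, not vendor prices:

```python
# Hypothetical rates (assumptions, not quotes):
SCAN_RATE_PER_TB = 5.0      # $ per TB scanned, per query
STORAGE_RATE_PER_TB = 20.0  # $ per TB stored, per month

def query_cost(tb_scanned: float) -> float:
    # Pure pay-per-use: price depends only on bytes read,
    # not on time of day or cluster size.
    return tb_scanned * SCAN_RATE_PER_TB

def monthly_storage_cost(tb_stored: float) -> float:
    return tb_stored * STORAGE_RATE_PER_TB
```

Note that compute cost is driven entirely by bytes scanned, which is why the partition-pruning behavior discussed next dominates the bill.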
This model excels at spiky, unpredictable workloads and eliminates scaling decisions. However, performance varies under multi-tenancy (resources are shared with other users), and poor partition pruning explodes costs: a single mistaken full scan of a 100 TB table is billed for all 100 TB in one query.
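A sketch of why pruning matters so much: if a table is partitioned by day, a date filter lets the engine skip every partition outside the range, while a filterless query reads all of them. Partition layout and sizes here are toy assumptions.

```python
from datetime import date
from typing import Optional

# Toy partitioned table: one partition per day of January, ~1 TB each (assumption).
partitions = {date(2024, 1, d): f"events_2024_01_{d:02d}" for d in range(1, 32)}
TB_PER_PARTITION = 1.0

def tb_scanned(start: Optional[date], end: Optional[date]) -> float:
    # With pruning, only partitions inside the date filter are read.
    # With no filter, every partition is scanned -- the costly mistake.
    selected = [p for p in partitions
                if (start is None or p >= start)
                and (end is None or p <= end)]
    return len(selected) * TB_PER_PARTITION

pruned = tb_scanned(date(2024, 1, 1), date(2024, 1, 7))  # reads 7 partitions
full = tb_scanned(None, None)                            # reads all 31
```

Since serverless billing is per byte scanned, the unfiltered query costs more than four times the pruned one here, and the ratio grows linearly with table size.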
Join Execution Differences
MPP clusters use distribution keys to colocate join keys on the same nodes, avoiding expensive shuffles (moving data between nodes). If the fact table and dimension table share the distribution key user_id, the join executes locally on each node. Serverless systems instead broadcast small tables (roughly 100-300 MB and below) to all workers, or shuffle both sides on the join key, incurring network I/O.