Distributed Data Processing • Spark Memory Management & Tuning
Memory Configuration Trade-Offs
Executor Sizing Decisions:
The first trade-off is executor granularity. You could run one giant 64 GB executor per node or four smaller 16 GB executors. Larger heaps hold more data in memory and reduce shuffle spilling. A single task handling 8 GB of data fits comfortably in a 64 GB executor but may struggle in a 16 GB executor when multiple tasks run concurrently.
But larger heaps worsen garbage collection behavior. Full garbage collection pauses scale with heap size and object count. A 16 GB executor might pause for 500 milliseconds during major garbage collection; a 64 GB executor with the same object density could pause for 3 to 5 seconds. Beyond roughly 32 GB per JVM, the runtime can no longer use compressed object pointers, so references double in size and garbage collection works harder. Many companies cap executor heap at 24 to 32 GB, running multiple executors per node to balance memory availability against garbage collection stability.
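The sizing reasoning above can be sketched as a small helper. This is an illustrative calculation, not a Spark API: the 32 GB cap and the 10% overhead factor (approximating `spark.executor.memoryOverhead`, the off-heap space the cluster manager reserves per executor) are assumptions for the example.

```python
import math

# Illustrative sketch (not a Spark API): split a node's RAM into several
# executors whose heaps stay under ~32 GB, the point where the JVM loses
# compressed object pointers and GC pauses grow.
def plan_executors(node_ram_gb: float, heap_cap_gb: float = 32.0,
                   overhead_frac: float = 0.10) -> tuple[int, float]:
    """Return (executors_per_node, heap_gb_per_executor)."""
    per_executor_total = heap_cap_gb * (1 + overhead_frac)
    count = max(1, math.ceil(node_ram_gb / per_executor_total))
    heap_gb = node_ram_gb / count / (1 + overhead_frac)
    return count, round(heap_gb, 1)

# A 128 GB node becomes four ~29 GB executors instead of one 128 GB heap.
print(plan_executors(128))  # (4, 29.1)
```

The same node memory now yields four moderate heaps with tolerable GC pauses instead of one giant heap with multi-second pauses.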
Unified Memory Fraction:
Allocating more heap to unified memory (say, raising spark.memory.fraction from 60% to 75%) gives more room for execution and storage. This helps heavy shuffle stages and large caches. But it shrinks user memory, so complex User Defined Functions (UDFs) maintaining in-memory hash maps or buffers are more likely to run out of memory. Conversely, shrinking unified memory to 50% protects custom code but increases shuffle spilling and cache eviction, hurting performance and increasing load on the storage layer.
Execution vs Storage Balance:
Within unified memory, you can bias toward execution or storage. For jobs relying heavily on caching, increasing storage's guaranteed fraction (from 50% to 60%) prevents important cached datasets from being evicted during shuffles. For shuffle-heavy jobs with minimal caching, allocating more to execution (60% to 70%) reduces spill frequency. The decision depends on workload: read-intensive analytics with reused intermediate datasets favor storage, while write-intensive ETL with large aggregations favors execution.
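A minimal sketch of how `spark.memory.storageFraction` divides the unified region: cached blocks below the guaranteed-storage watermark cannot be evicted by execution, while everything above it is borrowable. The 19 GB unified region is an assumed figure for illustration.

```python
# Sketch of spark.memory.storageFraction: within unified memory, cached
# blocks below the guaranteed watermark are safe from eviction by
# execution; the rest can be borrowed and evicted under pressure.
def split_unified(unified_gb: float, storage_fraction: float = 0.5) -> tuple[float, float]:
    """Return (guaranteed_storage_gb, evictable_gb)."""
    guaranteed = unified_gb * storage_fraction
    return round(guaranteed, 1), round(unified_gb - guaranteed, 1)

# Assuming a 19 GB unified region per executor:
print(split_unified(19.0, 0.6))  # read-heavy analytics: protect more cache
print(split_unified(19.0, 0.4))  # shuffle-heavy ETL: leave more for execution
```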
Serialization Format Trade-Off:
Using serialized caching (MEMORY_ONLY_SER) reduces memory footprint by 2x to 5x compared to object format (MEMORY_ONLY). This means a 500 GB dataset might fit in a 2 TB cluster where it wouldn't before. But access requires deserialization, adding CPU overhead. For datasets accessed once or twice, the CPU cost might outweigh the memory savings. For datasets accessed 10 times across many jobs, the reduced memory pressure and garbage collection overhead often improve overall runtime despite deserialization cost.
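The effect can be demonstrated with a plain-Python analogy. This is not Spark's Tungsten or Kryo serialization, just a hedged illustration of why a dense byte stream beats live objects on footprint while costing CPU to read back.

```python
import pickle
import sys

# Plain-Python analogy for MEMORY_ONLY vs MEMORY_ONLY_SER: live objects
# pay per-object headers and pointers; a serialized stream packs the same
# data densely.
records = [{"id": i, "amount": i * 1.5, "country": "US"} for i in range(1000)]

# Shallow sizes only, so this UNDERcounts the live footprint (nested keys
# and values are excluded); the real gap is larger still.
live_bytes = sum(sys.getsizeof(r) for r in records)
serialized = pickle.dumps(records)

print(live_bytes > len(serialized))  # the dense form is smaller

# Reading it back costs CPU every time, mirroring the deserialization tax
# paid on each access to a serialized cache.
restored = pickle.loads(serialized)
assert restored == records
```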
Cost vs Performance:
Aggressive tuning can extract 20 to 50% performance gains. You might push executor heap to 48 GB, set unified memory to 75%, bias execution to 65%, use aggressive broadcast thresholds, and cache liberally. This works beautifully until data shape changes: a new product line adds 30% more records, a previously uniform key becomes skewed, or a dimension table doubles in size. Suddenly, jobs that finished in 60 minutes take 3 hours or fail repeatedly.
Many large companies accept 10 to 20% slower jobs in exchange for wider safety margins: more conservative memory settings, extra spill headroom, lower broadcast thresholds, and more partitions to reduce per task memory. This operational resilience is often worth the incremental cost, especially for critical pipelines with strict SLAs.
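One of those safety margins, "more partitions to reduce per-task memory," can be sketched as a sizing helper. This is an illustrative calculation, not a Spark API; the 128 MB per-task target is an assumed conservative figure, not a Spark default.

```python
import math

# Illustrative safety-margin helper (not a Spark API): pick enough
# partitions that each task's input share stays near a conservative
# target, leaving headroom for data growth and skew.
def choose_partitions(input_gb: float, target_mb_per_task: int = 128) -> int:
    return max(1, math.ceil(input_gb * 1024 / target_mb_per_task))

print(choose_partitions(5 * 1024))   # 5 TB input today -> 40960 partitions
print(choose_partitions(10 * 1024))  # input doubles -> 81920, tasks stay small
```

Sizing by per-task bytes rather than a fixed partition count is what keeps the configuration safe when input doubles: task memory stays roughly constant while only the task count grows.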
Large Executors (64 GB): more cache space, less spilling, but 3 to 5 second GC pauses
vs
Small Executors (16 GB): more spilling, less cache, but 500 ms GC pauses
"The decision isn't 'use more memory everywhere.' It's: what's my read to write ratio? What's my data reuse pattern? How skewed is my data? Will this configuration remain safe when input doubles?"
💡 Key Takeaways
✓Large executors (64 GB) reduce spilling but suffer 3 to 5 second garbage collection pauses; small executors (16 GB) have 500 ms pauses but spill more frequently
✓Increasing unified memory from 60% to 75% helps shuffles and caching but shrinks user memory, risking out-of-memory failures in complex User Defined Functions
✓Serialized caching cuts memory footprint by 2x to 5x, critical for fitting large datasets, but adds CPU deserialization cost on every access
✓Biasing execution memory to 65% reduces spill in shuffle heavy ETL; biasing storage to 60% protects cached data in read intensive analytics
✓Aggressive tuning can gain 20 to 50% performance but creates fragility: the same config that works at 5 TB input may fail catastrophically at 10 TB due to skew
📌 Examples
1. A 64 GB executor with high object churn pauses for 3 to 5 seconds during full garbage collection, freezing all tasks. Four 16 GB executors on the same node pause for 500 ms each but isolate failures.
2. Setting unified memory to 75% allows a 6 TB shuffle to complete without spilling, but causes User Defined Functions with 2 GB of in-memory state to run out of memory when user memory drops to 5 GB per executor.
3. A dataset accessed 10 times across jobs benefits from serialized caching despite the deserialization CPU cost, because the 3x memory reduction prevents cache eviction and recomputation that would cost far more.