
GC Throughput vs. Latency Trade-offs and Memory Overhead

Garbage collectors sit on a three-dimensional spectrum: throughput, latency, and memory overhead. Throughput-oriented collectors maximize CPU time for application code (the mutator) by batching GC work into infrequent, longer stop-the-world (STW) pauses. They achieve 95+ percent mutator utilization but may pause for hundreds of milliseconds to seconds during major collections. Low-latency collectors minimize pause times by performing most work concurrently with the mutator, introducing barriers and metadata overhead that reduce mutator utilization to 80 to 90 percent but keep pauses under 10 milliseconds even on 100+ GB heaps.

Memory overhead is the hidden cost of low latency. Concurrent collectors need headroom (free space) to evacuate and compact regions without stalling the mutator. Running above 80 to 85 percent heap occupancy with high allocation rates risks promotion failures (survivors cannot be copied because old space is full or fragmented), triggering emergency full-heap STW compaction that can pause for multiple seconds. Production guidance is to provision 25 to 50 percent more heap than the live set: if your application has 20 GB of reachable objects under steady load, allocate a 30 to 40 GB heap. Additional memory costs include remembered sets (tracking cross-region pointers, often 1 to 5 percent of heap), survivor spaces (copy buffers for evacuation, another 5 to 20 percent), and card tables (byte arrays marking modified heap regions). Write barriers add CPU overhead: every pointer update executes a small instruction snippet to maintain GC invariants, costing 2 to 8 percent of total CPU depending on mutation rate.

The decision matrix is clear. Choose throughput collectors (like Parallel GC) for batch processing, offline analytics, or workloads where tail latency doesn't matter and you want maximum work per CPU cycle. Choose low-latency collectors (like ZGC, Shenandoah, or G1 with aggressive tuning) for user-facing APIs, real-time data processing, or interactive UIs where p99 latency under 10 to 50 milliseconds is critical. For ultra-low-latency requirements (sub-100-microsecond p99), avoid tracing GC entirely and use ownership models (C++ RAII, Rust's borrow checker) or arena/region allocation patterns.
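On the JVM, these collectors are selected with flags such as -XX:+UseParallelGC, -XX:+UseG1GC, -XX:+UseZGC, or -XX:+UseShenandoahGC, and the occupancy guidance above is straightforward to watch in production. Below is a minimal monitoring sketch using the standard java.lang.management beans; the 85 percent warning threshold and 10-second poll interval are illustrative choices, not normative values.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Minimal sketch: poll cumulative GC collection time and heap occupancy so
// you can see when headroom shrinks toward the promotion-failure danger zone.
public class GcHeadroomMonitor {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            long gcTimeMs = 0, collections = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                gcTimeMs += gc.getCollectionTime();   // approx. accumulated ms (pause time for STW collectors)
                collections += gc.getCollectionCount();
            }
            MemoryUsage heap = memory.getHeapMemoryUsage();
            long max = heap.getMax();                 // -1 if the JVM reports no limit
            double occupancy = max > 0 ? (double) heap.getUsed() / max : Double.NaN;
            System.out.printf("gc: %d collections, %d ms total, heap %.0f%% occupied%n",
                    collections, gcTimeMs, occupancy * 100);
            if (occupancy > 0.85) {                   // illustrative threshold (see guidance above)
                System.out.println("WARNING: under 15% headroom; promotion-failure risk rising");
            }
            Thread.sleep(10_000);                     // illustrative 10 s poll interval
        }
    }
}
```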
💡 Key Takeaways
Throughput collectors batch GC work into infrequent long pauses (100 ms to seconds), achieving 95+ percent mutator CPU utilization, ideal for batch jobs and offline processing where latency doesn't matter
Low-latency collectors run most GC work concurrently using barriers and metadata, keeping pauses under 10 ms on 100+ GB heaps but reducing mutator utilization to 80 to 90 percent due to barrier overhead
Memory headroom is critical: concurrent collectors need 25 to 50 percent free space above the live set to evacuate without promotion failures; running above 80 to 85 percent occupancy risks emergency full-GC pauses of multiple seconds
Additional memory costs include remembered sets tracking cross-region pointers (1 to 5 percent of heap), survivor copy buffers (5 to 20 percent), and card tables (1 to 2 percent); write barriers, executed on every pointer update, add 2 to 8 percent total CPU overhead (see the card-table sketch after this list)
For ultra-low-latency targets (sub-100-microsecond p99) like high-frequency trading or embedded control loops, avoid tracing GC entirely and use ownership models (C++ RAII, Rust) or arena/region allocation with explicit lifetime management (see the arena sketch below)
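To make the barrier and card-table costs concrete, here is a conceptual model of a card-table write barrier. In a real VM the barrier is a few instructions inlined by the JIT after every reference store, not a Java method call; the 512-byte card size is typical of HotSpot, while the class and field names here are purely illustrative.

```java
// Conceptual model of a card-table write barrier (illustrative only, not a
// real VM implementation). One card-table byte covers one 512-byte heap chunk.
class CardTableModel {
    static final int CARD_SHIFT = 9;   // 2^9 = 512-byte cards, a typical size
    static final byte DIRTY = 1;       // marker value is illustrative
    final byte[] cards;                // one byte per 512-byte heap chunk

    CardTableModel(long heapBytes) {
        cards = new byte[(int) (heapBytes >>> CARD_SHIFT)];
    }

    // The "barrier": conceptually runs on every pointer store (obj.field = ref).
    // Dirtying the card tells the next minor GC to rescan only this 512-byte
    // region for old-to-young pointers instead of scanning all of old space.
    void onReferenceStore(long fieldAddress) {
        cards[(int) (fieldAddress >>> CARD_SHIFT)] = DIRTY;
    }
}
```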
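And for the arena/region end of the spectrum, a minimal sketch of explicit lifetime management using Java's own java.lang.foreign.Arena API (finalized in Java 22): every segment allocated from the arena shares one lifetime and is freed in a single deterministic step when the arena closes, so this memory adds no tracing-GC pressure. Buffer size and contents are illustrative.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Arena/region allocation sketch (requires Java 22+): all segments share the
// arena's lifetime and are freed together when the arena closes.
public class ArenaSketch {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {      // one explicit lifetime
            MemorySegment buf = arena.allocate(1024 * ValueLayout.JAVA_LONG.byteSize());
            long sum = 0;
            for (long i = 0; i < 1024; i++) {
                buf.setAtIndex(ValueLayout.JAVA_LONG, i, i * i);
                sum += buf.getAtIndex(ValueLayout.JAVA_LONG, i);
            }
            System.out.println("sum of squares = " + sum);
        } // arena.close(): every allocation above is released at once, deterministically
    }
}
```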
📌 Examples
Apache Spark batch job processing 10 TB of data with a 256 GB heap uses Parallel GC: accepts 2-second full-GC pauses every 10 minutes to maximize throughput, completing the job 15 percent faster than with a low-latency collector
Amazon API service with a 48 GB heap (30 GB live set) uses ZGC with 60 percent headroom: p99 GC pause of 8 ms at 100K requests per second, but mutator utilization drops to 85 percent versus 92 percent with a throughput collector
Google V8 JavaScript in Chrome targets a 16 ms frame budget for 60 fps rendering: incremental marking and concurrent compaction keep main-thread pauses under 5 ms, accepting 10 percent CPU overhead from write barriers to avoid UI jank
Cassandra cluster node with a 16 GB heap running at 90 percent occupancy experiences a promotion failure under a write spike: the fallback full compaction pauses for 8 seconds, exceeding the 5-second timeout and triggering node replacement and a rebalance storm