Memory Overcommit and Copy-On-Write (COW) Trade-offs
Memory Overcommit
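Overcommit lets the kernel hand out more virtual memory than there is physical RAM. A malloc for 10 GB can succeed on a machine with only 8 GB of RAM, depending on the overcommit policy, because physical pages are assigned only when the memory is first touched. The kernel bets that not all allocated memory will be used at the same time. The bet usually pays off because most programs allocate more than they touch.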
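On Linux, the current overcommit policy and the amount of memory already promised can be inspected through /proc. The sketch below is illustrative only: the helper names parse_meminfo, commit_headroom_kb, and read_overcommit_state are hypothetical, and it assumes a Linux system that exposes /proc/meminfo (with the CommitLimit and Committed_AS fields) and /proc/sys/vm/overcommit_memory (0 = heuristic, 1 = always allow, 2 = strict accounting).

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields):
    """CommitLimit minus Committed_AS, in kB.

    A negative result means more memory has been promised than the limit.
    The kernel only enforces the limit in strict mode (overcommit_memory = 2).
    """
    return fields["CommitLimit"] - fields["Committed_AS"]


def read_overcommit_state(meminfo_path="/proc/meminfo",
                          mode_path="/proc/sys/vm/overcommit_memory"):
    """Return (mode, headroom_kb), or None where these files do not exist."""
    if not (os.path.exists(meminfo_path) and os.path.exists(mode_path)):
        return None
    with open(mode_path) as f:
        mode = int(f.read().strip())
    with open(meminfo_path) as f:
        fields = parse_meminfo(f.read())
    return mode, commit_headroom_kb(fields)


if __name__ == "__main__":
    print(read_overcommit_state())
```

In the default heuristic mode, a negative headroom is normal on a busy machine; it is simply the bet described above made visible.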
The risk is that the bet fails. If processes actually use everything they allocated, physical RAM (and swap, if any) is exhausted. The OOM (out-of-memory) killer then terminates processes to reclaim memory. Which process dies depends on kernel heuristics that may not match your priorities: a database might be killed while a logging daemon survives.
Copy-On-Write Mechanics
When fork() creates a child process, memory is not copied immediately. Parent and child share the same physical pages, which the kernel marks read-only. When either process writes to a shared page, the CPU raises a page fault. The kernel copies just that page, gives the writer its own private copy, and lets the write proceed. This is copy-on-write (COW).
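The visible effect of COW can be sketched with a small fork experiment. This is a minimal illustration, assuming a Unix-like system where os.fork is available; the function name cow_demo is made up for the example. The code observes only the result (each process ends up with its own copy of the modified data), not the page fault itself.

```python
import os


def cow_demo():
    """Fork, let the child modify a buffer, and report what each side sees.

    Returns (parent_first_byte, child_first_byte).
    """
    buf = bytearray(b"A" * 4096)  # roughly one page of data, assuming 4 KB pages
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write goes to memory shared with the parent, so the
        # kernel gives the child a private copy of the affected page first.
        try:
            os.close(read_fd)
            buf[0] = ord("B")
            os.write(write_fd, bytes(buf[:1]))
        finally:
            os._exit(0)  # leave immediately without running parent cleanup
    # Parent: its view of the buffer is unaffected by the child's write.
    os.close(write_fd)
    child_byte = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(buf[:1]), child_byte


if __name__ == "__main__":
    print(cow_demo())
```

The parent still sees the original byte while the child reports the modified one, which is the isolation that per-page copying provides.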
COW makes the cost of fork depend on the size of the page tables rather than the size of the data. A 10 GB process typically forks in milliseconds, not the seconds that a full copy would take. Many forked processes call exec immediately, which replaces their memory anyway, so COW avoids copying memory that would be discarded.
COW Latency Surprises
The first write to each shared page after fork triggers a page fault. If the parent has 10 GB mapped and goes on to modify all of it, every 4 KB page triggers a fault and a copy. That is about 2.5 million faults (10 GB / 4 KB). Assuming roughly 10 microseconds per fault, that adds up to about 25 seconds of fault handling.
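The estimate above can be reproduced with a short calculation. The function name cow_fault_cost is illustrative, and the 10 microsecond per-fault cost is the assumption used in the text rather than a measured value; real costs vary with hardware and kernel.

```python
def cow_fault_cost(mapped_bytes, page_bytes=4096, fault_seconds=10e-6):
    """Return (fault_count, total_seconds) if every mapped page is written once."""
    faults = -(-mapped_bytes // page_bytes)  # ceiling division: partial pages count
    return faults, faults * fault_seconds


if __name__ == "__main__":
    # 10 GiB with 4 KiB pages: 2,621,440 faults, about 26 seconds.
    print(cow_fault_cost(10 * 2**30))
    # Decimal units, as in the rounded figures in the text: 2,500,000 faults.
    print(cow_fault_cost(10 * 10**9, page_bytes=4000))
```

Whether sizes are read as decimal or binary units shifts the result by a few percent, which does not change the conclusion: the total is tens of seconds, spread across whichever writes happen to touch a shared page first.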
Databases with background saves illustrate this. They fork to create a snapshot: the child writes out the frozen copy while the parent continues serving writes. Each parent write to a page that is still shared triggers a COW fault and a page copy. If the working set is large and write-heavy, a fork-based save can cause large latency spikes and extra memory use. Solutions include avoiding fork for snapshots or reducing the write rate during saves.
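The fork-based snapshot pattern can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular database implements its saves: fork_snapshot and demo are hypothetical names, the "database" is a small dictionary, and the snapshot format is JSON. It assumes a Unix-like system where os.fork is available.

```python
import json
import os
import tempfile


def fork_snapshot(state, path):
    """Fork a child that writes `state` to `path` as JSON; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: sees the state exactly as it was at the moment of fork.
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(state, f)
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code
    return pid


def demo():
    """Return (live_state, saved_state, wait_status) after one snapshot."""
    state = {"counter": 1}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snapshot.json")
        pid = fork_snapshot(state, path)
        # Parent keeps serving writes; this one lands on memory shared with
        # the child, so the kernel copies the affected page before it proceeds.
        state["counter"] = 2
        _, wait_status = os.waitpid(pid, 0)
        with open(path) as f:
            saved = json.load(f)
    return state, saved, wait_status


if __name__ == "__main__":
    print(demo())
```

The saved file reflects the state at fork time even though the parent has moved on. In a real system the parent's writes during the save are where the COW faults, and therefore the latency spikes, accumulate.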