
Demand Paging and Page Fault Latency Impacts

Demand paging loads pages into RAM only when they are first accessed, not at allocation time. This lazy loading enables memory oversubscription and fast process startup, but it introduces page faults. A minor fault occurs when the page is already in the page cache but not yet mapped into the process (common for file-backed mappings and copy-on-write pages). A major fault requires disk I/O to fetch the page, costing orders of magnitude more time.

The latency numbers are stark. A DRAM hit costs 60 to 100 nanoseconds. A TLB miss and page table walk adds another 100 to 200 nanoseconds. A minor fault (no I/O) takes 1 to 10 microseconds of operating system overhead. A major fault on NVMe requires a 4 KB random read, costing 80 to 200 microseconds; on spinning disks it takes 5 to 10 milliseconds. Major faults are therefore 1,000x to 100,000x slower than DRAM hits.

Even a low page fault rate destroys effective memory access time: Effective Access Time = (1 − p) × t_mem + p × t_fault, where p is the fraction of accesses that fault. With memory at 100 nanoseconds and NVMe faults at 100 microseconds, a fault rate p of just 0.0001 (one fault per 10,000 accesses) adds 10 nanoseconds of overhead, a 10% hit. For spinning disks at 5 milliseconds per fault, p must stay below 0.000002 to keep the overhead under 10%.

In practice, production systems target zero major faults during steady state. Google's serving fleets explicitly warm caches on startup: services pre-touch heaps by writing to every page, sequentially scan hot datasets to populate the page cache, and exercise all code paths to fault in instructions and data. This drives major faults to effectively zero per process during normal operation, protecting tail latency. Amazon's latency-sensitive EC2 services similarly run with swap minimized or disabled, ensuring the working set stays resident and no surprise major faults occur under load.
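To make the minor/major distinction concrete, here is a minimal Linux/C sketch (an illustration added here, not from the source) that reads the process's fault counters via getrusage(2) while lazily allocating and then touching a buffer:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Print this process's cumulative minor/major page fault counters. */
static void print_faults(const char *label) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("%-14s minor=%ld major=%ld\n", label, ru.ru_minflt, ru.ru_majflt);
}

int main(void) {
    const size_t size = 64u * 1024 * 1024;  /* 64 MB */
    print_faults("baseline");

    char *buf = malloc(size);               /* reserves address space only */
    if (!buf) return 1;
    print_faults("after malloc");           /* counters barely move */

    memset(buf, 1, size);                   /* first write to every page */
    print_faults("after touch");            /* minor faults jump ~size/4096 */

    free(buf);
    return 0;
}
```

On a typical Linux system the malloc step adds almost nothing, while the memset produces roughly one minor fault per 4 KB page: anonymous memory is demand-zeroed on first write, exactly the lazy behavior described above.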
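The effective-access-time arithmetic can be checked directly. This small sketch (again illustrative) plugs in the numbers from the paragraph above:

```c
#include <stdio.h>

/* Effective Access Time in ns: EAT = (1 - p) * t_mem + p * t_fault. */
static double eat_ns(double p, double t_mem_ns, double t_fault_ns) {
    return (1.0 - p) * t_mem_ns + p * t_fault_ns;
}

int main(void) {
    const double t_mem = 100.0;                    /* DRAM hit: 100 ns */
    /* NVMe fault = 100 us = 100,000 ns at p = 1e-4 -> ~110 ns */
    printf("NVMe: %.2f ns\n", eat_ns(1e-4, t_mem, 100e3));
    /* HDD fault = 5 ms = 5,000,000 ns at p = 2e-6  -> ~110 ns */
    printf("HDD:  %.2f ns\n", eat_ns(2e-6, t_mem, 5e6));
    return 0;
}
```

Both cases land near 110 ns, a 10% penalty: as the fault cost rises from 100 microseconds to 5 milliseconds, the tolerable fault rate drops from 1e-4 to about 2e-6.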
💡 Key Takeaways
Minor faults occur when a page is in the page cache but not yet mapped into the process. Cost is 1 to 10 microseconds. Common for file-backed pages and copy-on-write (COW) pages before the first write.
Major faults require disk I/O. NVMe costs 80 to 200 microseconds for 4 KB random reads. Spinning disks cost 5 to 10 milliseconds. Major faults are 1,000x to 100,000x slower than DRAM.
Effective Access Time degrades rapidly with page fault rate p. With 100-nanosecond DRAM and 100-microsecond faults, p = 0.0001 adds 10% overhead. For faults costing a millisecond, p must stay under 0.00001 for the same bound.
Production systems target zero major faults during steady state. Google pre-faults hot code and data during warmup. Even a handful of major faults per second can cause p99 latency spikes.
Lazy allocation and demand paging enable oversubscription and fast startup, but surprise faults under load destroy tail latency. Pre-touch allocations and warm caches before serving traffic (see the pre-touch sketch after this list).
Amazon's latency-sensitive services disable swap or strictly limit it. Any major fault under peak load risks violating SLOs, so working sets must stay resident in RAM.
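A minimal sketch of the pre-touch-and-pin pattern, assuming Linux (mlockall needs CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK; the buffer size here is arbitrary). The JVM's -XX:+AlwaysPreTouch flag applies the same pre-touch idea to Java heaps:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Fault an allocation in up front by writing one byte per page. */
static void *alloc_prefaulted(size_t size) {
    char *buf = malloc(size);
    if (!buf) return NULL;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t off = 0; off < size; off += page)
        buf[off] = 0;                    /* first write faults the page in */
    return buf;
}

int main(void) {
    void *heap = alloc_prefaulted(1UL << 30);   /* 1 GB; size to your RAM */
    if (!heap) return 1;

    /* Pin current and future mappings so the kernel never swaps them out. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall");              /* fails without privilege */

    /* ... serve traffic: steady state should report zero major faults ... */
    return 0;
}
```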
📌 Examples
A Java service allocates a 10 GB heap but does not touch all pages. During a traffic spike, requests touch cold heap pages, each access triggering a ~150-microsecond major fault; with dozens of faults per request, p99 latency jumps from 5 ms to 15 ms, breaking SLO.
Google search serving processes pre-fault the entire working set at startup by sequentially reading all hot datasets and calling all hot functions. This ensures zero major faults during query handling, keeping p99 latency under 50 ms (see the page-cache warming sketch after these examples).
A database with a 50 GB buffer pool on an NVMe-backed EC2 instance has swap enabled. Under memory pressure, the kernel evicts 10 GB to swap. A query scan then triggers 10,000 major faults at roughly 100 microseconds each, adding 1 second to query time.
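As a sketch of that sequential warm-up (illustrative; the file path is hypothetical), posix_fadvise(POSIX_FADV_WILLNEED) asks the kernel to start readahead, and a sequential scan forces the file into the page cache before traffic arrives:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Pull a hot data file into the page cache so later reads avoid disk. */
static int warm_file(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) == 0)              /* hint: whole file needed soon */
        posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);

    char buf[1 << 16];                    /* force the reads now */
    while (read(fd, buf, sizeof buf) > 0)
        ;                                 /* data discarded; caching is the point */

    close(fd);
    return 0;
}

int main(void) {
    return warm_file("/srv/hot/dataset.bin");   /* hypothetical path */
}
```

After warm-up, accesses to this file's pages should cost minor faults at worst rather than 80-to-200-microsecond NVMe reads.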