Virtual Memory Fundamentals: Address Translation & Process Isolation
Virtual memory decouples the addresses a process sees from the machine's physical RAM. Every process gets its own large, contiguous virtual address space (typically 48 bits on x64, providing 256 TB of addressable space), but only a small fraction of it lives in physical RAM at any moment. A hardware component called the Memory Management Unit (MMU) translates every virtual address to a physical frame using page tables, at the granularity of fixed-size pages (typically 4 KB).
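To make translation concrete, Linux exposes the result of the MMU's bookkeeping through /proc/self/pagemap: one 64-bit entry per virtual page, recording whether the page is resident and which physical frame backs it. A minimal Linux-only sketch (on modern kernels the frame number reads as zero without CAP_SYS_ADMIN):

```c
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    long page_size = sysconf(_SC_PAGESIZE);
    int *p = malloc(sizeof *p);
    *p = 42;  /* touch the page so the kernel actually allocates a frame */

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* pagemap holds one 64-bit entry per virtual page of this process */
    uint64_t vpn = (uintptr_t)p / page_size;
    uint64_t entry = 0;
    if (pread(fd, &entry, sizeof entry, vpn * sizeof entry) != sizeof entry) {
        perror("pread");
        return 1;
    }

    int present = (entry >> 63) & 1;            /* bit 63: resident in RAM */
    uint64_t pfn = entry & ((1ULL << 55) - 1);  /* bits 0-54: frame number */
    printf("virtual page 0x%" PRIx64 " -> present=%d pfn=0x%" PRIx64 "\n",
           vpn, present, pfn);
    close(fd);
    return 0;
}
```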
This architecture enables powerful capabilities. First, process isolation: each process has its own page tables, so one process cannot accidentally or maliciously access another's memory. Second, efficient memory sharing: the operating system can map the same physical page (for a shared library such as libc) into multiple processes' virtual address spaces, saving gigabytes of RAM. Chrome exploits this by starting a zygote process with preloaded libraries, then forking child processes for tabs. The children inherit the shared code mappings via copy-on-write (COW), reducing startup latency and memory footprint across hundreds of processes.
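A minimal POSIX sketch of COW in action: after fork(), parent and child transparently share the same physical pages, and the child's first write copies only the page it touched:

```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static char buf[1 << 20];  /* 1 MB; shared copy-on-write after fork() */

int main(void) {
    memset(buf, 'A', sizeof buf);  /* fault the pages in before forking */

    pid_t pid = fork();  /* child inherits all mappings, marked read-only */
    if (pid == 0) {
        buf[0] = 'B';  /* write fault: kernel copies only this one page */
        printf("child sees:  %c\n", buf[0]);  /* prints B */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent sees: %c\n", buf[0]);  /* still A: the pages diverged */
    return 0;
}
```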
The working set is the subset of pages a process actively uses. The operating system keeps only this subset resident in RAM; the rest is backed by disk (swap files or memory-mapped files) and fetched on demand. This lets systems run programs whose total memory exceeds physical RAM, trading some page-fault overhead for much higher utilization.
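Linux's mincore() reports which pages of a mapping are currently resident, which makes demand paging directly observable. A sketch (pass any large file; if the file is cold, few pages are resident until they are touched):

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static size_t count_resident(void *addr, size_t len, long page) {
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    mincore(addr, len, vec);  /* bit 0 of each byte: is that page in RAM? */
    size_t n = 0;
    for (size_t i = 0; i < npages; i++) n += vec[i] & 1;
    free(vec);
    return n;
}

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    long page = sysconf(_SC_PAGESIZE);

    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    printf("resident before touching: %zu pages\n",
           count_resident(map, st.st_size, page));

    volatile char sum = 0;  /* touch every page: demand paging faults them in */
    for (off_t off = 0; off < st.st_size; off += page) sum += map[off];
    (void)sum;

    printf("resident after touching:  %zu pages\n",
           count_resident(map, st.st_size, page));
    return 0;
}
```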
In production, Google's serving fleets pre-fault hot code and data during warmup to drive major page faults to effectively zero in steady state. Even a handful of major faults per second (each requiring disk I/O) can cause p99 latency spikes of 100 to 500 microseconds on NVMe storage, or 5 to 10 milliseconds on spinning disks.
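A common Linux way to implement this kind of warmup is to hint the kernel and then pin the hot region. The sketch below uses madvise() and mlock() with a hypothetical helper name; it is a generic pattern under stated assumptions, not Google's actual mechanism:

```c
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper; base must be page-aligned (e.g. from mmap). */
static int prefault_and_pin(void *base, size_t len) {
    /* Hint the kernel to read the range in ahead of time. */
    if (madvise(base, len, MADV_WILLNEED) != 0) {
        perror("madvise");
        return -1;
    }
    /* Pin it: mlock faults in any remaining pages and prevents the
       reclaimer from evicting them, so steady-state accesses can
       never take a major fault. */
    if (mlock(base, len) != 0) {
        perror("mlock (RLIMIT_MEMLOCK may need to be raised)");
        return -1;
    }
    return 0;
}
```

For a whole process, mlockall(MCL_CURRENT | MCL_FUTURE) achieves the same pinning without enumerating regions, at the cost of locking everything.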
💡 Key Takeaways
• Virtual addresses decouple the process's view from physical RAM. On x64, processes see a 256 TB virtual space (48 bits), but typically only gigabytes are resident in physical memory.
• The MMU translates every memory access using page tables. A Translation Lookaside Buffer (TLB) caches recent translations to avoid page table walks, which cost 100 to 200 nanoseconds (hundreds of CPU cycles) when they miss.
• The working set is the subset of pages actively used. Operating systems keep only this in RAM, fetching other pages from disk on demand (demand paging). This enables oversubscription but requires careful management.
• Copy-on-write (COW) enables cheap process forking and memory sharing. Parent and child initially share physical pages marked read-only; a write triggers a copy of just that page, deferring allocation until necessary.
• Major page faults require disk I/O and cost 100 to 500 microseconds on NVMe or 5 to 10 milliseconds on HDD, versus 60 to 100 nanoseconds for a DRAM hit. Production systems target zero major faults during steady state; see the measurement sketch after this list.
• Chrome uses a zygote process model: preload common libraries once, then fork child processes that inherit the shared mappings via COW, saving gigabytes across hundreds of tab processes.
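The minor/major distinction is easy to measure with getrusage(), which counts both per process. A Linux sketch that touches every page of a freshly mapped file (whether the faults are major depends on whether the file is already in the page cache; evict it first to observe major faults):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    long page = sysconf(_SC_PAGESIZE);

    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);

    volatile char sum = 0;
    for (off_t off = 0; off < st.st_size; off += page)
        sum += map[off];  /* each first touch takes a page fault */
    (void)sum;

    getrusage(RUSAGE_SELF, &after);
    printf("minor faults: %ld  major faults: %ld\n",
           after.ru_minflt - before.ru_minflt,
           after.ru_majflt - before.ru_majflt);
    return 0;
}
```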
📌 Examples
Google's serving fleets pre-fault hot code and data at startup, touching every page in the working set during warmup so that request handling incurs zero major page faults, protecting p99 latency.
Chrome's zygote process preloads libc, the V8 engine, and rendering libraries (hundreds of MB), then forks a child process for each tab. Shared code pages are mapped read-only into all children, saving 80%+ memory compared to fully independent processes.
A database process with a 10 GB heap might have only 2 GB resident (a 20% working set) if most of its data is cold. Accessing a cold page triggers a major fault, stalling the thread for 100+ microseconds while the page is read from disk.