OS & Systems Fundamentals • CPU Scheduling & Context Switching (Easy, ⏱️ ~2 min)
What is CPU Scheduling and Context Switching?
CPU scheduling is the operating system mechanism that decides which runnable thread executes next on each CPU core. Modern schedulers are preemptive: they divide CPU time into slices and can interrupt a running thread when its slice expires or when a higher-priority thread becomes runnable. This ensures fairness and responsiveness but introduces overhead.
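You can observe preemption from user space. The sketch below (Unix-only, using Python's `resource` module) reads the process's voluntary and involuntary context-switch counters before and after running several busy threads; involuntary switches are exactly the preemptions described above, while voluntary switches happen when a thread blocks (in CPython, often on the GIL). The thread count and duration are arbitrary choices for illustration.

```python
import resource
import threading
import time

def spin(seconds: float) -> None:
    """Busy-loop so the thread stays runnable and eligible for preemption."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass

before = resource.getrusage(resource.RUSAGE_SELF)

# Several runnable threads contending for the CPU (and, in CPython,
# for the GIL) force the scheduler to switch between them repeatedly.
threads = [threading.Thread(target=spin, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

after = resource.getrusage(resource.RUSAGE_SELF)
print("voluntary switches:  ", after.ru_nvcsw - before.ru_nvcsw)
print("involuntary switches:", after.ru_nivcsw - before.ru_nivcsw)
```

Running this under load versus on an idle machine shows how contention drives the involuntary counter up.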
A context switch is the act of stopping one thread and starting another. The kernel must save the outgoing thread's entire execution state (program counter, CPU registers, stack pointer, flags, and floating-point state) to memory, then restore the incoming thread's state from memory. When switching between different processes rather than between threads of the same process, the address space must also change, which invalidates TLB (Translation Lookaside Buffer) entries — fully on older CPUs, partially on CPUs with tagged TLBs such as x86 PCID — and causes additional overhead.
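A minimal way to force the kernel to do this save/restore dance repeatedly is a ping-pong between two threads: each `wait()` blocks one thread, and each `set()` makes the other runnable. The sketch below times round trips; the round count is arbitrary, and because the figure includes CPython and GIL overhead, expect tens of microseconds per round trip rather than the sub-microsecond kernel-only cost.

```python
import threading
import time

ROUNDS = 10000
ping = threading.Event()
pong = threading.Event()

def replier() -> None:
    # Each wait() blocks this thread, forcing a switch to the peer;
    # each set() makes the peer runnable again.
    for _ in range(ROUNDS):
        ping.wait()
        ping.clear()
        pong.set()

t = threading.Thread(target=replier)
t.start()

start = time.perf_counter()
for _ in range(ROUNDS):
    ping.set()
    pong.wait()
    pong.clear()
t.join()
elapsed = time.perf_counter() - start

# Each round trip contains at least two context switches.
print(f"per round trip: {elapsed / ROUNDS * 1e6:.1f} us")
```

The protocol is race-free because each event has exactly one setter and one waiter, and each side only re-sets its event after seeing the other's.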
The cost of context switching is pure overhead: no useful work happens during the switch. On modern x86_64 processors, switching between threads in the same process takes 0.3 to 1.0 microseconds. Switching between different processes, with the address space change, costs 1 to 5 microseconds. Migrating a thread to a different CPU core adds 5 to 15 microseconds due to cache misses and inter-processor interrupts. These numbers seem small, but in high-throughput systems handling hundreds of thousands of requests per second, microseconds multiply quickly.
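The cross-process cost can be estimated the same way with a classic pipe ping-pong between a parent and a forked child (the approach lmbench's `lat_ctx` uses). This sketch is Unix-only; on a multi-core machine the two processes may run on different cores in parallel, so for a truer switch-cost number you would pin both to one core (e.g. with `taskset`). The round count is arbitrary.

```python
import os
import time

ROUNDS = 5000

# Two pipes: parent -> child and child -> parent.
c_read, p_write = os.pipe()
p_read, c_write = os.pipe()

pid = os.fork()
if pid == 0:
    # Child: echo each byte back. Blocking reads mean every round trip
    # forces the kernel to switch between the two processes.
    for _ in range(ROUNDS):
        os.read(c_read, 1)
        os.write(c_write, b"x")
    os._exit(0)

start = time.perf_counter()
for _ in range(ROUNDS):
    os.write(p_write, b"x")
    os.read(p_read, 1)
elapsed = time.perf_counter() - start
os.waitpid(pid, 0)

print(f"cross-process round trip: {elapsed / ROUNDS * 1e6:.1f} us")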
The indirect costs often matter more than direct switching time. When a thread migrates to a different core, its working set is still in the old core's L1/L2 cache; the new core must refetch it, and each resulting cache miss takes 50 to 200 nanoseconds. In systems with thousands of context switches per second, CPU time spent in the scheduler itself plus cache pollution can consume several percent of total CPU capacity.
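The standard defense against migration cost is CPU affinity: pinning a thread or process to one core so the scheduler can never move it and its cache footprint stays put. A minimal Linux-only sketch using `os.sched_setaffinity` (pid 0 means the calling process; CPU 0 is an arbitrary choice here):

```python
import os

# Linux-only: restrict this process to CPU 0 so the scheduler cannot
# migrate it and its working set stays in that core's L1/L2 cache.
original = os.sched_getaffinity(0)
os.sched_setaffinity(0, {0})
print("pinned to:", os.sched_getaffinity(0))

# ... run the latency-critical work here ...

# Restore the original CPU mask when done.
os.sched_setaffinity(0, original)
```

Production low-latency setups typically combine pinning with kernel boot parameters that keep other work (and timer ticks) off the isolated cores.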
💡 Key Takeaways
• Preemptive schedulers time-slice the CPU among runnable threads, typically giving each thread 0.75 to 6 milliseconds before preempting
• A same-process thread switch costs 0.3 to 1.0 microseconds, a cross-process switch costs 1 to 5 microseconds, and cross-core migration adds 5 to 15 microseconds
• A context switch saves the outgoing thread's state (program counter, registers, stack pointer, floating-point state) and restores the incoming thread's state
• Indirect costs dominate at scale: cache misses take 50 to 200 nanoseconds each, TLB misses cause pipeline stalls, and scheduler overhead consumes several percent of CPU under heavy load
• Systems with thousands of runnable threads per core can spend more time context switching than doing useful work, inflating tail latencies by tens of milliseconds
📌 Examples
Netflix migrated its Java API gateways from thread-per-request (thousands of threads) to event-driven (tens of threads), reducing context switches and improving p99 latency by double-digit percentages at hundreds of thousands of requests per second
Low-latency trading systems pin critical threads to isolated CPU cores with nohz_full, achieving single-digit-microsecond end-to-end latency by eliminating scheduler-induced context switches entirely