
Common Context Switching Failure Modes at Scale

Thundering herd occurs when many threads block on the same event (for example, accept on a shared listening socket, or a shared epoll fd) and all wake simultaneously when the event fires. This floods the runqueue with hundreds or thousands of newly runnable threads, causing a context switch storm: all threads compete for the same lock or resource, most fail and block again, but not before each has consumed microseconds of CPU time and polluted caches. Symptoms include spiky CPU usage and millisecond scale latency bumps during connection bursts. Mitigations include accept mutexes, SO_REUSEPORT to distribute accepts across multiple listening sockets, and per core event loops.

Priority inversion is a subtler failure: a low priority thread holds a lock needed by a high priority thread. If medium priority runnable threads exist, the scheduler runs them instead of the low priority holder, which never gets the CPU time to release the lock, so the high priority thread is effectively blocked by lower priority work. This manifests as unexplained stalls under load that disappear when load drops. Real time systems use priority inheritance protocols, in which a thread holding a lock temporarily inherits the priority of the highest priority waiter, ensuring it runs and releases the lock quickly.

CFS bandwidth throttling creates hard latency cliffs. When a containerized service exhausts its CPU quota, it is throttled until the end of the enforcement period (100 milliseconds by default). A request that triggers quota exhaustion 5 milliseconds before the period ends stalls for those 5 milliseconds; exhaustion early in the period can stall work for most of the 100 milliseconds, so the p99 spike equals the time remaining in the period. Many teams have debugged mysterious 10 to 100 millisecond tail latencies only to discover CPU throttling as the root cause. Fixes include removing limits, raising them with generous headroom, or reducing the period (for example to 1 millisecond) to cap stall duration.

Runqueue contention becomes problematic on systems with hundreds of cores and thousands of threads. Even with per core runqueues, the scheduler must periodically scan and balance load, acquiring locks and sending inter processor interrupts. At extreme scale, lock contention inside the scheduler itself adds microseconds to scheduling decisions, and cross core migrations trigger TLB shootdowns that stall pipelines. Systems with 100+ cores often need careful tuning of scheduler domains and load balance intervals to keep scheduler overhead below roughly 5 percent of CPU time.

Illustrative sketches of mitigations for each of these failure modes follow.
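For the thundering herd case, here is a minimal sketch of the SO_REUSEPORT mitigation; the port number and one-listening-socket-per-worker structure are illustrative assumptions rather than details from the text. Each worker binds its own socket on the same port, so the kernel load balances incoming connections and only one worker wakes per connection.

```c
/* Sketch: one listening socket per worker with SO_REUSEPORT (Linux 3.9+).
 * Each worker accepts on its own fd, so a new connection wakes only the
 * worker whose socket the kernel chose, instead of every waiter. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

static int make_reuseport_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); exit(1); }

    int one = 1;
    /* Allow many sockets to bind the same addr:port; the kernel then
     * distributes incoming connections across them. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0) {
        perror("setsockopt(SO_REUSEPORT)"); exit(1);
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); exit(1); }
    if (listen(fd, 128) < 0) { perror("listen"); exit(1); }
    return fd;  /* each worker runs its own accept/epoll loop on this fd */
}
```

When workers must share a single epoll fd instead, EPOLLEXCLUSIVE (Linux 4.5+) similarly limits the wakeup to one waiter.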
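For priority inversion, POSIX mutexes support the priority inheritance protocol the text describes. A hedged sketch, assuming the contended lock is a pthread mutex shared by the high and low priority threads:

```c
/* Sketch: a priority inheritance mutex. While a low priority thread holds
 * the lock, it temporarily runs at the priority of the highest priority
 * waiter, so medium priority threads cannot starve it. */
#include <pthread.h>

static pthread_mutex_t shared_lock;

static int init_pi_lock(void) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;

    /* PTHREAD_PRIO_INHERIT enables priority inheritance for this mutex. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(&shared_lock, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```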
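For CFS bandwidth throttling, the quota and period live in cgroup v2's cpu.max file as "quota period" in microseconds. The sketch below shrinks the period while keeping the same 2 CPU limit; the cgroup path is an assumed example, and under Kubernetes these values are normally written by the runtime from the CPU limit rather than by hand.

```c
/* Sketch: setting a 2 CPU limit via cgroup v2's cpu.max ("quota period",
 * both in microseconds). With the default 100 ms period, exhausting the
 * quota can stall the group for up to ~100 ms; a 1 ms period keeps the
 * same 2 CPU limit but caps any single stall at ~1 ms.
 * The cgroup path below is an assumed example. */
#include <stdio.h>

int main(void) {
    const char *cpu_max = "/sys/fs/cgroup/myservice/cpu.max";

    FILE *f = fopen(cpu_max, "w");
    if (!f) { perror("fopen"); return 1; }

    /* Default-style config: 200 ms of CPU per 100 ms period (2 CPUs). */
    /* fprintf(f, "200000 100000\n"); */

    /* Reduced period: 2 ms of CPU per 1 ms period (still 2 CPUs), so a
     * throttled burst waits at most ~1 ms for the quota to refill. */
    fprintf(f, "2000 1000\n");

    fclose(f);
    return 0;
}
```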
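For runqueue contention and cross core migrations, one common (if blunt) mitigation is pinning hot threads so the load balancer cannot migrate them across cores or sockets; this is a narrower tool than the scheduler domain tuning mentioned above. A sketch using pthread_setaffinity_np, where the core id is illustrative and would come from the host's real topology:

```c
/* Sketch: pin the calling thread to a single core so the scheduler's load
 * balancer does not migrate it across cores or sockets. Core IDs here are
 * illustrative; real code should derive them from the machine's topology. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int pin_current_thread_to_core(int core_id) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);

    /* Restrict this thread's allowed CPUs to the single core. */
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (rc != 0)
        fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
    return rc;
}
```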
💡 Key Takeaways
Thundering herd wakes hundreds of threads simultaneously on shared events, causing context switch storms and cache thrash that add milliseconds to latency during traffic bursts
Priority inversion occurs when a low priority thread holds a lock needed by a high priority thread, causing unbounded delays while medium priority work starves the low priority holder
CFS bandwidth throttling causes p99 latency spikes equal to remaining quota period (up to 100 milliseconds by default) when containers hit CPU limits during bursts
Runqueue contention on 100+ core systems can push scheduler overhead past 5 percent of CPU time as lock contention and cross core load balancing traffic grow with core count
NUMA migrations cause remote memory accesses that are 2x to 3x slower, inflating p99 latencies by tens of milliseconds in memory intensive services when scheduler migrates threads across sockets
📌 Examples
A web server using epoll with 1000 threads on one epoll fd sees a thundering herd on accept: all 1000 threads wake, 999 fail to accept and block again, wasting 10 to 50 milliseconds of CPU during connection bursts
A real time control system experiences 100 millisecond stalls under load due to priority inversion: a low priority logging thread holds a file lock while the high priority control thread waits, and medium priority threads prevent the logger from running
A Kubernetes service with a 2 CPU limit hits its quota 10 milliseconds into the 100 millisecond period during a traffic spike, causing a 90 millisecond throttle stall that appears as a p99 latency jump from 20 milliseconds to 110 milliseconds