
Common Context Switching Failure Modes at Scale

Thundering herd occurs when many threads block on the same event (for example, accept on a shared listening socket, or a shared epoll fd) and all wake simultaneously when the event fires. This floods the runqueue with hundreds or thousands of newly runnable threads, causing a context switch storm: all threads compete for the same lock or resource, most fail and block again, but not before each has consumed microseconds of CPU time and polluted caches. Symptoms include spiky CPU usage and millisecond scale latency bumps during connection bursts. Mitigations include accept mutexes, SO_REUSEPORT to distribute accepts across multiple listening sockets, and per core event loops.

Priority inversion is a subtler failure: a low priority thread holds a lock needed by a high priority thread. If medium priority runnable threads exist, the scheduler runs them instead of the low priority holder, which never gets the CPU time to release the lock, so the high priority thread is effectively blocked by lower priority work. This manifests as unexplained stalls under load that disappear when load drops. Real time systems use priority inheritance protocols, in which a thread holding a lock temporarily inherits the priority of the highest priority waiter, ensuring it runs and releases the lock quickly.

CFS bandwidth throttling creates hard latency cliffs. When a containerized service exhausts its CPU quota, it is throttled until the end of the enforcement period (100 milliseconds by default). A request that triggers quota exhaustion 5 milliseconds before the period ends stalls for those 5 milliseconds; exhaustion early in the period can stall work for most of the 100 milliseconds, so the p99 spike equals the time remaining in the period. Many teams have debugged mysterious 10 to 100 millisecond tail latencies only to discover CPU throttling as the root cause. Fixes include removing limits, raising them with generous headroom, or reducing the period (for example to 1 millisecond) to cap stall duration.

Runqueue contention becomes problematic on systems with hundreds of cores and thousands of threads. Even with per core runqueues, the scheduler must periodically scan and balance load, acquiring locks and sending inter processor interrupts. At extreme scale, lock contention inside the scheduler itself adds microseconds to scheduling decisions, and cross core migrations trigger TLB shootdowns that stall pipelines. Systems with 100+ cores often need careful tuning of scheduler domains and load balance intervals to keep scheduler overhead below roughly 5 percent of CPU time.

Illustrative sketches of mitigations for each of these failure modes follow.
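For the thundering herd case, here is a minimal sketch of the SO_REUSEPORT mitigation; the port number and one-listening-socket-per-worker structure are illustrative assumptions rather than details from the text. Each worker binds its own socket on the same port, so the kernel load balances incoming connections and only one worker wakes per connection.

```c
/* Sketch: one listening socket per worker with SO_REUSEPORT (Linux 3.9+).
 * Each worker accepts on its own fd, so a new connection wakes only the
 * worker whose socket the kernel chose, instead of every waiter. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

static int make_reuseport_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); exit(1); }

    int one = 1;
    /* Allow many sockets to bind the same addr:port; the kernel then
     * distributes incoming connections across them. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0) {
        perror("setsockopt(SO_REUSEPORT)"); exit(1);
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); exit(1); }
    if (listen(fd, 128) < 0) { perror("listen"); exit(1); }
    return fd;  /* each worker runs its own accept/epoll loop on this fd */
}
```

When workers must share a single epoll fd instead, EPOLLEXCLUSIVE (Linux 4.5+) similarly limits the wakeup to one waiter.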
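For priority inversion, POSIX mutexes support the priority inheritance protocol the text describes. A hedged sketch, assuming the contended lock is a pthread mutex shared by the high and low priority threads:

```c
/* Sketch: a priority inheritance mutex. While a low priority thread holds
 * the lock, it temporarily runs at the priority of the highest priority
 * waiter, so medium priority threads cannot starve it. */
#include <pthread.h>

static pthread_mutex_t shared_lock;

static int init_pi_lock(void) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;

    /* PTHREAD_PRIO_INHERIT enables priority inheritance for this mutex. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(&shared_lock, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```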
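For CFS bandwidth throttling, the quota and period live in cgroup v2's cpu.max file as "quota period" in microseconds. The sketch below shrinks the period while keeping the same 2 CPU limit; the cgroup path is an assumed example, and under Kubernetes these values are normally written by the runtime from the CPU limit rather than by hand.

```c
/* Sketch: setting a 2 CPU limit via cgroup v2's cpu.max ("quota period",
 * both in microseconds). With the default 100 ms period, exhausting the
 * quota can stall the group for up to ~100 ms; a 1 ms period keeps the
 * same 2 CPU limit but caps any single stall at ~1 ms.
 * The cgroup path below is an assumed example. */
#include <stdio.h>

int main(void) {
    const char *cpu_max = "/sys/fs/cgroup/myservice/cpu.max";

    FILE *f = fopen(cpu_max, "w");
    if (!f) { perror("fopen"); return 1; }

    /* Default-style config: 200 ms of CPU per 100 ms period (2 CPUs). */
    /* fprintf(f, "200000 100000\n"); */

    /* Reduced period: 2 ms of CPU per 1 ms period (still 2 CPUs), so a
     * throttled burst waits at most ~1 ms for the quota to refill. */
    fprintf(f, "2000 1000\n");

    fclose(f);
    return 0;
}
```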
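For runqueue contention and cross core migrations, one common (if blunt) mitigation is pinning hot threads so the load balancer cannot migrate them across cores or sockets; this is a narrower tool than the scheduler domain tuning mentioned above. A sketch using pthread_setaffinity_np, where the core id is illustrative and would come from the host's real topology:

```c
/* Sketch: pin the calling thread to a single core so the scheduler's load
 * balancer does not migrate it across cores or sockets. Core IDs here are
 * illustrative; real code should derive them from the machine's topology. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int pin_current_thread_to_core(int core_id) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);

    /* Restrict this thread's allowed CPUs to the single core. */
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (rc != 0)
        fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
    return rc;
}
```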
💡 Key Takeaways
Thundering herd wakes hundreds of threads simultaneously on shared events, causing context switch storms and cache thrash that add milliseconds to latency during traffic bursts
Priority inversion occurs when a low priority thread holds a lock needed by a high priority thread, causing unbounded delays while medium priority work starves the low priority holder
CFS bandwidth throttling causes p99 latency spikes equal to remaining quota period (up to 100 milliseconds by default) when containers hit CPU limits during bursts
Runqueue contention on 100+ core systems can push scheduler overhead past 5 percent of CPU time as lock contention and cross core load balancing traffic grow with core count
NUMA migrations cause remote memory accesses that are 2x to 3x slower, inflating p99 latencies by tens of milliseconds in memory intensive services when scheduler migrates threads across sockets
📌 Examples
A web server using epoll with 1000 threads on one epoll fd sees a thundering herd on accept: all 1000 threads wake, 999 fail to accept and block again, wasting 10 to 50 milliseconds of CPU during connection bursts
A real time control system experiences 100 millisecond stalls under load due to priority inversion: a low priority logging thread holds a file lock while the high priority control thread waits, and medium priority threads prevent the logger from running
A Kubernetes service with a 2 CPU limit hits its quota 10 milliseconds into the 100 millisecond period during a traffic spike, causing a 90 millisecond throttle stall that appears as a p99 latency jump from 20 milliseconds to 110 milliseconds