
CPU Affinity, Core Pinning, and NUMA Awareness

CPU affinity controls which CPU cores a thread is allowed to run on. By default, the Linux scheduler can migrate threads across any core to balance load. Pinning a thread to a specific core or set of cores prevents migrations, preserving cache and Non-Uniform Memory Access (NUMA) locality. Modern servers have NUMA architectures where memory is physically attached to CPU sockets: accessing local memory takes 50 to 100 nanoseconds, while remote memory access across sockets takes 100 to 300 nanoseconds, a 2x to 3x penalty.

Core pinning delivers measurable improvements in latency-sensitive systems. When a thread stays on the same core, its working set remains in L1 and L2 cache (1 to 10 nanoseconds access time). A migration to another core causes cache misses that must be served from L3 cache (20 to 50 nanoseconds) or main memory. For threads with hot data structures, migrations can add 5 to 15 microseconds to critical-path latencies. Low-latency trading systems pin their critical threads to isolated cores and see p99 latencies drop from 20 to 50 microseconds to single-digit microseconds.

The tradeoff is flexibility and utilization. Pinned threads cannot move to idle cores, potentially leaving CPU capacity unused while the pinned cores are saturated. In dynamic workloads, rigid pinning can create imbalances. The scheduler's load balancing exists for a reason: it improves overall throughput by using all available cores. Pinning is a latency optimization paid for with throughput and operational complexity.

NUMA awareness extends this concept to memory placement. Use numactl or similar tools to allocate memory on the same socket as the threads that access it. This is critical for memory-intensive workloads like databases and caches: a Redis instance with memory on a remote socket can see 30 to 50 percent throughput degradation and double-digit-millisecond p99 latency increases due to remote memory accesses. Combining CPU pinning with NUMA-local memory allocation keeps both instructions and data local, minimizing access latencies across the board.
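To make the mechanics concrete, here is a minimal C sketch, not taken from the material above, that pins the calling thread to one core with pthread_setaffinity_np and then allocates its hot data on that core's NUMA node with libnuma. The core number, buffer size, and error handling are illustrative assumptions; it assumes a Linux system with libnuma installed (build with gcc -pthread pin.c -lnuma).

```c
/* Sketch: pin the calling thread to one core and keep its working set on
 * that core's NUMA node. Core 2 and the 64 MiB buffer are illustrative. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <numa.h>
#include <stdio.h>

int main(void) {
    int core = 2;                              /* illustrative core choice */

    /* Restrict this thread to a single core so the scheduler can no longer
       migrate it and its L1/L2 working set stays warm. */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(core, &mask);
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
    if (rc != 0) {
        fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
        return 1;
    }

    if (numa_available() < 0) {
        fprintf(stderr, "libnuma: NUMA not supported on this system\n");
        return 1;
    }

    /* Place the hot data on the NUMA node that owns the pinned core, so
       loads hit local memory (~50-100 ns) rather than a remote socket
       (~100-300 ns). */
    int node = numa_node_of_cpu(core);
    size_t len = 64UL * 1024 * 1024;           /* illustrative working set */
    void *hot = numa_alloc_onnode(len, node);
    if (hot == NULL) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }

    /* ... latency-critical work loop over `hot` runs here ... */

    numa_free(hot, len);
    return 0;
}
```

At the process level, the same placement is often applied at launch time without code changes, using taskset -c for the CPU list and numactl --cpunodebind / --membind for memory, which is what the numactl mention above refers to.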
💡 Key Takeaways
Core pinning prevents migrations, preserving L1 and L2 cache (1 to 10 nanoseconds access) vs post-migration cache misses (20 to 300 nanoseconds from L3 or memory)
NUMA remote memory access takes 100 to 300 nanoseconds vs 50 to 100 nanoseconds for local memory, a 2x to 3x penalty that compounds in memory-intensive workloads
Low-latency systems pin threads to isolated cores with nohz_full, achieving single-digit-microsecond p99 latencies vs 20 to 50 microseconds with the default scheduler
Tradeoff: Pinning sacrifices load balancing flexibility and can create imbalanced utilization, potentially leaving cores idle while pinned cores saturate
Redis or database instances with remote NUMA memory suffer 30 to 50 percent throughput loss and p99 latency increases of tens of milliseconds due to cross-socket memory traffic
📌 Examples
A trading system pins critical pricing threads to cores 0 to 3 with memory on socket 0, achieving 5 microsecond p99 latency vs 35 microseconds without pinning due to eliminated migrations and cache misses
PostgreSQL configured with NUMA awareness keeps connection processes and shared memory on the same socket, improving query throughput by 40 percent on a dual socket server
Kubernetes does not provide NUMA awareness by default; a Redis pod scheduled with memory on a remote socket sees p99 latency increase from 2 milliseconds to 8 milliseconds under load
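A misplacement like the Redis pod above is straightforward to detect at runtime. The following sketch is an illustrative assumption rather than part of the original examples: it checks which core the calling thread runs on, which NUMA node owns that core, and which node backs a freshly touched page (Linux with libnuma, build with gcc verify.c -lnuma).

```c
/* Sketch: confirm thread and memory placement on a NUMA system.
 * Output format and the single-page probe are illustrative. */
#define _GNU_SOURCE
#include <sched.h>
#include <numa.h>
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }

    /* Which core is this thread on, and which NUMA node owns that core? */
    int cpu = sched_getcpu();
    int cpu_node = numa_node_of_cpu(cpu);

    /* Allocate and touch one page, then ask the kernel which node backs it:
       move_pages() with a NULL target-node array only queries placement. */
    long page = sysconf(_SC_PAGESIZE);
    void *buf = aligned_alloc(page, page);
    memset(buf, 1, page);
    int mem_node = -1;
    if (move_pages(0, 1, &buf, NULL, &mem_node, 0) != 0) {
        perror("move_pages");
        free(buf);
        return 1;
    }

    /* Local placement means cpu_node == mem_node; a mismatch indicates
       remote (cross-socket) accesses for this page. */
    printf("thread on cpu %d (node %d), page on node %d\n",
           cpu, cpu_node, mem_node);
    free(buf);
    return 0;
}
```

If the CPU's node and the page's node disagree, the workload is paying the remote-access penalty described above.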