Kubernetes CPU Throttling and CFS Bandwidth Control
CFS Bandwidth Control
CFS bandwidth control limits how much CPU time a cgroup can consume within a fixed period. The period is typically 100 milliseconds. The quota is the total CPU time, summed across all threads in the cgroup, that may be used in each period. A quota of 200 milliseconds per 100 millisecond period allows two CPUs' worth of time.
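The quota-to-period ratio can be checked with a few lines of Python. This is a minimal sketch that assumes the cgroup v2 cpu.max format, a single line holding the quota and the period in microseconds, with the literal word max meaning no limit. The function name effective_cpus is illustrative only.

```python
from typing import Optional


def effective_cpus(cpu_max: str) -> Optional[float]:
    """Return the CPUs' worth of time allowed by a cpu.max value.

    Assumes the cgroup v2 format "<quota> <period>" in microseconds,
    where a quota of "max" means the cgroup is not bandwidth limited.
    """
    quota, period = cpu_max.split()
    if quota == "max":
        return None  # no quota, so no CFS throttling
    return int(quota) / int(period)


# 200 ms of quota per 100 ms period is two CPUs' worth of time.
print(effective_cpus("200000 100000"))  # 2.0
```

On cgroup v1 the same two numbers live in separate files, cpu.cfs_quota_us and cpu.cfs_period_us, and a quota of -1 means unlimited.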
When the quota is exhausted, every thread in the cgroup is throttled: none of them can run until the next period begins. This produces the characteristic throttling pattern of a burst of execution, a complete stop, then another burst. Latency becomes unpredictable because a request that arrives while the cgroup is throttled must wait for the next period before any work on it can start.
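The burst-then-stop pattern can be estimated with simple arithmetic. The sketch below is a deliberately simplified model, not a description of the kernel's accounting: it assumes a hypothetical workload in which a fixed number of threads are runnable for the whole period, each on its own CPU, so the quota drains at a constant rate.

```python
def throttled_ms_per_period(quota_ms: float, period_ms: float,
                            busy_threads: int) -> float:
    """Estimate wall-clock milliseconds per period spent throttled.

    Simplified model: busy_threads run continuously in parallel, so the
    quota is consumed busy_threads times faster than wall-clock time.
    """
    if busy_threads <= 0:
        return 0.0
    time_to_exhaust_ms = quota_ms / busy_threads
    # If the quota outlasts the period, the cgroup is never throttled.
    return max(0.0, period_ms - time_to_exhaust_ms)


# Two busy threads never exceed a 200 ms quota in a 100 ms period.
print(throttled_ms_per_period(200, 100, busy_threads=2))  # 0.0
# Eight busy threads drain it in 25 ms and then stall for 75 ms.
print(throttled_ms_per_period(200, 100, busy_threads=8))  # 75.0
```

In this model the stall grows with parallelism rather than with average utilization, which is why a service can be throttled even when its average CPU usage looks well below the quota.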
Kubernetes CPU Limits
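Kubernetes implements CPU limits with CFS bandwidth control. A limit of 2 cores sets the quota to 200 milliseconds per 100 millisecond period. If your pod bursts and exhausts that quota in 50 milliseconds, for example by running four threads in parallel, it is throttled for the remaining 50 milliseconds of the period. Any request caught in that window is delayed by up to 50 milliseconds, which shows up as P99 latency spikes.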
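The mapping from a CPU limit to a quota can be sketched as follows. This is an illustration, not the kubelet's code: the helper names are hypothetical, the 100 millisecond default period comes from the text above, and the 1 millisecond minimum quota is an assumption about how very small limits are rounded up.

```python
CFS_PERIOD_US = 100_000  # 100 ms period, as described above
MIN_QUOTA_US = 1_000     # assumed 1 ms floor for very small limits


def cpu_to_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as "2", "0.5" or "500m" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def limit_to_quota_us(limit: str, period_us: int = CFS_PERIOD_US) -> int:
    """Convert a CPU limit into a CFS quota in microseconds per period."""
    millicores = cpu_to_millicores(limit)
    quota_us = millicores * period_us // 1000
    return max(quota_us, MIN_QUOTA_US)


print(limit_to_quota_us("2"))     # 200000, i.e. 200 ms per 100 ms period
print(limit_to_quota_us("500m"))  # 50000, i.e. 50 ms per 100 ms period
```

A limit of 500m therefore leaves a single-threaded process with only 50 milliseconds of run time in each period, after which it waits for the period to end.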
CPU requests work differently. Requests affect scheduling: the pod is placed only on a node whose unreserved CPU capacity covers the request. Requests also set the CFS weight (cpu.shares on cgroup v1, cpu.weight on cgroup v2), which determines the pod's relative share when pods compete for CPU. But requests never throttle. A pod with a 1 core request can burst to use all idle CPU on the node if no limit is set.
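How requests turn into relative priority can be illustrated with a small calculation. The sketch assumes the widely documented cgroup v1 conversion of 1024 shares per core with a minimum of 2, and it models the worst case in which every pod is CPU bound at once, so CPU is divided in proportion to shares. The pod names and function names are hypothetical.

```python
from typing import Dict

SHARES_PER_CPU = 1024  # assumed cgroup v1 convention: 1 core = 1024 shares
MIN_SHARES = 2         # assumed lower bound for cpu.shares


def request_to_shares(request_millicores: int) -> int:
    """Convert a CPU request in millicores into a cpu.shares value."""
    return max(MIN_SHARES, request_millicores * SHARES_PER_CPU // 1000)


def contended_split(requests: Dict[str, int], node_cpus: float) -> Dict[str, float]:
    """CPUs each pod receives when all pods are CPU bound simultaneously."""
    shares = {pod: request_to_shares(m) for pod, m in requests.items()}
    total = sum(shares.values())
    return {pod: node_cpus * s / total for pod, s in shares.items()}


# Two pods without limits saturating a 3 CPU node.
print(contended_split({"api": 1000, "batch": 500}, node_cpus=3))
# {'api': 2.0, 'batch': 1.0}
```

When the batch pod goes idle, the api pod is free to use all three CPUs. The weight only matters while there is contention, which is the key difference from a quota.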
Detecting and Mitigating Throttling
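Monitor nr_throttled and throttled_time in the cgroup's cpu.stat file (on cgroup v2 the time counter is throttled_usec). Both counters are cumulative, so compare nr_throttled against nr_periods over an interval: a high fraction of throttled periods indicates that the CPU limit is too tight for the workload. Either increase the limit or reduce the workload's CPU consumption.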
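A throttle ratio can be derived from two cpu.stat samples. The sketch below parses the key/value format of cpu.stat and computes the fraction of periods throttled between the samples. The sample values are made up for illustration, and the function names are hypothetical.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse cpu.stat content ("key value" per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_ratio(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Fraction of CFS periods throttled between two cumulative samples."""
    periods = after.get("nr_periods", 0) - before.get("nr_periods", 0)
    throttled = after.get("nr_throttled", 0) - before.get("nr_throttled", 0)
    return throttled / periods if periods > 0 else 0.0


# Made-up samples taken some time apart (not real measurements).
earlier = parse_cpu_stat("nr_periods 1000\nnr_throttled 100\nthrottled_time 5000000000")
later = parse_cpu_stat("nr_periods 1600\nnr_throttled 250\nthrottled_time 9000000000")
print(throttle_ratio(earlier, later))  # 0.25
```

In practice the text would be read from the container's cpu.stat file under the cgroup filesystem. The exact path depends on the cgroup version and the container runtime, so it is left out here.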
Some organizations remove CPU limits entirely, relying on requests for scheduling and on CFS fair sharing when pods compete. This eliminates throttling but lets a noisy neighbor consume any CPU that other pods are not using. The trade-off is a predictable, enforced ceiling on each pod versus maximum throughput. For latency-sensitive services, consider removing limits and relying on request-based scheduling.