
Kubernetes CPU Throttling and CFS Bandwidth Control

Kubernetes enforces CPU limits using the Linux CFS bandwidth control mechanism. When you set a CPU limit on a container, Kubernetes configures a quota and a period. The default period is 100 milliseconds, and the quota is the limit multiplied by the period: a container with a limit of 2 CPUs gets a quota of 200 milliseconds of CPU time per 100-millisecond period. Once the container consumes its quota, it is throttled (not scheduled) until the next period starts.

This creates a hard performance cliff. If your service experiences a burst of traffic and tries to use 2.1 CPUs, it will exhaust its quota after roughly 95 milliseconds and then be completely throttled for the remaining 5 milliseconds of the period. This manifests as p99 latency spikes of 5 to 100 milliseconds depending on when in the period the throttling occurs. Many production teams have observed that CPU throttling is the single largest contributor to tail latency in containerized services.

The problem is worse than it appears because quota accounting happens per CPU. With multiple threads running across cores, a container can exhaust its quota even when its average utilization is below the limit, due to scheduling artifacts and burstiness.

Google Kubernetes Engine (GKE) guidance explicitly recommends avoiding CPU limits for latency-critical workloads, or reducing the period from 100 milliseconds to 1 millisecond so that the worst-case throttle duration is capped at 1 millisecond instead of 100. The tradeoff is more frequent quota accounting overhead and timer interrupts. In practice, many organizations run latency-sensitive services with CPU requests but no limits, relying on overprovisioning and pod density limits to prevent resource contention. For batch workloads where throughput matters more than latency, limits with the default 100-millisecond period are acceptable. For services with Service Level Objectives (SLOs) on p99 latency, either remove limits entirely or set them 50 to 100 percent above typical peak usage to provide headroom for bursts.
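You can confirm what the runtime actually configured by reading the cgroup files from inside the container. Below is a minimal sketch assuming cgroup v2 with the unified hierarchy mounted at /sys/fs/cgroup; on cgroup v1 the equivalent files are cpu.cfs_quota_us, cpu.cfs_period_us, and a cpu.stat whose throttled_time is reported in nanoseconds.

```python
"""Inspect CFS bandwidth settings and throttling from inside a container.

A sketch assuming cgroup v2 (unified hierarchy) mounted at /sys/fs/cgroup.
"""
from pathlib import Path

CGROUP = Path("/sys/fs/cgroup")

def read_bandwidth():
    # cpu.max holds "<quota_us> <period_us>", or "max <period_us>" when unlimited.
    quota, period = (CGROUP / "cpu.max").read_text().split()
    return quota, int(period)

def read_throttle_stats():
    # cpu.stat includes nr_periods, nr_throttled, and throttled_usec
    # when the cpu controller is enabled.
    pairs = (line.split() for line in (CGROUP / "cpu.stat").read_text().splitlines())
    return {key: int(value) for key, value in pairs}

if __name__ == "__main__":
    quota, period = read_bandwidth()
    stats = read_throttle_stats()
    print(f"quota={quota}us per period={period}us")
    if stats.get("nr_periods"):
        pct = 100 * stats["nr_throttled"] / stats["nr_periods"]
        print(f"throttled in {stats['nr_throttled']}/{stats['nr_periods']} periods ({pct:.1f}%)")
        print(f"total throttled time: {stats['throttled_usec'] / 1000:.1f} ms")
```

Watching nr_throttled and throttled_usec grow between samples is a common way to confirm that p99 spikes line up with quota exhaustion rather than application-level pauses.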
💡 Key Takeaways
Default CFS bandwidth period of 100 milliseconds means throttled containers wait up to 100 milliseconds for the next period, causing p99 latency spikes
Many teams report p99 latency improvements of tens of milliseconds after removing CPU limits from latency-sensitive services
Reducing the CFS period from 100 milliseconds to 1 millisecond caps throttle duration at 1 millisecond but increases quota accounting overhead and timer interrupt frequency (see the arithmetic sketch after this list)
Quota is accounted per CPU, so multi-threaded applications can exhaust it even when average utilization is below the limit, due to scheduling artifacts and burstiness across cores
Google's GKE guidance recommends avoiding CPU limits for latency-critical workloads, or setting limits 50 to 100 percent above peak usage to provide burst headroom
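The arithmetic behind these numbers is straightforward; the helper functions below are illustrative names, not part of any Kubernetes or kernel API. Quota is simply limit × period, and the worst-case stall is whatever remains of the period after the quota runs out.

```python
def cfs_quota_us(cpu_limit: float, period_us: int = 100_000) -> int:
    """Quota written for a CPU limit: limit * period, in microseconds."""
    return int(cpu_limit * period_us)

def worst_case_stall_ms(demand_cpus: float, limit_cpus: float,
                        period_us: int = 100_000) -> float:
    """Longest stall per period when demand exceeds the limit.

    Quota lasts (limit / demand) of the period; the remainder is a forced
    stall until the next period refreshes the quota.
    """
    if demand_cpus <= limit_cpus:
        return 0.0
    runnable_us = (limit_cpus / demand_cpus) * period_us
    return (period_us - runnable_us) / 1000.0

if __name__ == "__main__":
    # The article's example: a 2-CPU limit under 2.1 CPUs of demand.
    print(cfs_quota_us(2.0))                     # 200000 us per 100 ms period
    print(worst_case_stall_ms(2.1, 2.0))         # ~4.8 ms stall per period
    # Shrinking the period to 1 ms caps the stall at a fraction of 1 ms.
    print(worst_case_stall_ms(2.1, 2.0, 1_000))  # ~0.048 ms
```

The last line shows why shrinking the period bounds the stall: the quota still limits average usage, but it refreshes 100 times more often. In Kubernetes the period is a kubelet-level setting (the cpuCFSQuotaPeriod configuration field, gated by CustomCPUCFSQuotaPeriod in older versions), not something you set per pod.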
📌 Examples
A service with a 2-CPU limit and 1.8-CPU average usage experiences 10 to 100 millisecond p99 latency spikes during traffic bursts when quota is exhausted mid-period (simulated in the sketch after this list)
Removing CPU limits from a Java API service reduced p99 latency from 85 milliseconds to 45 milliseconds under the same load, eliminating throttling-induced stalls
Setting the CFS period to 1 millisecond reduced worst-case throttle stalls from 100 milliseconds to 1 millisecond, improving p999 latency by 50 milliseconds at the cost of about 0.5 percent CPU overhead for accounting
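To make the first example concrete, here is a toy per-period simulation; the demand distribution is invented purely for illustration and the model is not kernel-accurate. It shows how a workload whose mean demand is below the limit still stalls in exactly those periods where it bursts.

```python
import random

def simulate_burst_throttling(periods: int = 10_000, limit_cpus: float = 2.0,
                              period_ms: float = 100.0, seed: int = 42) -> None:
    """Toy per-period CFS bandwidth model.

    Per-period demand is drawn from an invented bursty distribution whose
    mean sits below the limit; any period where demand exceeds the limit
    ends in a stall once the quota runs out.
    """
    rng = random.Random(seed)
    demands = [rng.choice([1.2, 1.5, 1.6, 2.6]) for _ in range(periods)]
    stalls = [
        period_ms * (1.0 - limit_cpus / d) if d > limit_cpus else 0.0
        for d in demands
    ]
    throttled = sum(1 for s in stalls if s > 0)
    print(f"mean demand: {sum(demands) / periods:.2f} CPUs (below the {limit_cpus}-CPU limit)")
    print(f"throttled periods: {throttled}/{periods}")
    print(f"worst stall: {max(stalls):.1f} ms")

if __name__ == "__main__":
    simulate_burst_throttling()
```

With these invented parameters the mean demand is about 1.7 CPUs, comfortably under the 2-CPU limit, yet roughly a quarter of periods end in a stall of about 23 milliseconds, which is exactly the kind of below-the-limit tail latency the first example describes.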