I'm experiencing a strange issue when using CPU Requests/Limits in Kubernetes. Prior to setting any CPU Requests/Limits at all, all my service
Kubernetes uses the Completely Fair Scheduler (CFS) quota to enforce CPU limits on pod containers. See "How does the CPU Manager work?" in https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/ for further details.
The CFS is a Linux feature, added with the 2.6.23 kernel, which is based on two parameters: cpu.cfs_period_us and cpu.cfs_quota_us. To visualize these two parameters, I'd like to borrow the following picture from Daniele Polencic's excellent thread (https://twitter.com/danielepolencic/status/1267745860256841731):
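To make this concrete, on a node using cgroup v1 you can read both values straight from the container's cgroup. This is only a sketch: the path below is a placeholder and depends on your container runtime and cgroup driver (on cgroup v2 the equivalent is a single cpu.max file).

```bash
# Placeholder path: the exact kubepods hierarchy varies by runtime/cgroup driver.
CGROUP=/sys/fs/cgroup/cpu/kubepods/pod<pod-uid>/<container-id>

cat "$CGROUP/cpu.cfs_period_us"   # length of one period, typically 100000 (100 ms)
cat "$CGROUP/cpu.cfs_quota_us"    # CPU time allowed per period; -1 means unlimited

# e.g. a K8s limit of "cpu: 500m" with the default 100 ms period results in a
# quota of 50000: 50 ms of CPU time per 100 ms window, after which the cgroup
# is throttled until the next period starts.
```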
If you configure a CPU limit in K8s, it is translated into a period and a quota. When a process running in a container uses up its quota within a period, it is preempted and has to wait for the next period: it is throttled. So this is the effect you are experiencing. The period-and-quota mechanism should not be thought of as a simple ceiling below which processes run unthrottled; throttling happens per period, so it can kick in even when average usage looks well below the limit. This behavior is confusing, and a K8s issue exists for it: https://github.com/kubernetes/kubernetes/issues/67577. The recommendation given in https://github.com/kubernetes/kubernetes/issues/51135 is to not set CPU limits for pods that shouldn't be throttled.
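Following that recommendation, here is a minimal sketch of a pod that sets a CPU request (used for scheduling and for the cpu.shares weighting described below) but no CPU limit, so no CFS quota is written and the container is never throttled. The name and image are just placeholders.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: unthrottled-app            # placeholder name
spec:
  containers:
  - name: app
    image: nginx:1.25              # placeholder image
    resources:
      requests:
        cpu: "500m"                # scheduling weight + cpu.shares, no quota
      # no limits.cpu -> cpu.cfs_quota_us stays at -1, so no CFS throttling
EOF
```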
If you check the documentation, you'll see that a CPU Request actually translates into the --cpu-shares option in Docker, which in turn sets the cpu.shares attribute of the cpu,cpuacct cgroup on Linux. So a value of 50m comes out to roughly --cpu-shares=51, since 1024 shares represent one full CPU (100%), and 51 is about 4-5% of that. That's pretty low to begin with. But the important factor here is that this is relative to how many pods/containers you have on your node and what cpu-shares those have (are they using the default?).
So let's say that on your node you have another pod/container with 1024 shares (the default) and this pod/container with 51 shares. Under CPU contention this container will get roughly 51/(51+1024), about 5% of the CPU, while the other pod/container will get about 95% (if it has no limits). So again, it all depends on how many pods/containers you have on the node and what their shares are.
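You can reproduce this weighting with plain Docker. The sketch below pins two busy-looping containers to the same CPU so they actually contend; the names and the busy loop are just illustrative.

```bash
# Pin both containers to CPU 0 so they compete for the same core.
docker run -d --name big   --cpuset-cpus=0 --cpu-shares=1024 busybox sh -c 'while :; do :; done'
docker run -d --name small --cpuset-cpus=0 --cpu-shares=51   busybox sh -c 'while :; do :; done'

# Under contention the CPU is split by weight:
#   small: 51   / (51 + 1024) ~  5%
#   big:   1024 / (51 + 1024) ~ 95%
# If "big" goes idle, "small" can use 100% of the core: shares are a relative
# weight, not a ceiling.
docker stats --no-stream big small
```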
Also, not very well documented in the Kubernetes docs: if you use a Limit on a pod, it basically translates into two flags in Docker, --cpu-period and --cpu-quota, which in turn set the cpu.cfs_period_us and cpu.cfs_quota_us attributes of the cpu,cpuacct cgroup on Linux. This was introduced because cpu.shares doesn't provide a limit, so you'd spill over into cases where containers would grab most of the CPU.
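For illustration, a Kubernetes limit of cpu: 500m ends up as roughly the following Docker flags and cgroup values. This is a sketch: the 100 ms period is the default, and the cgroup path assumes cgroup v1 with the cgroupfs driver.

```bash
# 500m ~ 50 ms of CPU time per 100 ms period.
docker run -d --name limited --cpu-period=100000 --cpu-quota=50000 \
    busybox sh -c 'while :; do :; done'

CID=$(docker inspect -f '{{.Id}}' limited)
cat /sys/fs/cgroup/cpu/docker/"$CID"/cpu.cfs_period_us   # 100000
cat /sys/fs/cgroup/cpu/docker/"$CID"/cpu.cfs_quota_us    # 50000
```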
So, as far as this limit is concerned, you may never actually reach it if there are other containers on the same node with no limits (or higher limits) but higher cpu.shares, because they will end up grabbing the idle CPU first. This could be what you are seeing, but again it depends on your specific case.
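One way to tell whether CFS throttling (rather than just losing out on shares) is what you're hitting is to look at the cgroup's CFS counters. A sketch, assuming cgroup v1 and placeholder pod/container names; if you scrape cAdvisor metrics, container_cpu_cfs_throttled_periods_total and container_cpu_cfs_throttled_seconds_total expose the same counters.

```bash
# Path inside the container may also be /sys/fs/cgroup/cpu,cpuacct/cpu.stat,
# depending on how the cgroup filesystem is mounted.
kubectl exec my-pod -c app -- cat /sys/fs/cgroup/cpu/cpu.stat
# nr_periods     <CFS periods elapsed>
# nr_throttled   <periods in which the container exhausted its quota>
# throttled_time <total time spent throttled, in nanoseconds>
```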
A longer explanation for all of the above here.