The Linux Pressure Stall Information (PSI) feature, part of Control Group v2, provides an accurate accounting of a container's CPU, memory, and IO. The PSI stats let you size and limit resource access accurately: no over-committing and no over-sizing.
However, it is sometimes difficult to see whether a container is being limited and could use more resources.
This article is designed to help you diagnose and check your pods so you can get the best out of your workloads.
Check your workload
You can check the cpu.stat for the container in your Pod:
- Find the containerID.
[root@cpi-c7b2-bastion-0 ~]# oc get pod -n test test-pod -oyaml | grep -i containerID
- containerID: cri-o://c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea
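You can also pull the ID directly with a jsonpath query (a quick sketch; adjust the index if your Pod has more than one container):
[root@cpi-c7b2-bastion-0 ~]# oc get pod -n test test-pod -o jsonpath='{.status.containerStatuses[0].containerID}'
cri-o://c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea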
- Connect to the Pod.
[root@cpi-c7b2-bastion-0 ~]# oc rsh -n test test-pod
sh-4.4# find /sys -iname '*c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea*'
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d4b90d9_20f9_427d_9414_9964f32379dc.slice/crio-c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea.scope
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d4b90d9_20f9_427d_9414_9964f32379dc.slice/crio-conmon-c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea.scope
- Check the cpu.stat, io.stat, or memory.stat in the crio-<containerID>.scope directory; the crio-conmon-<containerID>.scope belongs to conmon, the container monitor process, not to the container itself.
sh-4.4# cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d4b90d9_20f9_427d_9414_9964f32379dc.slice/crio-c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea.scope/cpu.stat
usage_usec 11628232854
user_usec 8689145332
system_usec 2939087521
core_sched.force_idle_usec 0
nr_periods 340955
nr_throttled 8
throttled_usec 8012
nr_bursts 0
burst_usec 0
- We can see that the CPU is being throttled in nr_throttled and throttled_usec. This is a really minor impact for this container:
nr_throttled 8
throttled_usec 8012
If the container had a higher number of throttled events, such as the following, you want to check the number of CPUs or the amount of memory that your container is limited to:
nr_throttled 103
throttled_usec 22929315
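To put those counters in context, you can compute the fraction of scheduler periods that were throttled. A small awk sketch, run from inside the Pod against the container's cpu.stat (here using the first sample's numbers):
sh-4.4# awk '/^nr_periods|^nr_throttled/ {a[$1]=$2} END { printf "throttled in %.4f%% of periods\n", 100 * a["nr_throttled"] / a["nr_periods"] }' /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d4b90d9_20f9_427d_9414_9964f32379dc.slice/crio-c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea.scope/cpu.stat
throttled in 0.0023% of periods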
- Check the container limits.
❯ NS=test
❯ POD=test-pod
❯ oc get -n ${NS} pod ${POD} -ojson | jq -r '.spec.containers[].resources.limits.cpu'
8
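To see the full picture for every container in the Pod, you can dump both requests and limits in one pass (a sketch reusing the same NS and POD variables):
❯ oc get -n ${NS} pod ${POD} -ojson | jq -r '.spec.containers[] | "\(.name): requests=\(.resources.requests) limits=\(.resources.limits)"'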
- Patch your Pod or update your application to increase the CPUs, as in the sketch below.
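Since the CPU limit on a running Pod is typically immutable, the change usually goes on the owning workload controller. A minimal sketch, assuming the Pod is managed by a hypothetical Deployment named test-app:
❯ oc set resources -n ${NS} deployment/test-app --limits=cpu=12
The Deployment then rolls out a new Pod with the higher limit.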
Checking real-time stats
You can check the real-time pressure stats for your containers. Log on to your host and search the kubepods.slice hierarchy:
find /sys/fs/cgroup/kubepods.slice/ -iname cpu.pressure | xargs -t -I {} cat {} | grep -v total=0
find /sys/fs/cgroup/kubepods.slice/ -iname memory.pressure | xargs -t -I {} cat {} | grep -v total=0
find /sys/fs/cgroup/kubepods.slice/ -iname io.pressure | xargs -t -I {} cat {} | grep -v total=0
This will show you all the pods that are under pressure (the xargs -t flag prints each cat command to stderr, so you can see which file produced the output). The following script narrows it down to just the io.pressure files reporting non-zero pressure:
for PRESSURE in $(find /sys/fs/cgroup/kubepods.slice/ -iname io.pressure)
do
    if [ ! -z "$(cat ${PRESSURE} | grep -v total=0)" ]
    then
        if [ ! -z "$(cat ${PRESSURE} | grep -v "avg10=0.00 avg60=0.00 avg300=0.00")" ]
        then
            echo ${PRESSURE}
        fi
    fi
done
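The same check generalizes to all three pressure files. A minimal sketch that walks cpu.pressure, memory.pressure, and io.pressure in one pass:
for PRESSURE in $(find /sys/fs/cgroup/kubepods.slice/ -iname '*.pressure')
do
    # Print only the files where at least one line reports non-zero averages
    if grep -qv "avg10=0.00 avg60=0.00 avg300=0.00" ${PRESSURE}
    then
        echo ${PRESSURE}
    fi
done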
❯ cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podde03ef16_000a_4198_9e04_ac96d0ea33c5.slice/crio-d200161683a680588c4de8346ff58d633201eae2ffd558c8d707c4836215645e.scope/io.pressure
some avg10=14.02 avg60=14.16 avg300=13.99 total=4121355556
full avg10=14.02 avg60=14.16 avg300=13.99 total=4121050788
In this case, I was able to go in and increase the total IO available to the container. As a reminder, the some line means at least one task is stalled on IO, while the full line means all non-idle tasks are stalled at the same time.
Tweak
You can temporarily tweak the cpu.pressure trigger settings for a pod or for the whole system so that the evaluation window is extended (the example below uses the longest window possible). Each trigger line has the format <some|full> <stall threshold in µs> <window in µs>. The maximum window size is 10 seconds, and if your kernel version is less than 6.5, the minimum window size is 500ms. Note that a trigger stays registered only while the file descriptor that wrote it remains open, so a write from a shell like the one below is transient.
cat << EOF > /sys/fs/cgroup/cpu.pressure
some 10000000 10000000
full 10000000 10000000
EOF
Disabling psi in OpenShift
There are two methods to disable psi in OpenShift: the first is to set a kernel parameter, and the second is to switch from cgroups v2 to cgroups v1.
Switch from cgroups v2 to cgroups v1
You can switch from cgroups v2 to cgroups v1 by following the OpenShift documentation, Configuring the Linux cgroup version on your nodes:
❯ oc patch nodes.config cluster --type merge -p '{"spec": {"cgroupMode": "v1"}}'
You’ll have to wait for each of the Nodes to restart.
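You can watch the rollout as the MachineConfigPools pick up the change and the nodes reboot one by one:
❯ oc get mcp -w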
Set the Kernel Parameter psi=0
In OpenShift, you can disable psi using a MachineConfig:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-psi-disable
spec:
  kernelArguments:
    - psi=0
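Save the manifest and apply it; the Machine Config Operator rolls the kernel argument out to each worker node (the file name here is just an example):
❯ oc apply -f 99-worker-psi-disable.yaml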
Check psi is enabled
You can check whether psi is enabled by reading one of the cpu.pressure, io.pressure, or memory.pressure files. If psi has been disabled, you'll see "Operation not supported":
sh-5.1# cat /sys/fs/cgroup/cpu.pressure
cat: /sys/fs/cgroup/cpu.pressure: Operation not supported
or check the filesystem type of /sys/fs/cgroup on the node; tmpfs indicates cgroup v1, while cgroup2fs indicates cgroup v2:
oc debug node/<node_name>
chroot /host
stat -c %T -f /sys/fs/cgroup
tmpfs
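If you disabled psi via the MachineConfig above, you can also confirm the kernel argument took effect from the same debug shell:
oc debug node/<node_name>
chroot /host
grep -o psi=0 /proc/cmdline
psi=0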
Summary
Linux PSI is pretty awesome. However, you should check your workload and verify it’s running correctly.
References
- kernel.org: Control Group v2
- kernel.org: PSI – Pressure Stall Information
- Linux Source: kernel/sched/psi.c
- Blog: Linux Pressure Stall Information (PSI) by Example
- lwn.net: psi: pressure stall information for CPU, memory, and IO v2
- Facebook OpenSource: Getting Started with PSI
- Facebook OpenSource: fbtax2: Putting it All Together
- OpenShift: Configuring the Linux cgroup version on your nodes