OpenShift Container Platform and CGroups: Notes

My notes from OCP/Cgroups debugging and usage.

What is attaching the BPF program to my cgroup?

When you create a Pod, the API Server reconciles the resource, and the Kube Scheduler is triggered to assign it to a Node. On the Node, the Kubelet converts the Pod to the OCI specification, enriches the container with host-device-specific resources, and dispatches it to cri-o. cri-o then uses the default container runtime launcher (runc or crun) and the runc/crun configuration to launch and manage the container with systemd, attaching an eBPF program that controls device access.

If you are seeing EPERM errors when accessing a device, you may not have the right access set at the Pod level. In that case, you may be able to use a Device Plugin.
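
To see what is actually attached, you can list the BPF programs per cgroup on the node. This is a minimal sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup and that bpftool is available on the node (it may not be installed by default). The CGROUP_ROOT variable and the list_device_programs helper are names made up for this example.

```shell
# Assumption: cgroup v2 is mounted here; override CGROUP_ROOT if your layout differs.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# Hypothetical helper: walk the cgroup tree and keep only the lines whose
# attach type mentions "device" (the programs that gate device access).
list_device_programs() {
  bpftool cgroup tree "$1" | grep -i device
}

if command -v bpftool >/dev/null 2>&1; then
  list_device_programs "$CGROUP_ROOT" || echo "no device programs found under $CGROUP_ROOT"
else
  echo "bpftool not found; run this from a shell that has it installed"
fi
```

One way to get a shell on the node is `oc debug node/<node>` followed by `chroot /host`. Each matching line shows a program ID that you can inspect further with `bpftool prog show id <ID>`.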

Options for adding Devices

You have several options to look at:

  1. volumeDevices
  2. io.kubernetes.cri-o.Devices
  3. cri-o config drop-in
  4. crun or runc with DeviceAllow; see https://github.com/containers/crun and https://github.com/containers/crun/blob/017b5fddcb0a29938295d9a28fdc901164c77d74/contrib/seccomp-notify-plugin-rust/src/mknod.rs#L9
  5. A custom device plugin like https://github.com/IBM/power-device-plugin

Note that this gives R/W access to the full device.
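
Here is a rough sketch of what the first two options can look like as Pod manifests. The Pod names, the image, /dev/fuse, /dev/xvda and the block-pvc claim are all placeholders I picked for illustration. I am also assuming that cri-o has been configured to accept the io.kubernetes.cri-o.Devices annotation (allowed_annotations) and the host device (allowed_devices), and that block-pvc is a PersistentVolumeClaim with volumeMode: Block.

```shell
# Option 2: ask cri-o to add a host device through a Pod annotation.
# Assumption: the annotation and the device are allowed in the cri-o config.
cat << 'EOF' > pod-with-device.yaml
apiVersion: v1
kind: Pod
metadata:
  name: device-demo
  annotations:
    io.kubernetes.cri-o.Devices: "/dev/fuse"
spec:
  containers:
  - name: demo
    image: registry.access.redhat.com/ubi9/ubi-minimal
    command: ["sleep", "infinity"]
EOF

# Option 1: expose a raw block volume inside the container with volumeDevices.
# Assumption: block-pvc exists and was created with volumeMode: Block.
cat << 'EOF' > pod-with-block-volume.yaml
apiVersion: v1
kind: Pod
metadata:
  name: block-demo
spec:
  containers:
  - name: demo
    image: registry.access.redhat.com/ubi9/ubi-minimal
    command: ["sleep", "infinity"]
    volumeDevices:
    - name: raw
      devicePath: /dev/xvda
  volumes:
  - name: raw
    persistentVolumeClaim:
      claimName: block-pvc
EOF

echo "wrote pod-with-device.yaml and pod-with-block-volume.yaml"
```

Apply either file with `oc apply -f <file>` and then check the device from inside the container, for example with `oc exec device-demo -- ls -l /dev/fuse`.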

Requires selinux-relabeling to be disabled

You may need to stop SELinux from relabeling the files when your containers run with randomized IDs. The Cloud Pak for Data documentation describes an excellent way to disable SELinux relabeling: https://www.ibm.com/docs/en/cloud-paks/cp-data/5.0.x?topic=1-disabling-selinux-relabeling

You can confirm the file details using:

sh-5.1$ ls -alZ /mnt/example/myfile.log
-rw-r--r--. 1 xuser wheel system_u:object_r:container_file_t:s0 1053201 Dec 11 19:45 /mnt/example/myfile.log
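
When comparing files before and after a Pod restart, the parts of the context I look at are the type and the level. A small sketch for pulling those out of a context string follows; the selinux_type and selinux_level helpers are names invented for this example, and the stat call assumes GNU coreutils.

```shell
# Print the SELinux type, the third colon-separated field of a context.
selinux_type() {
  printf '%s\n' "$1" | cut -d: -f3
}

# Print the level, the fourth field onward. It can contain colons itself,
# for example s0:c12,c34, so keep everything from field 4 to the end.
selinux_level() {
  printf '%s\n' "$1" | cut -d: -f4-
}

# Context copied from the ls -alZ output above.
ctx="system_u:object_r:container_file_t:s0"
echo "type=$(selinux_type "$ctx") level=$(selinux_level "$ctx")"

# On a real system, read the context straight from the file.
FILE="${FILE:-/mnt/example/myfile.log}"
if [ -e "$FILE" ]; then
  ctx="$(stat -c '%C' "$FILE")"
  echo "$FILE: type=$(selinux_type "$ctx") level=$(selinux_level "$ctx")"
fi
```

If the level on existing files changes every time the Pod is recreated, relabeling is still happening.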

Switching Container Runtime Launchers

You can switch your Container Runtime from runc to crun using:

cat << EOF | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
 name: container-crun
spec:
 machineConfigPoolSelector:
   matchLabels:
     pools.operator.machineconfiguration.openshift.io/worker: '' 
 containerRuntimeConfig:
   logLevel: debug 
   overlaySize: 1G 
   defaultRuntime: "crun"
EOF
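
Once the ContainerRuntimeConfig is applied, the change rolls out through the MachineConfigPool. The sketch below waits for the pool and then prints the default runtime from one node. It assumes the pool is named worker, that nodes carry the matching node-role.kubernetes.io/<pool> label, and that `crio config` on the node prints a default_runtime line; check_runtime is a helper name made up for this example.

```shell
# Hypothetical default; match this to your machineConfigPoolSelector.
POOL="${POOL:-worker}"

check_runtime() {
  pool="$1"
  # Wait for the Machine Config Operator to finish rolling out the change.
  oc wait "mcp/$pool" --for=condition=Updated --timeout=30m || return 1
  # Pick the first node in the pool and print its default runtime setting.
  node="$(oc get nodes -l "node-role.kubernetes.io/$pool" -o name | head -n 1)"
  oc debug "$node" -- chroot /host sh -c 'crio config 2>/dev/null | grep default_runtime'
}

if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  check_runtime "$POOL"
else
  echo "oc is not installed or not logged in; skipping the check"
fi
```

The nodes in the pool restart one by one while the change is applied, so the wait can take a while.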

container_use_devices

The container_use_devices SELinux boolean allows containers to use any device volume mounted into the container; see https://github.com/containers/container-selinux/blob/main/container.te#L39

$ getsebool -a | grep container_use_devices
container_use_devices --> off

More details on creating a MachineConfig are at https://docs.openshift.com/container-platform/4.16/networking/multiple_networks/configuring-additional-network.html
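
One way to turn the boolean on across a pool is a MachineConfig that runs setsebool from a systemd unit at boot. This is a sketch, not a tested recipe: the MachineConfig name, the unit name and the Ignition version are assumptions for illustration, and the role label assumes the worker pool.

```shell
# Write the MachineConfig to a file so it can be reviewed before applying.
cat << 'EOF' > 99-worker-container-use-devices.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-container-use-devices
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - name: container-use-devices.service
        enabled: true
        contents: |
          [Unit]
          Description=Enable the container_use_devices SELinux boolean
          Before=kubelet.service

          [Service]
          Type=oneshot
          ExecStart=/usr/sbin/setsebool container_use_devices=on
          RemainAfterExit=true

          [Install]
          WantedBy=multi-user.target
EOF

echo "wrote 99-worker-container-use-devices.yaml"
```

Apply it with `oc apply -f 99-worker-container-use-devices.yaml`, wait for the pool to update, and rerun the getsebool command on a node to confirm the value is on.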

blktrace

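blktrace is a superb tool for tracing block-device I/O. It reads its events through the kernel's debugfs, so debugfs has to be mounted (usually at /sys/kernel/debug) before you run it.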

blktrace -d /dev/sdf
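
A slightly fuller sketch of the same command: it mounts debugfs when it is missing, limits the trace to a fixed number of seconds, and pipes the binary output straight into blkparse so it is readable. The DEV default and the trace_device helper are made up for this example, and it assumes you run as root on a host where the blktrace package is installed.

```shell
# Device to trace; /dev/sdf is the example device from these notes.
DEV="${DEV:-/dev/sdf}"

# Hypothetical helper: trace_device <device> [seconds]
trace_device() {
  # blktrace needs debugfs; mount it if it is not already there.
  mountpoint -q /sys/kernel/debug || mount -t debugfs debugfs /sys/kernel/debug
  # -w stops the trace after N seconds, -o - writes the binary trace to stdout,
  # and blkparse -i - reads it from stdin and prints human-readable events.
  blktrace -d "$1" -w "${2:-10}" -o - | blkparse -i -
}

if command -v blktrace >/dev/null 2>&1 && [ -b "$DEV" ]; then
  trace_device "$DEV" 10
else
  echo "blktrace is not installed or $DEV is not a block device; nothing traced"
fi
```

Without -o, blktrace writes per-CPU trace files into the current directory instead, which you can parse later with blkparse.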

We also built a crio config script.

https://www.redhat.com/en/blog/open-container-initiative-hooks-admission-control-podman

https://www.redhat.com/en/blog/extending-the-runtime-functionality