Tag: openshift

  • 🚀 In-Place Pod Resize in Kubernetes: What You Need to Know

    DRAFT This is not a complete article. I haven’t yet fully tested and vetted the steps I built. I will come back and hopefully update.

    In Kubernetes v1.33, In-Place Pod Resize has entered Beta. This feature lets you resize the CPU and memory resources of containers in a running Pod without restarting them, which is particularly nice for Power customers who scale their systems vertically. Note that turning the feature gate on requires a kubelet restart.

    Previously, changing the resource requests or limits of a Pod meant recreating it, which was disruptive for long-running workloads. With in-place pod resize, vertically autoscaling workloads and right-sizing stateful applications become far less disruptive, which is a real win.

    1. Enable the InPlacePodVerticalScaling feature gate in a kind config called kind-cluster-config.yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    featureGates:
      InPlacePodVerticalScaling: true
    nodes:
    - role: control-plane
      kubeadmConfigPatches:
      - |
        kind: ClusterConfiguration
        apiServer:
            extraArgs:
              v: "1"
        scheduler:
            extraArgs:
              v: "1"
        controllerManager:
            extraArgs:
              v: "1"
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            v: "1"
    - role: worker
      kubeadmConfigPatches:
      - |
        kind: JoinConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            v: "1"
    
    2. Download kind
    mkdir -p dev-cache
    GOBIN=$(pwd)/dev-cache go install sigs.k8s.io/kind@v0.29.0
    
    3. Start the kind cluster
    KIND_EXPERIMENTAL_PROVIDER=podman dev-cache/kind create cluster \
      --image quay.io/powercloud/kind-node:v1.33.1 \
      --name test \
      --config kind-cluster-config.yaml \
      --wait 5m
    
    4. Create a namespace
    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        kubernetes.io/metadata.name: resize-test
        pod-security.kubernetes.io/audit: restricted
        pod-security.kubernetes.io/audit-version: v1.24
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/warn: restricted
        pod-security.kubernetes.io/warn-version: v1.24
      name: resize-test
    
    5. Create a Pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: resize-test
      namespace: resize-test
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: resize-test
        image: registry.access.redhat.com/ubi9/ubi
        command: ["sleep", "infinity"]
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
        resources:
          limits:
            memory: "200Mi"
            cpu: "1"
          requests:
            memory: "200Mi"
            cpu: "1"
    
    6. Resize the Pod. In v1.33 the resize must go through the resize subresource, for example: kubectl patch pod/resize-test -n resize-test --subresource resize --patch '{"spec":{"containers":[{"name":"resize-test","resources":{"requests":{"cpu":"2","memory":"200Mi"},"limits":{"cpu":"2","memory":"200Mi"}}}]}}'
    7. Check kubectl describe pod/resize-test -n resize-test
    8. Run kubectl exec -it pod/resize-test -n resize-test -- cat /sys/fs/cgroup/cpu.max to see the limit changed (lscpu reports the node's CPUs, not the cgroup limit)
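    In v1.33, the resize flows through the Pod's resize subresource, so the patch body can be built programmatically. A minimal sketch (the container name matches the Pod above; the kubectl invocation in the note below is illustrative):

```python
import json

def resize_patch(container, cpu, memory):
    """Build the JSON body for `kubectl patch --subresource resize`."""
    return json.dumps({
        "spec": {
            "containers": [{
                "name": container,
                "resources": {
                    "requests": {"cpu": cpu, "memory": memory},
                    "limits": {"cpu": cpu, "memory": memory},
                },
            }]
        }
    })

patch = resize_patch("resize-test", "2", "400Mi")
print(patch)
```

    You could then apply it with something like kubectl patch pod/resize-test -n resize-test --subresource resize --patch "$(python3 resize_patch.py)".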

    You’ve seen how this feature functions with Kubernetes and can resize your Pod without a restart.

    References

    1. Kubernetes v1.33: In-Place Pod Resize Graduated to Beta
    2. Resize CPU and Memory Resources assigned to Containers
  • Playing with Container Lifecycle Hooks and ContainerStopSignals

    DRAFT This is not a complete article. I haven’t yet fully tested and vetted the steps I built. I will come back and hopefully update.

    Kubernetes orchestrates Pods across multiple nodes. When a Pod lands on a node, the kubelet admits the Pod and its containers and manages their lifecycle. When the Pod is terminated, the kubelet sends a SIGTERM signal to the running processes. Kubernetes Enhancement – Container Stop Signals #4960 allows a custom stop signal, spec.containers[].lifecycle.stopSignal, so you can choose among sixty-five additional stop signals to stop the Pod. While the feature is behind a feature gate, you can see the supported values in supportedStopSignalsLinux.

    For example, a user may use the SIGQUIT signal to stop a container in the Pod. To do so with kind:

    1. Enable the ContainerStopSignals feature gate in a kind config called kind-cluster-config.yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    featureGates:
      ContainerStopSignals: true
    nodes:
    - role: control-plane
      kubeadmConfigPatches:
      - |
        kind: ClusterConfiguration
        apiServer:
            extraArgs:
              v: "1"
        scheduler:
            extraArgs:
              v: "1"
        controllerManager:
            extraArgs:
              v: "1"
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            v: "1"
    - role: worker
      kubeadmConfigPatches:
      - |
        kind: JoinConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            v: "1"
    
    2. Download kind
    mkdir -p dev-cache
    GOBIN=$(pwd)/dev-cache go install sigs.k8s.io/kind@v0.29.0
    
    3. Start the kind cluster
    KIND_EXPERIMENTAL_PROVIDER=podman dev-cache/kind create cluster \
      --image quay.io/powercloud/kind-node:v1.33.1 \
      --name test \
      --config kind-cluster-config.yaml \
      --wait 5m
    
    4. Create a namespace
    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        kubernetes.io/metadata.name: lifecycle-test
        pod-security.kubernetes.io/audit: restricted
        pod-security.kubernetes.io/audit-version: v1.24
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/warn: restricted
        pod-security.kubernetes.io/warn-version: v1.24
      name: lifecycle-test
    
    5. Create a Pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: test
      namespace: lifecycle-test
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: test
        command: ["/bin/sh", "-c"]
        args:
          - function cleanup() { echo "CALLED SIGQUIT"; };
            trap cleanup SIGQUIT;
            sleep infinity
        image: registry.access.redhat.com/ubi9/ubi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        lifecycle:
          stopSignal: SIGQUIT
    
    6. Check kubectl describe pod/test -n lifecycle-test to confirm stopSignal is set, then delete the Pod and watch kubectl logs -f pod/test -n lifecycle-test for the CALLED SIGQUIT output
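    You can exercise the same trap behavior locally, without a cluster. The sketch below runs an sh script equivalent to the Pod's args and delivers SIGQUIT, which is what the kubelet sends when lifecycle.stopSignal: SIGQUIT is set (Linux/macOS only):

```python
import signal
import subprocess
import time

# Same idea as the Pod's args: install a SIGQUIT handler, then block.
script = (
    "trap 'echo CALLED SIGQUIT; kill $pid 2>/dev/null' QUIT; "
    "sleep 30 >/dev/null 2>&1 & pid=$!; wait $pid"
)

proc = subprocess.Popen(["sh", "-c", script], stdout=subprocess.PIPE, text=True)
time.sleep(0.5)                   # give sh time to install the trap
proc.send_signal(signal.SIGQUIT)  # the configured stop signal
out, _ = proc.communicate(timeout=10)
print(out.strip())                # CALLED SIGQUIT
```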

    You’ve seen how this feature functions with Kubernetes and can take advantage of ContainerStopSignals in your environment.

    References

    1. Tracker: Kubernetes Enhancement – Container Stop Signals #4960 issue 30051
    2. KEP-4960: Container Stop Signals
    3. Kubernetes Documentation: Container Lifecycle Hooks
    4. An Introductory Guide to Managing the Kubernetes Pods Lifecycle
    5. Stop Signals
  • Great Job Team: Next-generation DataStage is now supported on IBM Power (ppc64le) with 5.2.0

    The IBM Team announced support for DataStage on IBM Power.

    IBM Cloud Pak for Data now supports the DataStage service on IBM Power servers. This means that you can run your data integration and extract, transform, and load (ETL) workloads directly on IBM Power, just like you already do on x86. With this update, it is easier than ever to use your existing Power infrastructure for modern data and AI projects.

    With the release of IBM DataStage 5.2.0, the DataStage service is now officially supported on IBM Power (ppc64le). This enables clients to run enterprise-grade ETL and data integration workloads on the Power platform, offering flexibility, performance, and consistency across architectures.

    See https://www.ibm.com/docs/en/software-hub/5.2.x?topic=requirements-ppc64le-hardware and https://community.ibm.com/community/user/blogs/yussuf-shaikh/2025/07/15/datastage-5-2-0-is-now-supported-on-ibm-power

  • Using procMount in your Kubernetes Pod

    Recently, I ran across Kubernetes Enhancement Proposal (KEP) 4265, where the authors update the Pod.spec.procMount capability to manage /proc visibility in a Pod’s security context. With the ProcMountType feature gate enabled, the Unmasked value disables masking and allows all paths in /proc (not just read-only).

    In practice, the Default procMount prevents containers from accessing sensitive kernel data or interacting with host-level processes. With this enhancement, you can run unprivileged containers inside a container (container-in-a-container), build container images within a Pod, and use buildah in a Pod.
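    To make the masking concrete, here is the kind of maskedPaths list a runtime such as runc or crun applies under procMount: Default (the exact list is runtime-defined; this one is illustrative):

```python
# Typical maskedPaths from an OCI runtime spec under procMount: Default.
# With procMount: Unmasked, Kubernetes asks the runtime to clear this list.
MASKED_PATHS = [
    "/proc/acpi",
    "/proc/kcore",
    "/proc/keys",
    "/proc/latency_stats",
    "/proc/sched_debug",
    "/proc/scsi",
    "/proc/timer_list",
    "/proc/timer_stats",
    "/sys/firmware",
]

def is_masked(path):
    """True if `path` falls under a masked prefix."""
    return any(path == m or path.startswith(m + "/") for m in MASKED_PATHS)

print(is_masked("/proc/kcore"))    # True: hidden under Default
print(is_masked("/proc/cpuinfo"))  # False: always visible
```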

    The authors said it best in the KEP:

    The /proc filesystem is a virtual interface to kernel data structures. By default, Kubernetes instructs container runtimes to mask or restrict access to certain paths within /proc to prevent accidental or malicious exposure of host information. But this becomes problematic when users want to:

    • run nested, unprivileged containers inside a Pod
    • build container images within a Pod, for example with buildah

    Here is an example of creating a Pod:

    1. Create the project
    oc new-project proc-mount-example
    
    2. Create the Pod
    cat << EOF | oc apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: nested-container-builder
      namespace: proc-mount-example
    spec:
      securityContext:
        runAsUser: 0
      containers:
      - name: builder
        image: registry.access.redhat.com/ubi9/ubi
        securityContext:
          privileged: true
          procMount: Unmasked
        command: ["/bin/sh"]
        args: ["-c", "sleep 3600"]
    EOF
    
    3. Switch to a terminal and install podman
    oc rsh nested-container-builder
    dnf install -y podman
    
    4. Change the shell prompt (so you know when the parent is in focus…)
    export PS1="parent-container# "
    podman run --name abcd --rm -it registry.access.redhat.com/ubi9/ubi sh
    
    5. Run a nested container and install podman inside it
    parent-container# podman run --name abcd --rm -it registry.access.redhat.com/ubi9/ubi sh
    sh-5.1# dnf install -y podman
    
    6. Now run another container inside the nested one; you’ll see a failure on /dev/net/tun.
    sh-5.1# podman run --name abcd --rm -it registry.access.redhat.com/ubi9/ubi sh
    Trying to pull registry.access.redhat.com/ubi9/ubi:latest...
    Getting image source signatures
    Checking if image destination supports signatures
    Copying blob ea2f7ff2baa2 done   | 
    Copying config 4da9fa8b5a done   | 
    Writing manifest to image destination
    Storing signatures
    ERRO[0018] Preparing container d402a22ebe452597a83b3795639f86e333c1dbb142703737d6d705c6a6f445c7: setting up Pasta: pasta failed with exit code 1:
                    Failed to open() /dev/net/tun: No such file or directory
                                                                            Failed to set up tap device in namespace 
    Error: mounting storage for container d402a22ebe452597a83b3795639f86e333c1dbb142703737d6d705c6a6f445c7: creating overlay mount to /var/lib/containers/storage/overlay/ab589890d52b88e51f1f945b55d07ac465de1cefd2411d8fab33b4d2769c4404/merged, mount_data="lowerdir=/var/lib/containers/storage/overlay/l/K6CXJGRTW32MPWEIMAH4IGCNZ5,upperdir=/var/lib/containers/storage/overlay/ab589890d52b88e51f1f945b55d07ac465de1cefd2411d8fab33b4d2769c4404/diff,workdir=/var/lib/containers/storage/overlay/ab589890d52b88e51f1f945b55d07ac465de1cefd2411d8fab33b4d2769c4404/work,nodev,volatile": using mount program /usr/bin/fuse-overlayfs: unknown argument ignored: lazytime
    fuse: device not found, try 'modprobe fuse' first
    fuse-overlayfs: cannot mount: No such file or directory
    : exit status 1
    

    The procMount field supports two values:

    • Default: Maintains the current behavior—masking sensitive /proc paths. If procMount is not specified, it defaults to Default, ensuring backward compatibility and preserving security for most workloads.
    • Unmasked: Bypasses the default masking, giving the container full access to /proc.

    Allowing unmasked access to /proc is a privileged operation. A container with root access and an unmasked /proc could potentially interact with the host system in dangerous ways. Use this powerful feature carefully.

    Good luck.


  • Outrigger: Rethinking Kubernetes Scheduling for a Smarter Future

    At DevConf.CZ 2025, a standout session from Alessandro Di Stefano and Prashanth Sundararaman introduced the Outrigger project, a forward-thinking initiative aimed at transforming Kubernetes scheduling into a dynamic, collaborative ecosystem. Building on the success of the Multiarch Tuning Operator for OpenShift, Outrigger leverages Kubernetes’ scheduling gates to go beyond traditional multi-architecture scheduling.

    👉 Watch the full session here:

    Excellent work by that team.

  • Using nx-gzip in your Red Hat OpenShift Container Platform on IBM Power to accelerate GZip performance

    Cross post from https://community.ibm.com/community/user/blogs/paul-bastide/2025/06/09/using-nx-gzip-in-your-red-hat-openshift-container

    The Power10 processor features an on-chip accelerator called the nest accelerator unit (NX unit). The coprocessor features available on the Power10 processor are similar to those of the Power9 processor. These coprocessors provide specialized functions, such as industry-standard Gzip compression and decompression, random number generation, and AES and Secure Hash Algorithm (SHA) cryptography.

    Block diagram of the NX unit

    This article outlines how to use nx-gzip in a non-privileged container in Red Hat OpenShift Container Platform on IBM Power. You must have deployed a cluster with workers with a processor compatibility of IBM Power 10 or higher. The Active Memory Expansion feature must be licensed.

    Build the power-gzip selftest binary

    The test binary shows the feature is working, and you can use the selftest and sample code to integrate it into your environment.

    1. Login to the PowerVM instance running Red Hat Enterprise Linux 9
    2. Install required build binaries
    dnf install make git gcc zlib-devel vim util-linux-2.37.4-11.el9.ppc64le -y
    
    3. Clone the repository
    git clone https://github.com/libnxz/power-gzip
    cd power-gzip/
    
    4. Build the selftests
    ./configure 
    cd selftests
    make
    
    5. Find the created test files
    # ls g*test -al
    -rwxr-xr-x. 1 root root 74992 Jun  9 08:24 gunz_test
    -rwxr-xr-x. 1 root root 74888 Jun  9 08:24 gzfht_test
    

    You are ready to test it.
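    libnxz is zlib-compliant, so the acceleration is transparent to existing zlib callers (the libnxz project documents preloading it, e.g. via LD_PRELOAD). The selftests exercise the same compress/decompress round trip this Python sketch shows; it runs anywhere, and is only NX-accelerated on Power10 with libnxz preloaded:

```python
import os
import zlib

# Random data, like the dd if=/dev/random test file used later in this article.
data = os.urandom(1024 * 1024)

compressed = zlib.compress(data, level=6)
restored = zlib.decompress(compressed)

assert restored == data
# Random input is incompressible, so the output may be larger than the input --
# compare the selftest's "compressed 1048576 to 1105994 bytes" output.
print(len(data), len(compressed))
```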

    Setup the NX-GZip test deployment

    Download the examples repository and setup kustomization, and configure cri-o so you can deploy and use /dev/crypto/nx-gzip in a container.

    1. Install the kustomize tool for the deployment
    curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"  | bash
    sudo mv kustomize /usr/local/bin
    kustomize -h
    
    2. Clone the ocp4-power-workload-tools repository
    git clone https://github.com/IBM/ocp4-power-workload-tools
    cd ocp4-power-workload-tools
    
    3. Configure the worker nodes to use /dev/crypto/nx-gzip as an allowed_device.
    oc apply -f manifests/nx-gzip/99-worker-crio-nx-gzip.yaml
    
    4. Export kubeconfig using export KUBECONFIG=~/.kube/config
    5. Setup the nx-gzip test Pod as below
    cd manifests/nx-gzip
    kustomize build . | oc apply -f - 
    
    6. Verify the resulting pod is running as below
    # oc get pod -n nx-gzip-demo
    NAME               READY   STATUS    RESTARTS   AGE
    nx-gzip-ds-2mlmh   1/1     Running   0          3s
    

    You are ready to test nx-gzip.

    To test with Privileged mode, you can use nx-gzip-privileged.

    Copy the Test artifact into the running Pod and Run the Test Artifact

    1. Copy the created executable files to the running pod
    # oc cp gzfht_test nx-gzip-ds-2mlmh:/nx-test/
    
    2. Access the pod shell and confirm the model name is Power10 or higher.
    # oc rsh nx-gzip-ds-2mlmh
    sh-5.1# lscpu | grep Model
    Model name:                           POWER10 (architected), altivec supported
    Model:                                2.0 (pvr 0080 0200)
    
    3. Create a test file
    sh-5.1# dd if=/dev/random of=/nx-test/test bs=1M count=1
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00431494 s, 243 MB/s
    sh-5.1#
    
    
    4. Run the tests in the pod
    sh-5.1# /nx-test/gzfht_test /nx-test/test
    file /nx-test/test read, 1048576 bytes
    compressed 1048576 to 1105994 bytes total, crc32 checksum = a094fbab
    sh-5.1# echo $?
    0
    

    If the file is compressed and the return code is 0, as shown above, the test is considered a PASS.

    You’ve seen how nx-gzip works in a Pod. You can also combine this with Node Feature Discovery to label each node with cpu-coprocessor.nx_gzip=true.

    Thank you for your time and good luck.

    References

    1. IBM Power10 Scale Out Servers Technical Overview S1012, S1014, S1022s, S1022 and S1024
    2. Exploitation of In-Core Acceleration of POWER Processors for AIX
    3. POWER NX zlib compliant library
    4. Db2: Hardware accelerated backup and log file compression
  • Getting the ibmvfc logs from the impacted clusters

    If you are using the IBM Virtual Fibre Channel adapter with your OpenShift on Power installation, you can use these steps to get the log details.

    Here are the steps to get the ibmvfc logs from the nodes that are failing:

    Grabbing the ibmvfc logs

    ibmvfc is the driver for the virtual fibre channel adapters.

    To setup ibmvfc logging:

    1. Login as a cluster-admin
    # export KUBECONFIG=/root/openstack-upi/auth/kubeconfig
    # oc get MachineConfigPool -o=jsonpath='{range.items[*]}{.metadata.name} {"\t"} {.status.nodeInfo.kubeletVersion}{"\n"}{end}'
    master
    worker
    
    2. For each of the listed MachineConfigPools, create 99-<mcp-name>-vfc.yaml. Applying these configs will reboot the nodes.
    # cat << EOF > 99-worker-vfc.yaml
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "worker"
      name: 99-worker-vfc
    spec:
      kernelArguments:
        - 'scsi_mod.scsi_logging_level=4096'
        - 'ibmvfc.debug=1'
        - 'ibmvfc.log_level=3'
    EOF
    
    # cat << EOF > 99-master-vfc.yaml
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "master"
      name: 99-master-vfc
    spec:
      kernelArguments:
        - 'scsi_mod.scsi_logging_level=4096'
        - 'ibmvfc.debug=1'
        - 'ibmvfc.log_level=3'
    EOF
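    The two manifests above differ only in the role, so they can be templated for every MachineConfigPool you listed earlier. A small sketch (pool names are from that output):

```python
# Template for a 99-<mcp-name>-vfc.yaml MachineConfig enabling ibmvfc logging.
TEMPLATE = """\
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "{role}"
  name: 99-{role}-vfc
spec:
  kernelArguments:
    - 'scsi_mod.scsi_logging_level=4096'
    - 'ibmvfc.debug=1'
    - 'ibmvfc.log_level=3'
"""

manifests = {role: TEMPLATE.format(role=role) for role in ("master", "worker")}
for role, manifest in manifests.items():
    # Write one manifest per pool, matching the heredocs above.
    with open(f"99-{role}-vfc.yaml", "w") as f:
        f.write(manifest)
```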
    
    3. Apply the YAMLs, one at a time:
    # oc apply -f 99-worker-vfc.yaml
    machineconfig.machineconfiguration.openshift.io/99-worker-vfc created
    
    4. Wait for the MachineConfigPool to come back up, such as worker:
    # oc wait mcp/worker --for condition=Ready --timeout=30m
    
    5. Verify each MachineConfigPool is done updating:

    The following shows the worker pool is updating:

    # oc get mcp worker
    NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    worker   rendered-worker-b93fdaee39cd7d38a53382d3c259c8ae   False     True       True       2              1                   1                     1                      8d
    

    The following shows the worker pool is Ready:

    # oc get mcp worker
    NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    worker   rendered-worker-b93fdaee39cd7d38a53382d3c259c8ae   True     False       False       2              2                   0                     2                      8d
    
    6. Spot-check the updates:

    a. List the nodes: oc get nodes
    b. Connect to one of the nodes: oc debug node/worker-0
    c. Change context to /host: chroot /host
    d. Verify the kernel arguments contain the three values we set:

    # rpm-ostree kargs
    rw $ignition_firstboot  ostree=/ostree/boot.1/rhcos/d7d848ba24dcacb1aba663e9868d4bd131482d9b7fecfa33197f558c53ae5208/0 ignition.platform.id=powervs root=UUID=06207aa5-3386-4044-bcb6-750e509d7cf0 rw rootflags=prjquota boot=UUID=6c67b96e-4e01-4e01-b8e5-ffeb4041bee2 systemd.unified_cgroup_hierarchy=1 cgroup_no_v1="all" psi=0 scsi_mod.scsi_logging_level=4096 ibmvfc.debug=1 ibmvfc.log_level=3 rd.multipath=default root=/dev/disk/by-label/dm-mpath-root
    
    7. Wait for the error to occur, then get the console logs and the journalctl --dmesg output from the node.

    You’ll end up with a bunch of messages like:

    [    2.333257] ibmvfc 30000004: Partner initialization complete
    [    2.333308] ibmvfc 30000004: Sent NPIV login
    [    2.333336] ibmvfc: Entering ibmvfc_alloc_mem
    [    2.333340] ibmvfc: Entering ibmvfc_alloc_queue
    [    2.333343] ibmvfc: Entering ibmvfc_init_event_pool
    [    2.333402] ibmvfc: Leaving ibmvfc_alloc_mem
    [    2.333439] ibmvfc: Entering ibmvfc_init_crq
    [    2.333443] ibmvfc: Entering ibmvfc_alloc_queue
    [    2.333446] ibmvfc: Entering ibmvfc_init_event_pool
    [    2.333482] ibmvfc: Leaving ibmvfc_init_event_pool
    [    2.333743] ibmvfc: Leaving ibmvfc_init_crq
    

    Once we’ve grabbed this level of detail, we can delete the MachineConfigs; the nodes will reboot and the kernel arguments will be reset.

    And you can share the logs with support.
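    If the journal is noisy, you can trim it down to just the driver’s messages before sharing; a minimal filter sketch (the sample lines are taken from the output above, plus an unrelated line to show the filtering):

```python
sample = """\
[    2.333257] ibmvfc 30000004: Partner initialization complete
[    2.333308] ibmvfc 30000004: Sent NPIV login
[    2.333336] ibmvfc: Entering ibmvfc_alloc_mem
[    2.340000] systemd[1]: Reached target Basic System.
[    2.333402] ibmvfc: Leaving ibmvfc_alloc_mem
"""

def ibmvfc_lines(text):
    """Keep only kernel messages emitted by the ibmvfc driver."""
    return [line for line in text.splitlines() if "ibmvfc" in line]

for line in ibmvfc_lines(sample):
    print(line)
```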

    Please only use this under guidance.

    Reference

    https://www.ibm.com/docs/en/linux-on-systems?topic=commands-scsi-logging-level

  • OpenShift… if you need a firewall

    If your security posture requires a firewall, you can add it to your OpenShift cluster using the following:

    1. Create a butane configuration
    cat << EOF > 98-nftables-worker.bu
    variant: openshift
    version: 4.16.0
    metadata:
      name: 98-nftables-worker
      labels:
        machineconfiguration.openshift.io/role: worker
    systemd:
      units:
        - name: "nftables.service"
          enabled: true
          contents: |
            [Unit]
            Description=Netfilter Tables
            Documentation=man:nft(8)
            Wants=network-pre.target
            Before=network-pre.target
            [Service]
            Type=oneshot
            ProtectSystem=full
            ProtectHome=true
            ExecStart=/sbin/nft -f /etc/sysconfig/nftables.conf
            ExecReload=/sbin/nft -f /etc/sysconfig/nftables.conf
            ExecStop=/sbin/nft 'add table inet custom_table; delete table inet custom_table'
            RemainAfterExit=yes
            [Install]
            WantedBy=multi-user.target
    storage:
      files:
      - path: /etc/sysconfig/nftables.conf
        mode: 0600
        overwrite: true
        contents:
          inline: |
            table inet custom_table
            delete table inet custom_table
            table inet custom_table {
                chain input {
                    type filter hook input priority 0; policy accept;
                    ip saddr 1.1.1.1/24 drop
                }
            }
    EOF
    
    2. Download Butane
    curl -o butane https://github.com/coreos/butane/releases/download/v0.23.0/butane-ppc64le-unknown-linux-gnu -L
    
    3. Run Butane to generate the MachineConfig

    chmod +x butane; ./butane 98-nftables-worker.bu -o 98-nftables-worker.yaml

    4. Apply the generated MachineConfig
    oc apply -f 98-nftables-worker.yaml
    

    You can verify the workers drop the traffic.
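    Note that nft masks ip saddr 1.1.1.1/24 to the whole 1.1.1.0/24 network. A quick sketch with Python’s ipaddress module shows which sources the drop rule matches:

```python
import ipaddress

# nft accepts 1.1.1.1/24 and applies the mask, i.e. the rule covers 1.1.1.0/24.
dropped = ipaddress.ip_network("1.1.1.1/24", strict=False)
print(dropped)  # 1.1.1.0/24

for src in ("1.1.1.5", "1.1.2.5"):
    matched = ipaddress.ip_address(src) in dropped
    print(src, "dropped" if matched else "accepted")
```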


  • Cool Feature… NodeDisruptionPolicies

    I missed this feature in 4.17… until I had to use it: NodeDisruptionPolicies. If you are copying files over, you can avoid a MachineConfigPool reboot for files and the services that depend on them. You can see more details in Using node disruption policies to minimize disruption from machine config changes.

    apiVersion: operator.openshift.io/v1
    kind: MachineConfiguration
    metadata:
      name: cluster
    spec:
      logLevel: Normal
      managementState: Managed
      operatorLogLevel: Normal
    status:
      nodeDisruptionPolicyStatus:
        clusterPolicies:
          files:
          - actions:
            - type: None
            path: /etc/mco/internal-registry-pull-secret.json
    

    Net: you can avoid a reboot when replacing a file and restarting the related, already-running service.

    FYI I ran across it with relations to nftables.service https://access.redhat.com/articles/7090422
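    The YAML above is the status the operator reports; to define your own policy you set spec.nodeDisruptionPolicy on the same cluster object. A sketch matching the nftables case (the file path and service name mirror the firewall article above; adjust for your files):

```yaml
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
spec:
  nodeDisruptionPolicy:
    files:
      - path: /etc/sysconfig/nftables.conf
        actions:
          - type: Restart
            restart:
              serviceName: nftables.service
```

    With this in place, a MachineConfig update to /etc/sysconfig/nftables.conf restarts nftables.service instead of rebooting the node.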

  • Red Hat OpenShift Container Platform on IBM Power Systems: Exploring Red Hat’s Multi-Arch Tuning Operator

    The Red Hat Multi-Arch Tuning Operator optimizes workload placement within multi-architecture compute clusters, so Pods run on a compute architecture their containers declare support for. Where Operators, Deployments, ReplicaSets, Jobs, CronJobs, or Pods don’t declare a nodeAffinity, in most cases the generated Pods are updated with a node affinity so they land on a supported (declared) CPU architecture.

    For version 1.1.0, the Red Hat Multi-Arch team (@Prashanth684, @aleskandro, @AnnaZivkovic) and the IBM Power Systems team (@pkenchap) worked together to give cluster administrators better control and flexibility. The release adds a plugins field in ClusterPodPlacementConfig and a first plugin called nodeAffinityScoring.

    Per the docs, the nodeAffinityScoring plugin adds weights and influence to the scheduler with this process:

    1. Analyze the Pod’s containers for the supported architectures
    2. Generate the scheduling predicates for nodeAffinity, e.g., a 75 weight on ppc64le
    3. Filter out nodes that do not meet the Pod requirements, using the predicates
    4. Prioritize the remaining nodes based on the architecture scores defined in the nodeAffinityScoring.platforms field
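    The process above can be sketched as a filter-then-score pass (node names, architectures, and weights below are illustrative, not operator internals):

```python
# Weights from the nodeAffinityScoring.platforms configuration.
PLATFORM_WEIGHTS = {"ppc64le": 100, "amd64": 50}

nodes = [
    {"name": "power-worker-0", "arch": "ppc64le"},
    {"name": "intel-worker-0", "arch": "amd64"},
    {"name": "arm-worker-0", "arch": "arm64"},
]

# Steps 1-2: architectures the Pod's images declare support for.
pod_architectures = {"ppc64le", "amd64"}

# Step 3: filter out nodes that cannot run the Pod.
feasible = [n for n in nodes if n["arch"] in pod_architectures]

# Step 4: prioritize the remaining nodes by configured weight.
ranked = sorted(feasible, key=lambda n: PLATFORM_WEIGHTS.get(n["arch"], 0), reverse=True)
print([n["name"] for n in ranked])  # ['power-worker-0', 'intel-worker-0']
```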

    To take advantage of this feature, use the following to preferentially load the Power nodes with work.

    apiVersion: multiarch.openshift.io/v1beta1
    kind: ClusterPodPlacementConfig
    metadata:
      name: cluster
    spec:
      logVerbosityLevel: Normal
      namespaceSelector:
        matchExpressions:
          - key: multiarch.openshift.io/exclude-pod-placement
            operator: DoesNotExist
      plugins:
        nodeAffinityScoring:
          enabled: true
          platforms:
            - architecture: ppc64le
              weight: 100
            - architecture: amd64
              weight: 50
    

    Best wishes, and looking forward to hearing how you use the Multi-Arch Tuning Operator on IBM Power with Multi-Arch Compute.

    References

    1. [RHOCP][TE] Multi-arch Tuning Operator: Cluster-wide architecture preferred/weighted affinity
    2. OpenShift 4.18 Docs: Chapter 4. Configuring multi-architecture compute machines on an OpenShift cluster
    3. OpenShift 4.18 Docs: 4.11. Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator
    4. Enhancement: Introducing the namespace-scoped PodPlacementConfig