Category: Application Development

  • Red Hat OpenShift Container Platform on IBM Power Systems: Exploring Red Hat’s Multi-Arch Tuning Operator

    The Red Hat Multi-Arch Tuning Operator optimizes workload placement within multi-architecture compute clusters, so Pods run on a compute architecture their containers declare support for. Where Operators, Deployments, ReplicaSets, Jobs, CronJobs, and Pods don’t declare a nodeAffinity, in most cases the Pods that are generated are updated with a node affinity so they land on a supported (declared) CPU architecture.

    For version 1.1.0, the Red Hat Multi-Arch Team (@Prashanth684, @aleskandro, @AnnaZivkovic) and the IBM Power Systems team (@pkenchap) have worked together to give cluster administrators better control and flexibility. The release adds a plugins field to ClusterPodPlacementConfig and introduces a first plugin called nodeAffinityScoring.

    Per the docs, the nodeAffinityScoring plugin adds weights that influence the scheduler through this process:

    1. Analyzes the Pod’s containers for the supported architectures
    2. Generates the scheduling predicates for nodeAffinity, e.g., a weight of 75 on ppc64le
    3. Filters out nodes that do not meet the Pod requirements, using the predicates
    4. Prioritizes the remaining nodes based on the architecture scores defined in the nodeAffinityScoring.platforms field.

    To take advantage of this feature, use the following configuration to asymmetrically load the Power nodes with work.

    apiVersion: multiarch.openshift.io/v1beta1
    kind: ClusterPodPlacementConfig
    metadata:
      name: cluster
    spec:
      logVerbosityLevel: Normal
      namespaceSelector:
        matchExpressions:
          - key: multiarch.openshift.io/exclude-pod-placement
            operator: Exists
      plugins:
        nodeAffinityScoring:
          enabled: true
          platforms:
            - architecture: ppc64le
              weight: 100
            - architecture: amd64
              weight: 50
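
    Once the configuration is applied, a quick way to confirm the operator is injecting the preference is to inspect the node affinity on a newly created Pod. A minimal sketch, assuming a Pod named my-app in a namespace my-ns (both placeholders):

    oc apply -f cluster-pod-placement-config.yaml
    oc get pod -n my-ns my-app -o json | jq '.spec.affinity.nodeAffinity'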
    

    Best wishes, and looking forward to hearing how you use the Multi-Arch Tuning Operator on IBM Power with Multi-Arch Compute.

    References

    1. [RHOCP][TE] Multi-arch Tuning Operator: Cluster-wide architecture preferred/weighted affinity
    2. OpenShift 4.18 Docs: Chapter 4. Configuring multi-architecture compute machines on an OpenShift cluster
    3. OpenShift 4.18 Docs: 4.11. Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator
    4. Enhancement: Introducing the namespace-scoped PodPlacementConfig
  • nx-gzip requires active_mem_expansion_capable

    nx-gzip requires the licensed capability active_mem_expansion_capable.

    Log in to your HMC and run:

    # List each managed system's capabilities and look for the memory-related ones
    for MACHINE in my-ranier1 my-ranier2
    do
        echo "MACHINE: ${MACHINE}"
        for CAPABILITY in $(lssyscfg -r sys -F capabilities -m "${MACHINE}" | sed 's|,| |g' | sed 's|"||g')
        do
            echo "CAPABILITY: ${CAPABILITY}" | grep mem
        done
        echo
    done

    The output shows:

    MACHINE: my-ranier1
    CAPABILITY: active_mem_expansion_capable
    CAPABILITY: hardware_active_mem_expansion_capable
    CAPABILITY: active_mem_mirroring_hypervisor_capable
    CAPABILITY: cod_mem_capable
    CAPABILITY: huge_page_mem_capable
    CAPABILITY: persistent_mem_capable
    
    MACHINE: my-ranier2
    CAPABILITY: cod_mem_capable
    CAPABILITY: huge_page_mem_capable
    CAPABILITY: persistent_mem_capable

    Since my-ranier1 lists active_mem_expansion_capable (and my-ranier2 does not), you should be all set to use nx-gzip on my-ranier1.
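
    As an extra sanity check on the LPAR itself (a sketch, assuming a recent Linux kernel with the powerpc NX GZIP driver loaded), you can look for the accelerator device node:

    ls -l /dev/crypto/nx-gzip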

    Best wishes

  • Helpful Tool – mtr

    I was not aware of mtr, a network diagnostic tool combining 'traceroute' and 'ping'.

    You can quickly install it on RHEL/CentOS with sudo dnf install -y mtr

    The output is super helpful for seeing where you have drops:

     mtr --report bastide.org
    Start: 2025-04-04T12:44:43-0400
    HOST: nx-gzip-d557-bastion-0.x Loss%   Snt   Last   Avg  Best  Wrst StDev
      1.|-- 10.20.176.3                0.0%    10    1.9   1.2   0.9   1.9   0.3
      2.|-- 172.16.32.4                0.0%    10    0.7   0.7   0.7   0.8   0.0
      3.|-- att-vc-srx-interconnect.p  0.0%    10   30.2  33.9  25.4  62.6  11.0
      4.|-- XX.5.16.XXX                0.0%    10   11.8  11.8  11.7  12.0   0.1
      5.|-- po97.prv-leaf6a.net.unifi  0.0%    10   62.5  63.2  62.5  67.5   1.5
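
    If name resolution is slow or noisy, a variation worth trying (a sketch; 50 cycles is an arbitrary choice) skips DNS lookups and runs more probes per hop:

    mtr -n --report --report-cycles 50 bastide.org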
  • DNS Resolver Hangs with OpenVPN

    Running multiple OpenVPN connections on the Mac, sometimes my DNS hangs and I can’t reach hosts over the VPNs. I use this hack to get around it.

    ❯ sudo networksetup -setdnsservers Wi-Fi "Empty"
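
    To confirm the resolver list was actually reset, you can read it back (a quick check; no sudo needed):

    ❯ networksetup -getdnsservers Wi-Fi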
  • Kernel Stack Trace

    Quick hack to find a process’s kernel stack trace.

    Look in /proc: find /proc -name stack

    You can see the last stack, for example in /proc/479260/stack:

    [<0>] hrtimer_nanosleep+0x89/0x120
    [<0>] __x64_sys_nanosleep+0x96/0xd0
    [<0>] do_syscall_64+0x5b/0x1a0
    [<0>] entry_SYSCALL_64_after_hwframe+0x66/0xcb
    

    It’s superb for figuring out a real-time hang and its pattern.
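
    A minimal sketch to dump the kernel stack of every process along with its command name (run as root, since /proc/<pid>/stack is only readable by root):

    for STACK in /proc/[0-9]*/stack
    do
        PID="$(echo ${STACK} | cut -d/ -f3)"
        echo "=== ${PID} $(cat /proc/${PID}/comm 2>/dev/null) ==="
        cat ${STACK} 2>/dev/null
    done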

  • vim versus plain vi: One Compelling Reason

    My colleague, Michael Q, introduced me to a vim extension that left me saying… that’s awesome.

    set cuc enables Cursor Column, and when I use it with set number, it’s awesome for seeing correct indenting.

    The commands are:

    1. Shift + :
    2. set cuc and enter
    3. Shift + :
    4. set number and enter

    Use set nocuc to disable it.
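
    To make this the default, you can append the settings to your ~/.vimrc (a sketch using the long option names):

    cat >> ~/.vimrc << 'EOF'
    set cursorcolumn
    set number
    EOF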

    Good luck…

    Post Script

    • Install vim with dnf install -y vim

    Reference VimTrick: set cuc

  • Updates to the Open Source Container images for Power now available in IBM Container Registry

    The IBM Linux on Power team updated the open source container images list on their IBM Container Registry (ICR). You can find out more at https://community.ibm.com/community/user/powerdeveloper/blogs/priya-seth/2023/04/05/open-source-containers-for-power-in-icr

    • redis v7.4.1-bv podman pull icr.io/ppc64le-oss/redis-ppc64le:v7.4.1-bv Nov 21, 2024
    • mongodb 6.0.13-bv podman pull icr.io/ppc64le-oss/mongodb-ppc64le:6.0.13-bv Nov 21, 2024
    • rocketchat 6.11.1 MIT podman pull icr.io/ppc64le-oss/rocketchat-ppc64le:6.11.1 Nov 21, 2024

    The milvus 2.4.11 container was added to the list of open source containers:

    podman pull icr.io/ppc64le-oss/milvus-ppc64le:v2.4.11
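
    A quick way to verify a pulled image really is the ppc64le build (a sketch using the redis image from the list above):

    podman pull icr.io/ppc64le-oss/redis-ppc64le:v7.4.1-bv
    podman image inspect icr.io/ppc64le-oss/redis-ppc64le:v7.4.1-bv --format '{{.Architecture}}'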
    
  • Recommended: How oc-mirror version 2 enables disconnected installations in OpenShift 4.16

    This is a recommended article on oc-mirror and getting started with a fundamental tool in OpenShift.

    https://developers.redhat.com/articles/2024/10/14/how-oc-mirror-version-2-enables-disconnected-installations-openshift-416

    This guide demonstrates the use of oc-mirror v2 to assist in populating a local Red Hat Quay registry that will be used for a disconnected installation, and includes the steps used to configure openshift-marketplace to use catalog sources that point to the local Red Hat Quay registry.

  • Coming to Grips with Linux Pressure Stall Information

    Linux Pressure Stall Information (psi), as part of Control Group v2, provides an accurate accounting of a container’s cpu, memory, and io. The psi stats allow accurate and limited access to resources – no over-committing and no over-sizing.

    However, it is sometimes difficult to see whether a container is being limited and could use more resources assigned.

    This article is designed to help you diagnose and check your pods so you can get the best out of your workloads.

    Check your workload

    You can check the cpu.stat for the container in your Pod:

    1. Find the containerID.
    [root@cpi-c7b2-bastion-0 ~]# oc get pod -n test test-pod -oyaml | grep -i containerID
      - containerID: cri-o://c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea
    
    2. Connect into the Pod.
    [root@cpi-c7b2-bastion-0 ~]# oc rsh -n test test-pod
    sh-4.4# find /sys -iname '*c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea*'
    /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d4b90d9_20f9_427d_9414_9964f32379dc.slice/crio-c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea.scope
    /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d4b90d9_20f9_427d_9414_9964f32379dc.slice/crio-conmon-c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea.scope
    
    3. Check the cpu.stat, io.stat, or memory.stat.
    /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d4b90d9_20f9_427d_9414_9964f32379dc.slice/crio-conmon-c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea.scope/cpu.stat
    usage_usec 11628232854
    user_usec 8689145332
    system_usec 2939087521
    core_sched.force_idle_usec 0
    nr_periods 340955
    nr_throttled 8
    throttled_usec 8012
    nr_bursts 0
    burst_usec 0
    
    4. We can see that the cpu is being throttled in nr_throttled and throttled_usec. Here, this is really a minor impact for the container.
    nr_throttled 8
    throttled_usec 8012
    

    If the container had a higher number of throttled events, such as the following, you would want to check the cpu or memory limits your container is restricted to:

    nr_throttled 103
    throttled_usec 22929315
    
    5. Check the container limits.
    ❯ NS=test
    ❯ POD=test-pod
    ❯ oc get -n ${NS} pod ${POD} -ojson | jq -r '.spec.containers[].resources.limits.cpu'
    8
    
    6. Patch your Pod or update your application to increase the cpus, as sketched below.
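    A minimal sketch, assuming the Pod is managed by a Deployment named test-deploy (a placeholder; adjust the name and the new limit to your workload):
    ❯ oc set resources -n ${NS} deployment/test-deploy --limits=cpu=12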

    Checking real-time stats

    You can check the real-time pressure stats for your containers. Log on to your host and run:

    find /sys/fs/cgroup/kubepods.slice/ -iname cpu.pressure  | xargs -t -I {} cat {} | grep -v total=0
    find /sys/fs/cgroup/kubepods.slice/ -iname memory.pressure  | xargs -t -I {} cat {} | grep -v total=0
    find /sys/fs/cgroup/kubepods.slice/ -iname io.pressure  | xargs -t -I {} cat {} | grep -v total=0
    

    The following loop narrows it down to just the cgroups that are actually under pressure.

    for PRESSURE in $( find /sys/fs/cgroup/kubepods.slice/ -iname io.pressure)
    do
        if [ ! -z "$(cat ${PRESSURE} | grep -v total=0)" ]
        then
            if [ ! -z "$(cat ${PRESSURE} | grep -v "avg10=0.00 avg60=0.00 avg300=0.00")" ]
            then
                echo ${PRESSURE}
            fi
        fi
    done
    
    ❯ cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podde03ef16_000a_4198_9e04_ac96d0ea33c5.slice/crio-d200161683a680588c4de8346ff58d633201eae2ffd558c8d707c4836215645e.scope/io.pressure
    some avg10=14.02 avg60=14.16 avg300=13.99 total=4121355556
    full avg10=14.02 avg60=14.16 avg300=13.99 total=4121050788
    

    Both the some and full lines show sustained IO pressure (some means at least one task is stalled on IO; full means all tasks are stalled). In this case, I was able to go in and increase the total IO.

    Tweak

    You can temporarily tweak the cpu.pressure settings for a pod or the system so that the time window used to evaluate pressure is extended (the example below uses the longest window possible).

    The maximum window size is 10 seconds, and if your kernel version is less than 6.5, the minimum window size is 500ms.

    cat << EOF > /sys/fs/cgroup/cpu.pressure
    some 10000000 10000000
    full 10000000 10000000
    EOF
    

    Disabling psi in OpenShift

    There are two methods to disable psi in OpenShift: the first is to set a kernel parameter, and the second is to switch from cgroupsv2 to cgroupsv1.

    Switch from cgroupsv2 to cgroupsv1

    You can switch from cgroupsv2 to cgroupsv1 – see Configuring the Linux cgroup version on your nodes.

    ❯ oc patch nodes.config cluster --type merge -p '{"spec": {"cgroupMode": "v1"}}'
    

    You’ll have to wait for each of the Nodes to restart.

    Set the Kernel Parameter psi=0

    In OpenShift, you can disable psi by using a MachineConfig.

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 99-worker-psi-disable
    spec:
      kernelArguments:
      - psi=0
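
    Apply the MachineConfig and wait for the worker MachineConfigPool to roll the change out to each node (a sketch; the file name is arbitrary):

    ❯ oc apply -f 99-worker-psi-disable.yaml
    ❯ oc get mcp worker -w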
    

    Check whether psi is enabled

    You can check whether psi is enabled by reading one of the cpu.pressure, io.pressure, or memory.pressure files. If psi is disabled, you’ll see “Operation not supported”.

    sh-5.1# cat /sys/fs/cgroup/cpu.pressure
    cat: /sys/fs/cgroup/cpu.pressure: Operation not supported
    

    or check the cgroup filesystem type on the node (cgroup2fs indicates cgroup v2; tmpfs indicates cgroup v1):

    oc debug node/<node_name>
    chroot /host
    stat -c %T -f /sys/fs/cgroup
    tmpfs
    

    Summary

    Linux PSI is pretty awesome. However, you should check your workload and verify it’s running correctly.

    References