Tag: openshift

  • 🚀 In-Place Pod Resize in Kubernetes: What You Need to Know

    DRAFT This is not a complete article. I haven’t yet fully tested and vetted the steps I built. I will come back and hopefully update.

    In Kubernetes v1.33, In-Place Pod Resize has entered Beta. This feature lets you resize the CPU and memory resources of containers in a running Pod without restarting them, which is particularly nice for Power customers who scale their systems vertically. Note that turning the feature gate on requires a kubelet restart.

    Previously, changing the resource requests or limits of a Pod meant recreating it, which was disruptive for long-running workloads. With in-place pod resize, vertically autoscaling workloads and right-sizing stateful applications become far less disruptive, which is a real win.

    1. Enable the InPlacePodVerticalScaling feature gate in a kind config called kind-cluster-config.yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    featureGates:
      InPlacePodVerticalScaling: true
    nodes:
    - role: control-plane
      kubeadmConfigPatches:
      - |
        kind: ClusterConfiguration
        apiServer:
            extraArgs:
              v: "1"
        scheduler:
            extraArgs:
              v: "1"
        controllerManager:
            extraArgs:
              v: "1"
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            v: "1"
    - role: worker
      kubeadmConfigPatches:
      - |
        kind: JoinConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            v: "1"
    
    2. Download kind
    mkdir -p dev-cache
    GOBIN=$(pwd)/dev-cache go install sigs.k8s.io/kind@v0.29.0
    
    3. Start the kind cluster
    KIND_EXPERIMENTAL_PROVIDER=podman dev-cache/kind create cluster \
      --image quay.io/powercloud/kind-node:v1.33.1 \
      --name test \
      --config kind-cluster-config.yaml \
      --wait 5m
    
    4. Create a namespace
    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        kubernetes.io/metadata.name: resize-test
        pod-security.kubernetes.io/audit: restricted
        pod-security.kubernetes.io/audit-version: v1.24
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/warn: restricted
        pod-security.kubernetes.io/warn-version: v1.24
      name: resize-test
    
    5. Create a Pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: resize-test
      namespace: resize-test
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: resize-test
        image: registry.access.redhat.com/ubi9/ubi
        command: ["sleep", "infinity"]
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
        resources:
          limits:
            memory: "200Mi"
            cpu: "1"
          requests:
            memory: "200Mi"
            cpu: "1"
    
    6. Resize the Pod. In v1.33 the resize must go through the resize subresource, for example: kubectl patch pod/resize-test -n resize-test --subresource resize --patch '{"spec":{"containers":[{"name":"resize-test","resources":{"requests":{"cpu":"2","memory":"200Mi"},"limits":{"cpu":"2","memory":"200Mi"}}}]}}'
    7. Check kubectl describe pod/resize-test -n resize-test
    8. Run kubectl exec -it pod/resize-test -n resize-test -- cat /sys/fs/cgroup/cpu.max to see the limit changed (lscpu reports the node's CPUs, not the cgroup limit)
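    In v1.33, the resize flows through the Pod's resize subresource, so the patch body can be built programmatically. A minimal sketch (the container name matches the Pod above; the kubectl invocation in the note below is illustrative):

```python
import json

def resize_patch(container, cpu, memory):
    """Build the JSON body for `kubectl patch --subresource resize`."""
    return json.dumps({
        "spec": {
            "containers": [{
                "name": container,
                "resources": {
                    "requests": {"cpu": cpu, "memory": memory},
                    "limits": {"cpu": cpu, "memory": memory},
                },
            }]
        }
    })

patch = resize_patch("resize-test", "2", "400Mi")
print(patch)
```

    You could then apply it with something like kubectl patch pod/resize-test -n resize-test --subresource resize --patch "$(python3 resize_patch.py)".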

    You’ve seen how this feature functions with Kubernetes and can resize your Pod without a restart.

    References

    1. Kubernetes v1.33: In-Place Pod Resize Graduated to Beta
    2. Resize CPU and Memory Resources assigned to Containers
  • Playing with Container Lifecycle Hooks and ContainerStopSignals

    DRAFT This is not a complete article. I haven’t yet fully tested and vetted the steps I built. I will come back and hopefully update.

    Kubernetes orchestrates Pods across multiple nodes. When a Pod lands on a node, the kubelet admits the Pod and its containers and manages their lifecycle. When the Pod is terminated, the kubelet sends a SIGTERM signal to the running processes. Kubernetes Enhancement – Container Stop Signals #4960 allows a custom stop signal, spec.containers[].lifecycle.stopSignal, so you can choose among sixty-five additional stop signals to stop the Pod. While the feature is behind a feature gate, you can see the supported values in supportedStopSignalsLinux.

    For example, a user may use the SIGQUIT signal to stop a container in the Pod. To do so with kind:

    1. Enable the ContainerStopSignals feature gate in a kind config called kind-cluster-config.yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    featureGates:
      ContainerStopSignals: true
    nodes:
    - role: control-plane
      kubeadmConfigPatches:
      - |
        kind: ClusterConfiguration
        apiServer:
            extraArgs:
              v: "1"
        scheduler:
            extraArgs:
              v: "1"
        controllerManager:
            extraArgs:
              v: "1"
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            v: "1"
    - role: worker
      kubeadmConfigPatches:
      - |
        kind: JoinConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            v: "1"
    
    2. Download kind
    mkdir -p dev-cache
    GOBIN=$(pwd)/dev-cache go install sigs.k8s.io/kind@v0.29.0
    
    3. Start the kind cluster
    KIND_EXPERIMENTAL_PROVIDER=podman dev-cache/kind create cluster \
      --image quay.io/powercloud/kind-node:v1.33.1 \
      --name test \
      --config kind-cluster-config.yaml \
      --wait 5m
    
    4. Create a namespace
    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        kubernetes.io/metadata.name: lifecycle-test
        pod-security.kubernetes.io/audit: restricted
        pod-security.kubernetes.io/audit-version: v1.24
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/warn: restricted
        pod-security.kubernetes.io/warn-version: v1.24
      name: lifecycle-test
    
    5. Create a Pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: test
      namespace: lifecycle-test
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: test
        command: ["/bin/sh", "-c"]
        args:
          - function cleanup() { echo "CALLED SIGQUIT"; };
            trap cleanup SIGQUIT;
            sleep infinity
        image: registry.access.redhat.com/ubi9/ubi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        lifecycle:
          stopSignal: SIGQUIT
    
    6. Check kubectl describe pod/test -n lifecycle-test to confirm stopSignal is set, then delete the Pod and watch kubectl logs -f pod/test -n lifecycle-test for the CALLED SIGQUIT output
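    You can exercise the same trap behavior locally, without a cluster. The sketch below runs an sh script equivalent to the Pod's args and delivers SIGQUIT, which is what the kubelet sends when lifecycle.stopSignal: SIGQUIT is set (Linux/macOS only):

```python
import signal
import subprocess
import time

# Same idea as the Pod's args: install a SIGQUIT handler, then block.
script = (
    "trap 'echo CALLED SIGQUIT; kill $pid 2>/dev/null' QUIT; "
    "sleep 30 >/dev/null 2>&1 & pid=$!; wait $pid"
)

proc = subprocess.Popen(["sh", "-c", script], stdout=subprocess.PIPE, text=True)
time.sleep(0.5)                   # give sh time to install the trap
proc.send_signal(signal.SIGQUIT)  # the configured stop signal
out, _ = proc.communicate(timeout=10)
print(out.strip())                # CALLED SIGQUIT
```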

    You’ve seen how this feature functions with Kubernetes and can take advantage of ContainerStopSignals in your environment.

    References

    1. Tracker: Kubernetes Enhancement – Container Stop Signals #4960 issue 30051
    2. KEP-4960: Container Stop Signals
    3. Kubernetes Documentation: Container Lifecycle Hooks
    4. An Introductory Guide to Managing the Kubernetes Pods Lifecycle
    5. Stop Signals
  • Great Job Team: Next-generation DataStage is now supported on IBM Power (ppc64le) with 5.2.0

    The IBM Team announced support for DataStage on IBM Power.

    IBM Cloud Pak for Data now supports the DataStage service on IBM Power servers. This means that you can run your data integration and extract, transform, and load (ETL) workloads directly on IBM Power, just like you already do on x86. With this update, it is easier than ever to use your existing Power infrastructure for modern data and AI projects.

    With the release of IBM DataStage 5.2.0, the DataStage service is now officially supported on IBM Power (ppc64le). This enables clients to run enterprise-grade ETL and data integration workloads on the Power platform, offering flexibility, performance, and consistency across architectures.

    See https://www.ibm.com/docs/en/software-hub/5.2.x?topic=requirements-ppc64le-hardware and https://community.ibm.com/community/user/blogs/yussuf-shaikh/2025/07/15/datastage-5-2-0-is-now-supported-on-ibm-power

  • Using procMount in your Kubernetes Pod

    Recently, I ran across Kubernetes Enhancement Proposal (KEP) 4265, where the authors update the Pod.spec.procMount capability to manage /proc visibility in a Pod’s security context. With the ProcMountType feature gate enabled, the Unmasked value disables masking and allows all paths in /proc (not just read-only).

    In practice, the Default procMount prevents containers from accessing sensitive kernel data or interacting with host-level processes. With this enhancement, you can run unprivileged containers inside a container (container-in-a-container), build container images within a Pod, and use buildah in a Pod.
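    To make the masking concrete, here is the kind of maskedPaths list a runtime such as runc or crun applies under procMount: Default (the exact list is runtime-defined; this one is illustrative):

```python
# Typical maskedPaths from an OCI runtime spec under procMount: Default.
# With procMount: Unmasked, Kubernetes asks the runtime to clear this list.
MASKED_PATHS = [
    "/proc/acpi",
    "/proc/kcore",
    "/proc/keys",
    "/proc/latency_stats",
    "/proc/sched_debug",
    "/proc/scsi",
    "/proc/timer_list",
    "/proc/timer_stats",
    "/sys/firmware",
]

def is_masked(path):
    """True if `path` falls under a masked prefix."""
    return any(path == m or path.startswith(m + "/") for m in MASKED_PATHS)

print(is_masked("/proc/kcore"))    # True: hidden under Default
print(is_masked("/proc/cpuinfo"))  # False: always visible
```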

    The authors said it best in the KEP:

    The /proc filesystem is a virtual interface to kernel data structures. By default, Kubernetes instructs container runtimes to mask or restrict access to certain paths within /proc to prevent accidental or malicious exposure of host information. But this becomes problematic when users want to:

    • run nested, unprivileged containers inside a Pod
    • build container images within a Pod, for example with buildah

    Here is an example of creating a Pod:

    1. Create the project
    oc new-project proc-mount-example
    
    2. Create the Pod
    cat << EOF | oc apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: nested-container-builder
      namespace: proc-mount-example
    spec:
      securityContext:
        runAsUser: 0
      containers:
      - name: builder
        image: registry.access.redhat.com/ubi9/ubi
        securityContext:
          privileged: true
          procMount: Unmasked
        command: ["/bin/sh"]
        args: ["-c", "sleep 3600"]
    EOF
    
    3. Switch to a terminal and install podman
    oc rsh nested-container-builder
    dnf install -y podman
    
    4. Change the shell prompt (so you know when the parent is in focus…)
    export PS1="parent-container# "
    podman run --name abcd --rm -it registry.access.redhat.com/ubi9/ubi sh
    
    5. Run a nested container and install podman inside it
    parent-container# podman run --name abcd --rm -it registry.access.redhat.com/ubi9/ubi sh
    sh-5.1# dnf install -y podman
    
    6. Now run another container inside the nested one; you’ll see a failure on /dev/net/tun.
    sh-5.1# podman run --name abcd --rm -it registry.access.redhat.com/ubi9/ubi sh
    Trying to pull registry.access.redhat.com/ubi9/ubi:latest...
    Getting image source signatures
    Checking if image destination supports signatures
    Copying blob ea2f7ff2baa2 done   | 
    Copying config 4da9fa8b5a done   | 
    Writing manifest to image destination
    Storing signatures
    ERRO[0018] Preparing container d402a22ebe452597a83b3795639f86e333c1dbb142703737d6d705c6a6f445c7: setting up Pasta: pasta failed with exit code 1:
                    Failed to open() /dev/net/tun: No such file or directory
                                                                            Failed to set up tap device in namespace 
    Error: mounting storage for container d402a22ebe452597a83b3795639f86e333c1dbb142703737d6d705c6a6f445c7: creating overlay mount to /var/lib/containers/storage/overlay/ab589890d52b88e51f1f945b55d07ac465de1cefd2411d8fab33b4d2769c4404/merged, mount_data="lowerdir=/var/lib/containers/storage/overlay/l/K6CXJGRTW32MPWEIMAH4IGCNZ5,upperdir=/var/lib/containers/storage/overlay/ab589890d52b88e51f1f945b55d07ac465de1cefd2411d8fab33b4d2769c4404/diff,workdir=/var/lib/containers/storage/overlay/ab589890d52b88e51f1f945b55d07ac465de1cefd2411d8fab33b4d2769c4404/work,nodev,volatile": using mount program /usr/bin/fuse-overlayfs: unknown argument ignored: lazytime
    fuse: device not found, try 'modprobe fuse' first
    fuse-overlayfs: cannot mount: No such file or directory
    : exit status 1
    

    The procMount field supports two values:

    • Default: Maintains the current behavior—masking sensitive /proc paths. If procMount is not specified, it defaults to Default, ensuring backward compatibility and preserving security for most workloads.
    • Unmasked: Bypasses the default masking, giving the container full access to /proc.

    Allowing unmasked access to /proc is a privileged operation. A container with root access and an unmasked /proc could potentially interact with the host system in dangerous ways. Use this powerful feature carefully.

    Good luck.


  • Outrigger: Rethinking Kubernetes Scheduling for a Smarter Future

    At DevConf.CZ 2025, a standout session from Alessandro Di Stefano and Prashanth Sundararaman introduced the Outrigger project, a forward-thinking initiative aimed at transforming Kubernetes scheduling into a dynamic, collaborative ecosystem. Building on the success of the Multiarch Tuning Operator for OpenShift, Outrigger leverages Kubernetes’ scheduling gates to go beyond traditional multi-architecture scheduling.

    👉 Watch the full session here:

    Excellent work by that team.

  • Using nx-gzip in your Red Hat OpenShift Container Platform on IBM Power to accelerate GZip performance

    Cross post from https://community.ibm.com/community/user/blogs/paul-bastide/2025/06/09/using-nx-gzip-in-your-red-hat-openshift-container

    The Power10 processor features an on-chip accelerator called the nest accelerator unit (NX unit). The coprocessor features available on the Power10 processor are similar to those of the Power9 processor. These coprocessors provide specialized functions, such as industry-standard Gzip compression and decompression, random number generation, and AES and Secure Hash Algorithm (SHA) cryptography.

    Block diagram of the NX unit

    This article outlines how to use nx-gzip in a non-privileged container in Red Hat OpenShift Container Platform on IBM Power. You must have deployed a cluster with workers with a processor compatibility of IBM Power 10 or higher. The Active Memory Expansion feature must be licensed.

    Build the power-gzip selftest binary

    The test binary shows the feature is working, and you can use the selftest and sample code to integrate it into your environment.

    1. Login to the PowerVM instance running Red Hat Enterprise Linux 9
    2. Install required build binaries
    dnf install make git gcc zlib-devel vim util-linux-2.37.4-11.el9.ppc64le -y
    
    3. Clone the repository
    git clone https://github.com/libnxz/power-gzip
    cd power-gzip/
    
    4. Build the selftests
    ./configure 
    cd selftests
    make
    
    5. Find the created test files
    # ls g*test -al
    -rwxr-xr-x. 1 root root 74992 Jun  9 08:24 gunz_test
    -rwxr-xr-x. 1 root root 74888 Jun  9 08:24 gzfht_test
    

    You are ready to test it.
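    libnxz is zlib-compliant, so the acceleration is transparent to existing zlib callers (the libnxz project documents preloading it, e.g. via LD_PRELOAD). The selftests exercise the same compress/decompress round trip this Python sketch shows; it runs anywhere, and is only NX-accelerated on Power10 with libnxz preloaded:

```python
import os
import zlib

# Random data, like the dd if=/dev/random test file used later in this article.
data = os.urandom(1024 * 1024)

compressed = zlib.compress(data, level=6)
restored = zlib.decompress(compressed)

assert restored == data
# Random input is incompressible, so the output may be larger than the input --
# compare the selftest's "compressed 1048576 to 1105994 bytes" output.
print(len(data), len(compressed))
```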

    Setup the NX-GZip test deployment

    Download the examples repository and setup kustomization, and configure cri-o so you can deploy and use /dev/crypto/nx-gzip in a container.

    1. Install the kustomize tool for the deployment
    curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"  | bash
    sudo mv kustomize /usr/local/bin
    kustomize -h
    
    2. Clone the ocp4-power-workload-tools repository
    git clone https://github.com/IBM/ocp4-power-workload-tools
    cd ocp4-power-workload-tools
    
    3. Configure the worker nodes to use /dev/crypto/nx-gzip as an allowed_device.
    oc apply -f manifests/nx-gzip/99-worker-crio-nx-gzip.yaml
    
    4. Export kubeconfig using export KUBECONFIG=~/.kube/config
    5. Setup the nx-gzip test Pod as below
    cd manifests/nx-gzip
    kustomize build . | oc apply -f - 
    
    6. Verify the resulting pod is running as below
    # oc get pod -n nx-gzip-demo
    NAME               READY   STATUS    RESTARTS   AGE
    nx-gzip-ds-2mlmh   1/1     Running   0          3s
    

    You are ready to test nx-gzip.

    To test with Privileged mode, you can use nx-gzip-privileged.

    Copy the Test artifact into the running Pod and Run the Test Artifact

    1. Copy the created executable files to the running pod
    # oc cp gzfht_test nx-gzip-ds-2mlmh:/nx-test/
    
    2. Access the pod shell and confirm the model name is Power10 or higher.
    # oc rsh nx-gzip-ds-2mlmh
    sh-5.1# lscpu | grep Model
    Model name:                           POWER10 (architected), altivec supported
    Model:                                2.0 (pvr 0080 0200)
    
    3. Create a test file
    sh-5.1# dd if=/dev/random of=/nx-test/test bs=1M count=1
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00431494 s, 243 MB/s
    sh-5.1#
    
    
    4. Run the tests in the pod
    sh-5.1# /nx-test/gzfht_test /nx-test/test
    file /nx-test/test read, 1048576 bytes
    compressed 1048576 to 1105994 bytes total, crc32 checksum = a094fbab
    sh-5.1# echo $?
    0
    

    If the file is compressed and the return code is 0, as shown above, the test is considered a PASS.

    You’ve seen how nx-gzip works in a Pod. You can also combine this with Node Feature Discovery to label each node with cpu-coprocessor.nx_gzip=true.

    Thank you for your time and good luck.

    References

    1. IBM Power10 Scale Out Servers Technical Overview S1012, S1014, S1022s, S1022 and S1024
    2. Exploitation of In-Core Acceleration of POWER Processors for AIX
    3. POWER NX zlib compliant library
    4. Db2: Hardware accelerated backup and log file compression
  • Getting the ibmvfc logs from the impacted clusters

    If you are using the IBM Virtual Fibre Channel adapter with your OpenShift on Power installation, you can use these steps to get the log details.

    Here are the steps to get the ibmvfc logs from the nodes that are failing:

    Grabbing the ibmvfc logs

    ibmvfc is the driver for the virtual fibre channel adapters.

    To setup ibmvfc logging:

    1. Login as a cluster-admin
    # export KUBECONFIG=/root/openstack-upi/auth/kubeconfig
    # oc get MachineConfigPool -o=jsonpath='{range.items[*]}{.metadata.name} {"\t"} {.status.nodeInfo.kubeletVersion}{"\n"}{end}'
    master
    worker
    
    2. For each of the listed MachineConfigPools, create 99-<mcp-name>-vfc.yaml. Applying these configs will reboot the nodes.
    # cat << EOF > 99-worker-vfc.yaml
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "worker"
      name: 99-worker-vfc
    spec:
      kernelArguments:
        - 'scsi_mod.scsi_logging_level=4096'
        - 'ibmvfc.debug=1'
        - 'ibmvfc.log_level=3'
    EOF
    
    # cat << EOF > 99-master-vfc.yaml
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "master"
      name: 99-master-vfc
    spec:
      kernelArguments:
        - 'scsi_mod.scsi_logging_level=4096'
        - 'ibmvfc.debug=1'
        - 'ibmvfc.log_level=3'
    EOF
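    The two manifests above differ only in the role, so they can be templated for every MachineConfigPool you listed earlier. A small sketch (pool names are from that output):

```python
# Template for a 99-<mcp-name>-vfc.yaml MachineConfig enabling ibmvfc logging.
TEMPLATE = """\
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "{role}"
  name: 99-{role}-vfc
spec:
  kernelArguments:
    - 'scsi_mod.scsi_logging_level=4096'
    - 'ibmvfc.debug=1'
    - 'ibmvfc.log_level=3'
"""

manifests = {role: TEMPLATE.format(role=role) for role in ("master", "worker")}
for role, manifest in manifests.items():
    # Write one manifest per pool, matching the heredocs above.
    with open(f"99-{role}-vfc.yaml", "w") as f:
        f.write(manifest)
```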
    
    3. Apply the YAMLs, one at a time:
    # oc apply -f 99-worker-vfc.yaml
    machineconfig.machineconfiguration.openshift.io/99-worker-vfc created
    
    4. Wait for the MachineConfigPool to come back up, such as worker:
    # oc wait mcp/worker --for condition=Ready --timeout=30m
    
    5. Verify each MachineConfigPool is done updating:

    The following shows the worker pool is updating:

    # oc get mcp worker
    NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    worker   rendered-worker-b93fdaee39cd7d38a53382d3c259c8ae   False     True       True       2              1                   1                     1                      8d
    

    The following shows the worker pool is Ready:

    # oc get mcp worker
    NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    worker   rendered-worker-b93fdaee39cd7d38a53382d3c259c8ae   True     False       False       2              2                   0                     2                      8d
    
    6. Spot-check the updates:

    a. List the nodes: oc get nodes
    b. Connect to one of the nodes: oc debug node/worker-0
    c. Change context to /host: chroot /host
    d. Verify the kernel arguments contain the three values we set:

    # rpm-ostree kargs
    rw $ignition_firstboot  ostree=/ostree/boot.1/rhcos/d7d848ba24dcacb1aba663e9868d4bd131482d9b7fecfa33197f558c53ae5208/0 ignition.platform.id=powervs root=UUID=06207aa5-3386-4044-bcb6-750e509d7cf0 rw rootflags=prjquota boot=UUID=6c67b96e-4e01-4e01-b8e5-ffeb4041bee2 systemd.unified_cgroup_hierarchy=1 cgroup_no_v1="all" psi=0 scsi_mod.scsi_logging_level=4096 ibmvfc.debug=1 ibmvfc.log_level=3 rd.multipath=default root=/dev/disk/by-label/dm-mpath-root
    
    7. Wait for the error to occur, then get the console logs and the journalctl --dmesg output from the node.

    You’ll end up with a bunch of messages like:

    [    2.333257] ibmvfc 30000004: Partner initialization complete
    [    2.333308] ibmvfc 30000004: Sent NPIV login
    [    2.333336] ibmvfc: Entering ibmvfc_alloc_mem
    [    2.333340] ibmvfc: Entering ibmvfc_alloc_queue
    [    2.333343] ibmvfc: Entering ibmvfc_init_event_pool
    [    2.333402] ibmvfc: Leaving ibmvfc_alloc_mem
    [    2.333439] ibmvfc: Entering ibmvfc_init_crq
    [    2.333443] ibmvfc: Entering ibmvfc_alloc_queue
    [    2.333446] ibmvfc: Entering ibmvfc_init_event_pool
    [    2.333482] ibmvfc: Leaving ibmvfc_init_event_pool
    [    2.333743] ibmvfc: Leaving ibmvfc_init_crq
    

    Once we’ve grabbed this level of detail, we can delete the MachineConfigs; the nodes will reboot and the kernel arguments will be reset.

    And you can share the logs with support.
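    If the journal is noisy, you can trim it down to just the driver’s messages before sharing; a minimal filter sketch (the sample lines are taken from the output above, plus an unrelated line to show the filtering):

```python
sample = """\
[    2.333257] ibmvfc 30000004: Partner initialization complete
[    2.333308] ibmvfc 30000004: Sent NPIV login
[    2.333336] ibmvfc: Entering ibmvfc_alloc_mem
[    2.340000] systemd[1]: Reached target Basic System.
[    2.333402] ibmvfc: Leaving ibmvfc_alloc_mem
"""

def ibmvfc_lines(text):
    """Keep only kernel messages emitted by the ibmvfc driver."""
    return [line for line in text.splitlines() if "ibmvfc" in line]

for line in ibmvfc_lines(sample):
    print(line)
```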

    Please only use this under guidance.

    Reference

    https://www.ibm.com/docs/en/linux-on-systems?topic=commands-scsi-logging-level

  • OpenShift… if you need a firewall

    If your security posture requires a firewall, you can add it to your OpenShift cluster using the following:

    1. Create a butane configuration
    cat << EOF > 98-nftables-worker.bu
    variant: openshift
    version: 4.16.0
    metadata:
      name: 98-nftables-worker
      labels:
        machineconfiguration.openshift.io/role: worker
    systemd:
      units:
        - name: "nftables.service"
          enabled: true
          contents: |
            [Unit]
            Description=Netfilter Tables
            Documentation=man:nft(8)
            Wants=network-pre.target
            Before=network-pre.target
            [Service]
            Type=oneshot
            ProtectSystem=full
            ProtectHome=true
            ExecStart=/sbin/nft -f /etc/sysconfig/nftables.conf
            ExecReload=/sbin/nft -f /etc/sysconfig/nftables.conf
            ExecStop=/sbin/nft 'add table inet custom_table; delete table inet custom_table'
            RemainAfterExit=yes
            [Install]
            WantedBy=multi-user.target
    storage:
      files:
      - path: /etc/sysconfig/nftables.conf
        mode: 0600
        overwrite: true
        contents:
          inline: |
            table inet custom_table
            delete table inet custom_table
            table inet custom_table {
                chain input {
                    type filter hook input priority 0; policy accept;
                    ip saddr 1.1.1.1/24 drop
                }
            }
    EOF
    
    2. Download Butane
    curl -o butane https://github.com/coreos/butane/releases/download/v0.23.0/butane-ppc64le-unknown-linux-gnu -L
    
    3. Run Butane to generate the MachineConfig

    chmod +x butane; ./butane 98-nftables-worker.bu -o 98-nftables-worker.yaml

    4. Apply the generated MachineConfig
    oc apply -f 98-nftables-worker.yaml
    

    You can verify the workers drop the traffic.
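    Note that nft masks ip saddr 1.1.1.1/24 to the whole 1.1.1.0/24 network. A quick sketch with Python’s ipaddress module shows which sources the drop rule matches:

```python
import ipaddress

# nft accepts 1.1.1.1/24 and applies the mask, i.e. the rule covers 1.1.1.0/24.
dropped = ipaddress.ip_network("1.1.1.1/24", strict=False)
print(dropped)  # 1.1.1.0/24

for src in ("1.1.1.5", "1.1.2.5"):
    matched = ipaddress.ip_address(src) in dropped
    print(src, "dropped" if matched else "accepted")
```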


  • Cool Feature… NodeDisruptionPolicies

    I missed this feature in 4.17… until I had to use it: NodeDisruptionPolicies. If you are copying files over, you can avoid a MachineConfigPool reboot for files and the services that depend on them. You can see more details in Using node disruption policies to minimize disruption from machine config changes.

    apiVersion: operator.openshift.io/v1
    kind: MachineConfiguration
    metadata:
      name: cluster
    spec:
      logLevel: Normal
      managementState: Managed
      operatorLogLevel: Normal
    status:
      nodeDisruptionPolicyStatus:
        clusterPolicies:
          files:
          - actions:
            - type: None
            path: /etc/mco/internal-registry-pull-secret.json
    

    Net: you can avoid a reboot when replacing a file and restarting the related, already-running service.

    FYI I ran across it with relations to nftables.service https://access.redhat.com/articles/7090422
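    The YAML above is the status the operator reports; to define your own policy you set spec.nodeDisruptionPolicy on the same cluster object. A sketch matching the nftables case (the file path and service name mirror the firewall article above; adjust for your files):

```yaml
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
spec:
  nodeDisruptionPolicy:
    files:
      - path: /etc/sysconfig/nftables.conf
        actions:
          - type: Restart
            restart:
              serviceName: nftables.service
```

    With this in place, a MachineConfig update to /etc/sysconfig/nftables.conf restarts nftables.service instead of rebooting the node.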

  • Red Hat OpenShift Container Platform on IBM Power Systems: Exploring Red Hat’s Multi-Arch Tuning Operator

    The Red Hat Multi-Arch Tuning Operator optimizes workload placement within multi-architecture compute clusters, so Pods run on a compute architecture their containers declare support for. Where Operators, Deployments, ReplicaSets, Jobs, CronJobs, or Pods don’t declare a nodeAffinity, in most cases the generated Pods are updated with a node affinity so they land on a supported (declared) CPU architecture.

    For version 1.1.0, the Red Hat Multi-Arch team (@Prashanth684, @aleskandro, @AnnaZivkovic) and the IBM Power Systems team (@pkenchap) worked together to give cluster administrators better control and flexibility. The release adds a plugins field in ClusterPodPlacementConfig and a first plugin called nodeAffinityScoring.

    Per the docs, the nodeAffinityScoring plugin adds weights and influence to the scheduler with this process:

    1. Analyze the Pod’s containers for the supported architectures
    2. Generate the scheduling predicates for nodeAffinity, e.g., a 75 weight on ppc64le
    3. Filter out nodes that do not meet the Pod requirements, using the predicates
    4. Prioritize the remaining nodes based on the architecture scores defined in the nodeAffinityScoring.platforms field
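    The process above can be sketched as a filter-then-score pass (node names, architectures, and weights below are illustrative, not operator internals):

```python
# Weights from the nodeAffinityScoring.platforms configuration.
PLATFORM_WEIGHTS = {"ppc64le": 100, "amd64": 50}

nodes = [
    {"name": "power-worker-0", "arch": "ppc64le"},
    {"name": "intel-worker-0", "arch": "amd64"},
    {"name": "arm-worker-0", "arch": "arm64"},
]

# Steps 1-2: architectures the Pod's images declare support for.
pod_architectures = {"ppc64le", "amd64"}

# Step 3: filter out nodes that cannot run the Pod.
feasible = [n for n in nodes if n["arch"] in pod_architectures]

# Step 4: prioritize the remaining nodes by configured weight.
ranked = sorted(feasible, key=lambda n: PLATFORM_WEIGHTS.get(n["arch"], 0), reverse=True)
print([n["name"] for n in ranked])  # ['power-worker-0', 'intel-worker-0']
```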

    To take advantage of this feature, use the following to preferentially load the Power nodes with work.

    apiVersion: multiarch.openshift.io/v1beta1
    kind: ClusterPodPlacementConfig
    metadata:
      name: cluster
    spec:
      logVerbosityLevel: Normal
      namespaceSelector:
        matchExpressions:
          - key: multiarch.openshift.io/exclude-pod-placement
            operator: DoesNotExist
      plugins:
        nodeAffinityScoring:
          enabled: true
          platforms:
            - architecture: ppc64le
              weight: 100
            - architecture: amd64
              weight: 50
    

    Best wishes, and looking forward to hearing how you use the Multi-Arch Tuning Operator on IBM Power with Multi-Arch Compute.

    References

    1. [RHOCP][TE] Multi-arch Tuning Operator: Cluster-wide architecture preferred/weighted affinity
    2. OpenShift 4.18 Docs: Chapter 4. Configuring multi-architecture compute machines on an OpenShift cluster
    3. OpenShift 4.18 Docs: 4.11. Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator
    4. Enhancement: Introducing the namespace-scoped PodPlacementConfig