The scheduler-plugins project has a new release, v0.30.6. This release is used in concert with the Secondary Scheduler Operator.
- kube-scheduler: registry.k8s.io/scheduler-plugins/kube-scheduler:v0.30.6
This release aligns with the Kubernetes version – v1.30.6.
IPI with FIPS mode creates certificates that are FIPS compliant and makes sure the Nodes/Operators are using the proper cryptographic profiles.
Your bastion needs FIPS Mode and a RHEL9-equivalent stream. You can verify with:
fips-mode-setup --check
Note: you must reboot after enabling FIPS or this binary will not function.
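If FIPS is not yet enabled on the bastion, a minimal sketch (assuming a RHEL 9 host; after the reboot, fips-mode-setup --check should report "FIPS mode is enabled."):
# fips-mode-setup --enable
# reboot
# fips-mode-setup --check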
Download the oc client:
# curl -O https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp-dev-preview/4.18.0-ec.2/openshift-client-linux-ppc64le-rhel9.tar.gz
# tar xvf openshift-client-linux-ppc64le-rhel9.tar.gz
oc
kubectl
README.md
You can optionally move the oc and kubectl binaries to /usr/local/bin/.
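For example (a sketch; paths assumed):
# mv oc kubectl /usr/local/bin/
# oc version --client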
Download the ccoctl binary:
# curl -O https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp-dev-preview/4.18.0-ec.2/ccoctl-linux-rhel9.tar.gz
# tar xvf ccoctl-linux-rhel9.tar.gz ccoctl
ccoctl
# chmod 755 ccoctl
Copy over your pull-secret.txt
Get the Credentials Request pull spec from the release image https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp-dev-preview/4.18.0-ec.2/release.txt
Pull From: quay.io/openshift-release-dev/ocp-release@sha256:6507d5a101294c670a283f5b56c5595fb1212bd6946b2c3fee01de2ef661625f
# mkdir -p credreqs
# oc adm release extract --cloud=powervs --credentials-requests quay.io/openshift-release-dev/ocp-release@sha256:6507d5a101294c670a283f5b56c5595fb1212bd6946b2c3fee01de2ef661625f --to=./credreqs -a pull-secret.txt
...
Extracted release payload created at 2024-10-02T21:38:57Z
# ls credreqs/
0000_26_cloud-controller-manager-operator_15_credentialsrequest-powervs.yaml
0000_30_cluster-api_01_credentials-request.yaml
0000_30_machine-api-operator_00_credentials-request.yaml
0000_50_cluster-image-registry-operator_01-registry-credentials-request-powervs.yaml
0000_50_cluster-ingress-operator_00-ingress-credentials-request.yaml
0000_50_cluster-storage-operator_03_credentials_request_powervs.yaml
# export IBMCLOUD_API_KEY=<your ibmcloud apikey>
# ./ccoctl ibmcloud create-service-id --credentials-requests-dir ./credreqs --name fips-svc --resource-group-name ocp-dev-resource-group
2024/11/01 08:22:12 Saved credentials configuration to: /root/install/t/manifests/openshift-cloud-controller-manager-ibm-cloud-credentials-credentials.yaml
2024/11/01 08:22:12 Saved credentials configuration to: /root/install/t/manifests/openshift-machine-api-powervs-credentials-credentials.yaml
2024/11/01 08:22:12 Saved credentials configuration to: /root/install/t/manifests/openshift-image-registry-installer-cloud-credentials-credentials.yaml
2024/11/01 08:22:12 Saved credentials configuration to: /root/install/t/manifests/openshift-ingress-operator-cloud-credentials-credentials.yaml
2024/11/01 08:22:12 Saved credentials configuration to: /root/install/t/manifests/openshift-cluster-csi-drivers-ibm-powervs-cloud-credentials-credentials.yaml
curl -O https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp-dev-preview/4.18.0-ec.2/openshift-install-rhel9-ppc64le.tar.gz
Note: with a FIPS host, you’ll want to use the rhel9 build as it supports FIPS: https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp-dev-preview/4.18.0-ec.2/openshift-client-linux-ppc64le-rhel9.tar.gz
Unarchive openshift-install-rhel9-ppc64le.tar.gz
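For example (a sketch; the extracted FIPS-capable installer binary is invoked as openshift-install-fips in the steps below):
# tar xvf openshift-install-rhel9-ppc64le.tar.gz
# ./openshift-install-fips version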
Create the install-config.yaml using openshift-install-fips create install-config, per https://developer.ibm.com/tutorials/awb-deploy-ocp-on-power-vs-ipi/
Edit install-config.yaml and add a new line at the end: fips: true
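A quick way to append and verify the setting (a sketch):
# echo 'fips: true' >> install-config.yaml
# grep fips install-config.yaml
fips: true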
[root@fips-ocp-7219-bastion-0 t]# mkdir -p 20241031c
[root@fips-ocp-7219-bastion-0 t]# cp install-config.yaml-old 20241031c/install-config.yaml
Next, create the manifests:
# openshift-install-fips create manifests
WARNING Release Image Architecture not detected. Release Image Architecture is unknown
INFO Consuming Install Config from target directory
INFO Adding clusters...
INFO Manifests created in: cluster-api, manifests and openshift
# cp credreqs/manifests/openshift-*yaml 20241031c/openshift/
# ls openshift/
99_feature-gate.yaml 99_openshift-machineconfig_99-master-ssh.yaml
99_kubeadmin-password-secret.yaml 99_openshift-machineconfig_99-worker-fips.yaml
99_openshift-cluster-api_master-machines-0.yaml 99_openshift-machineconfig_99-worker-multipath.yaml
99_openshift-cluster-api_master-machines-1.yaml 99_openshift-machineconfig_99-worker-ssh.yaml
99_openshift-cluster-api_master-machines-2.yaml openshift-cloud-controller-manager-ibm-cloud-credentials-credentials.yaml
99_openshift-cluster-api_master-user-data-secret.yaml openshift-cluster-csi-drivers-ibm-powervs-cloud-credentials-credentials.yaml
99_openshift-cluster-api_worker-machineset-0.yaml openshift-config-secret-pull-secret.yaml
99_openshift-cluster-api_worker-user-data-secret.yaml openshift-image-registry-installer-cloud-credentials-credentials.yaml
99_openshift-machine-api_master-control-plane-machine-set.yaml openshift-ingress-operator-cloud-credentials-credentials.yaml
99_openshift-machineconfig_99-master-fips.yaml openshift-install-manifests.yaml
99_openshift-machineconfig_99-master-multipath.yaml openshift-machine-api-powervs-credentials-credentials.yaml
BASE_DOMAIN=powervs-openshift-ipi.cis.ibm.net RELEASE_ARCHITECTURE="ppc64le" openshift-install-fips create cluster
INFO Creating infrastructure resources...
INFO Started local control plane with envtest
INFO Stored kubeconfig for envtest in: /root/install/t/20241031c/.clusterapi_output/envtest.kubeconfig
INFO Running process: Cluster API with args [-v=2 --diagnostics-address=0 --health-addr=127.0.0.1:45201 --webhook-port=40159 --webhook-cert-dir=/tmp/envtest-serving-certs-1721884268 --kubeconfig=/root/install/t/20241031c/.clusterapi_output/envtest.kubeconfig]
INFO Running process: ibmcloud infrastructure provider with args [--provider-id-fmt=v2 --v=5 --health-addr=127.0.0.1:37207 --webhook-port=35963 --webhook-cert-dir=/tmp/envtest-serving-certs-3500602992 --kubeconfig=/root/install/t/20241031c/.clusterapi_output/envtest.kubeconfig]
INFO Creating infra manifests...
INFO Created manifest *v1.Namespace, namespace= name=openshift-cluster-api-guests
INFO Created manifest *v1beta1.Cluster, namespace=openshift-cluster-api-guests name=fips-fd4f6
INFO Created manifest *v1beta2.IBMPowerVSCluster, namespace=openshift-cluster-api-guests name=fips-fd4f6
INFO Created manifest *v1beta2.IBMPowerVSImage, namespace=openshift-cluster-api-guests name=rhcos-fips-fd4f6
INFO Done creating infra manifests
INFO Creating kubeconfig entry for capi cluster fips-fd4f6
INFO Waiting up to 30m0s (until 9:06AM EDT) for network infrastructure to become ready...
INFO Network infrastructure is ready
INFO Created manifest *v1beta2.IBMPowerVSMachine, namespace=openshift-cluster-api-guests name=fips-fd4f6-bootstrap
INFO Created manifest *v1beta2.IBMPowerVSMachine, namespace=openshift-cluster-api-guests name=fips-fd4f6-master-0
INFO Created manifest *v1beta2.IBMPowerVSMachine, namespace=openshift-cluster-api-guests name=fips-fd4f6-master-1
INFO Created manifest *v1beta2.IBMPowerVSMachine, namespace=openshift-cluster-api-guests name=fips-fd4f6-master-2
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=fips-fd4f6-bootstrap
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=fips-fd4f6-master-0
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=fips-fd4f6-master-1
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=fips-fd4f6-master-2
INFO Created manifest *v1.Secret, namespace=openshift-cluster-api-guests name=fips-fd4f6-bootstrap
INFO Created manifest *v1.Secret, namespace=openshift-cluster-api-guests name=fips-fd4f6-master
INFO Waiting up to 15m0s (until 9:02AM EDT) for machines [fips-fd4f6-bootstrap fips-fd4f6-master-0 fips-fd4f6-master-1 fips-fd4f6-master-2] to provision...
INFO Control-plane machines are ready
INFO Cluster API resources have been created. Waiting for cluster to become ready...
INFO Consuming Cluster API Manifests from target directory
INFO Consuming Cluster API Machine Manifests from target directory
INFO Waiting up to 20m0s (until 9:21AM EDT) for the Kubernetes API at https://api.fips.powervs-openshift-ipi.cis.ibm.net:6443...
INFO API v1.31.1 up
INFO Waiting up to 45m0s (until 9:47AM EDT) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 5m0s for bootstrap machine deletion openshift-cluster-api-guests/fips-fd4f6-bootstrap...
INFO Shutting down local Cluster API controllers...
INFO Stopped controller: Cluster API
INFO Stopped controller: ibmcloud infrastructure provider
INFO Shutting down local Cluster API control plane...
INFO Local Cluster API system has completed operations
INFO no post-destroy requirements for the powervs provider
INFO Finished destroying bootstrap resources
INFO Waiting up to 40m0s (until 10:16AM EDT) for the cluster at https://api.fips.powervs-openshift-ipi.cis.ibm.net:6443 to initialize...
If you have any doubts, you can start a second terminal session and use the kubeconfig to verify access:
# oc --kubeconfig=auth/kubeconfig get nodes
NAME STATUS ROLES AGE VERSION
fips-fd4f6-master-0 Ready control-plane,master 41m v1.31.1
fips-fd4f6-master-1 Ready control-plane,master 41m v1.31.1
fips-fd4f6-master-2 Ready control-plane,master 41m v1.31.1
fips-fd4f6-worker-srwf2 Ready worker 7m37s v1.31.1
fips-fd4f6-worker-tc28p Ready worker 7m13s v1.31.1
fips-fd4f6-worker-vrlrq Ready worker 7m12s v1.31.1
You can also check the cluster operators with oc --kubeconfig=auth/kubeconfig get co.
When it’s complete, you can log in and use your FIPS-enabled cluster.
The Linux Pressure Stall Information (PSI), part of Control Group v2, provides an accurate accounting of a container’s CPU, memory, and I/O. The psi stats allow accurate and limited access to resources – no over-committing and no over-sizing.
However, it is sometimes difficult to see whether a container is being limited and could use more resources.
This article is designed to help you diagnose and check your pods so you can get the best out of your workloads.
You can check the container in your Pod’s cpu.stat:
[root@cpi-c7b2-bastion-0 ~]# oc get pod -n test test-pod -oyaml | grep -i containerID
- containerID: cri-o://c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea
[root@cpi-c7b2-bastion-0 ~]# oc rsh -n test test-pod
sh-4.4# find /sys -iname '*c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea*'
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d4b90d9_20f9_427d_9414_9964f32379dc.slice/crio-c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea.scope
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d4b90d9_20f9_427d_9414_9964f32379dc.slice/crio-conmon-c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea.scope
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0d4b90d9_20f9_427d_9414_9964f32379dc.slice/crio-conmon-c050804396004e6b5d822541a58f299ea2b0e48936709175d6d57f3507cc6cea.scope/cpu.stat
usage_usec 11628232854
user_usec 8689145332
system_usec 2939087521
core_sched.force_idle_usec 0
nr_periods 340955
nr_throttled 8
throttled_usec 8012
nr_bursts 0
burst_usec 0
Look at nr_throttled and throttled_usec. This is really a minor impact for a container:
nr_throttled 8
throttled_usec 8012
If the container has a higher number of throttled events, you’ll want to check the CPU or memory limits your container is restricted to, for example:
nr_throttled 103
throttled_usec 22929315
❯ NS=test
❯ POD=test-pod
❯ oc get -n ${NS} pod ${POD} -ojson | jq -r '.spec.containers[].resources.limits.cpu'
8
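You can check the memory limit the same way (a sketch):
❯ oc get -n ${NS} pod ${POD} -ojson | jq -r '.spec.containers[].resources.limits.memory'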
You can check the real-time pressure stats (much like top) for your containers. Log on to your host.
find /sys/fs/cgroup/kubepods.slice/ -iname cpu.pressure | xargs -t -I {} cat {} | grep -v total=0
find /sys/fs/cgroup/kubepods.slice/ -iname memory.pressure | xargs -t -I {} cat {} | grep -v total=0
find /sys/fs/cgroup/kubepods.slice/ -iname io.pressure | xargs -t -I {} cat {} | grep -v total=0
This will show you all the pods that are under pressure.
for PRESSURE in $( find /sys/fs/cgroup/kubepods.slice/ -iname io.pressure)
do
    if [ ! -z "$(cat ${PRESSURE} | grep -v total=0)" ]
    then
        if [ ! -z "$(cat ${PRESSURE} | grep -v "avg10=0.00 avg60=0.00 avg300=0.00")" ]
        then
            echo ${PRESSURE}
        fi
    fi
done
❯ cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podde03ef16_000a_4198_9e04_ac96d0ea33c5.slice/crio-d200161683a680588c4de8346ff58d633201eae2ffd558c8d707c4836215645e.scope/io.pressure
some avg10=14.02 avg60=14.16 avg300=13.99 total=4121355556
full avg10=14.02 avg60=14.16 avg300=13.99 total=4121050788
In this case, I was able to go in and increase the total IO.
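To map a crio-&lt;id&gt;.scope path back to its pod, here is a sketch using oc and jq (the ID is the container hash embedded in the scope name above):
❯ CID=d200161683a680588c4de8346ff58d633201eae2ffd558c8d707c4836215645e
❯ oc get pods -A -ojson | jq -r --arg cid "$CID" '.items[] | select(.status.containerStatuses[]?.containerID // "" | contains($cid)) | .metadata.namespace + "/" + .metadata.name'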
You can temporarily tweak the cpu.pressure settings for a pod or the system so the window used to evaluate pressure is extended (the example below uses the longest window possible). The maximum window size is 10 seconds, and on kernels older than 6.5 the minimum window size is 500ms.
cat << EOF > /sys/fs/cgroup/cpu.pressure
some 10000000 10000000
full 10000000 10000000
EOF
There are two methods to disable psi in OpenShift: the first is to set a kernel parameter, and the second is to switch from cgroup v2 to cgroup v1.
You can switch from cgroup v2 to cgroup v1 – see Configuring the Linux cgroup version on your nodes.
❯ oc patch nodes.config cluster --type merge -p '{"spec": {"cgroupMode": "v1"}}'
You’ll have to wait for each of the Nodes to restart.
In OpenShift, you can also disable psi using a MachineConfig.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-psi-disable
spec:
  kernelArguments:
    - psi=0
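A sketch of applying the MachineConfig and watching the worker pool roll out (the filename is assumed):
# oc apply -f 99-worker-psi-disable.yaml
# oc get mcp worker -w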
You can check whether PSI is disabled by reading one of the cpu.pressure, io.pressure, or memory.pressure files; when it is disabled, you’ll see “Operation not supported”.
sh-5.1# cat /sys/fs/cgroup/cpu.pressure
cat: /sys/fs/cgroup/cpu.pressure: Operation not supported
or
oc debug node/<node_name>
chroot /host
stat -c %T -f /sys/fs/cgroup
tmpfs
The tmpfs filesystem type indicates cgroup v1; on cgroup v2 the command reports cgroup2fs.
Linux PSI is pretty awesome. However, you should check your workload and verify it’s running correctly.
Reference: kernel/sched/psi.c in the Linux kernel source.
Red Hat OpenShift 4.16 was announced today and is generally available for upgrades and new installations. It is based on Kubernetes 1.29 with the CRI-O 1.29 runtime and RHEL CoreOS 9.4. You can read the release notes at https://docs.openshift.com/container-platform/4.16/release_notes/ocp-4-16-release-notes.html
Some cool features you can use are:
– oc adm upgrade status command, which decouples status information from the existing oc adm upgrade command and provides specific information regarding a cluster update, including the status of the control plane and worker node updates. https://docs.openshift.com/container-platform/4.16/updating/updating_a_cluster/updating-cluster-cli.html#update-upgrading-oc-adm-upgrade-status_updating-cluster-cli
– Tech Preview and Generally Available Table – https://docs.openshift.com/container-platform/4.16/release_notes/ocp-4-16-release-notes.html#ocp-4-16-technology-preview-tables_release-notes
FYI: google/go-containerregistry has a new release v0.19.2. This adds a new feature we care about:
crane mutate myimage --set-platform linux/arm64
This release also supports using podman’s authfile via the REGISTRY_AUTH_FILE environment variable.
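A sketch of using the two together (the image name is hypothetical; the auth file path is podman’s default rootless location):
❯ export REGISTRY_AUTH_FILE=${XDG_RUNTIME_DIR}/containers/auth.json
❯ crane mutate registry.example.com/myimage:latest --set-platform linux/ppc64le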
I found a cool article on Cert Manager with IPI PowerVS
Simplify certificate management on OpenShift across multiple architectures
Chirag Kyal, a Software Engineer at Red Hat, has authored an article about deploying IPI PowerVS and Cert Manager on IBM Cloud.
Check out the article about efficient certificate management techniques on Red Hat OpenShift using the cert-manager Operator for OpenShift’s multi-architecture support.
The new information for the end of April is:
The IBM Linux on Power team released more images to their IBM Container Registry (ICR); here are the new ones:
milvus | v2.3.3 | docker pull icr.io/ppc64le-oss/milvus-ppc64le:v2.3.3 | April 2, 2024 |
rust | 1.66.1 | docker pull icr.io/ppc64le-oss/rust-ppc64le:1.66.1 | April 2, 2024 |
opensearch | 2.12.0 | docker pull icr.io/ppc64le-oss/opensearch-ppc64le:2.12.0 | April 16, 2024 |
Original post was at https://community.ibm.com/community/user/powerdeveloper/blogs/paul-bastide/2024/04/26/getting-started-with-a-sock-shop-a-sample-multi-ar?CommunityKey=daf9dca2-95e4-4b2c-8722-03cd2275ab63
I’ve developed the following script to help you get started deploying multi-architecture applications and to elaborate on the techniques for controlling multi-arch compute. This script uses the sock-shop application, which is available at https://github.com/ocp-power-demos/sock-shop-demo . This series of instructions for sock-shop-demo requires kustomize and following the readme.md in the repository to set up the username and password for mongodb.
You do not need to do every step that follows; please feel free to install/use what you’d like. I recommend the kustomize install with multi-no-ns, and then playing with the features you find interesting. Note, multi-no-ns requires no namespace.
The layout of the application is described in this diagram:
This deployment shows the Exec errors and pod scheduling errors that are encountered when scheduling Intel only Pods on Power.
For these steps, you are going to clone the ocp-power-demos’s sock-shop-demo and then experiment to resolve errors so the application is up and running.
I’d recommend running this from a bastion.
Clone the repository: git clone https://github.com/ocp-power-demos/sock-shop-demo
Change into the sock-shop-demo folder.
Install kustomize – this tool enables an ordered layout of the resources. You’ll also need oc installed.
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
Ref: https://kubectl.docs.kubernetes.io/installation/kustomize/binaries/
The reason kustomize is used is the sort-order feature in the binary.
Create the manifests/overlays/single/env.secret file with a username and password for mongodb. openssl rand -hex 10 is a good way to generate a random password. You’ll need to copy this env.secret into each overlays/ folder that is used in the demo (see the sketch below).
❯ kustomize build manifests/overlays/single | oc apply -f -
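The other overlays used later in this walkthrough need the same env.secret; a minimal sketch of copying it (overlay names taken from the steps below):
❯ for OVERLAY in single-node-selector multi-no-ns multi-taint-front-end; do cp manifests/overlays/single/env.secret manifests/overlays/${OVERLAY}/; done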
This creates the full application within the OpenShift project.
To see the layout of the application you can see the third diagram of the layout (except these are only Intel images) https://github.com/ocp-power-demos/sock-shop-demo/blob/main/README.md#diagrams
oc get pods -owide
❯ oc get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
carts-585dc6c878-wq6jg 0/1 Error 6 (2m56s ago) 6m21s 10.129.2.24 mac-01a7-worker-0 <none> <none>
carts-db-78f756b87c-r4pl9 1/1 Running 0 6m19s 10.131.0.32 rdr-mac-cust-el-tmwmg-worker-1-6g97b <none> <none>
catalogue-77d7c444bb-wnltt 0/1 CrashLoopBackOff 6 (8s ago) 6m17s 10.130.2.21 mac-01a7-worker-1 <none> <none>
catalogue-db-5bc97c6b98-v9rdp 1/1 Running 0 6m16s 10.131.0.33 rdr-mac-cust-el-tmwmg-worker-1-6g97b <none> <none>
front-end-648fdf6957-bjk9m 0/1 CrashLoopBackOff 5 (2m44s ago) 6m14s 10.129.2.25 mac-01a7-worker-0 <none> <none>
orders-5dbf8994df-whb9r 0/1 CrashLoopBackOff 5 (2m47s ago) 6m13s 10.130.2.22 mac-01a7-worker-1 <none> <none>
orders-db-7544dc7fd9-w9zh7 1/1 Running 0 6m11s 10.128.3.83 rdr-mac-cust-el-tmwmg-worker-2-5hbxg <none> <none>
payment-6cdff467b9-n2dql 0/1 Error 6 (2m53s ago) 6m10s 10.130.2.23 mac-01a7-worker-1 <none> <none>
queue-master-c9dcf8f87-c8drl 0/1 CrashLoopBackOff 5 (2m41s ago) 6m8s 10.129.2.26 mac-01a7-worker-0 <none> <none>
rabbitmq-54689956b9-rt5fb 2/2 Running 0 6m7s 10.131.0.34 rdr-mac-cust-el-tmwmg-worker-1-6g97b <none> <none>
session-db-7d4cc56465-dcx9f 1/1 Running 0 6m5s 10.130.2.24 mac-01a7-worker-1 <none> <none>
shipping-5ff5f44465-tbjv7 0/1 Error 6 (2m51s ago) 6m4s 10.130.2.25 mac-01a7-worker-1 <none> <none>
user-64dd65b5b7-49cbd 0/1 CrashLoopBackOff 5 (2m25s ago) 6m3s 10.129.2.27 mac-01a7-worker-0 <none> <none>
user-db-7f864c9f5f-jchf6 1/1 Running 0 6m1s 10.131.0.35 rdr-mac-cust-el-tmwmg-worker-1-6g97b <none> <none>
You might be lucky enough for the scheduler to assign these to Intel only nodes.
At this point if they are all Running with no restarts, yes it’s running.
❯ oc get routes
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
sock-shop sock-shop-test-user-4.apps.rdr-mac-cust-d.rdr-xyz.net front-end 8079 edge/Redirect None
It failed for me.
The purpose is to cordon the Power Nodes and delete the existing pod so you get the Pod running on the architecture you want. This is only recommended on a dev/test system and on the worker nodes.
oc get nodes -l kubernetes.io/arch=ppc64le | grep worker
oc adm cordon node/<worker>
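To cordon every Power worker in one pass, a sketch:
❯ for NODE in $(oc get nodes -l kubernetes.io/arch=ppc64le -o name | grep worker); do oc adm cordon ${NODE}; done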
❯ oc get pods -l name=front-end
NAME READY STATUS RESTARTS AGE
front-end-648fdf6957-bjk9m 0/1 CrashLoopBackOff 13 (26s ago) 42m
Delete the front-end pod:
oc delete pod/front-end-648fdf6957-bjk9m
The app should be running correctly at this point.
Demonstrate how to use node selector to put the workload on the right nodes.
These microservices use Deployments. We can modify the deployment to use NodeSelectors.
Edit manifests/overlays/single/09-front-end-dep.yaml or use oc edit deployment/front-end. Find the nodeSelector field and add an architecture limitation using a Node label:
nodeSelector:
  node.openshift.io/os_id: rhcos
  kubernetes.io/arch: amd64
oc apply -f manifests/overlays/single/09-front-end-dep.yaml
❯ oc get pods -l name=front-end
NAME READY STATUS RESTARTS AGE
front-end-648fdf6957-bjk9m 0/1 CrashLoopBackOff 14 (2m49s ago) 50m
front-end-7bd476764-t974g 0/1 ContainerCreating 0 40s
Delete the old front-end pod on the Power node:
oc delete pod/front-end-648fdf6957-bjk9m
Note: you can run the following to redeploy the application with nodeSelectors.
❯ kustomize build manifests/overlays/single-node-selector | oc delete -f -
❯ kustomize build manifests/overlays/single-node-selector | oc apply -f -
Are the pods running on the Intel node?
With the nodeSelector now started, you can uncordon the Power nodes. This is only recommended on a dev/test system and on the worker nodes.
oc get nodes -l kubernetes.io/arch=ppc64le | grep worker
oc adm uncordon node/<worker>
❯ oc get pods -l name=front-end
NAME READY STATUS RESTARTS AGE
front-end-6944957cd6-qmhhg 1/1 Running 0 19s
The application should be running. If not, please use:
❯ kustomize build manifests/overlays/single-node-selector | oc delete -f -
❯ kustomize build manifests/overlays/single-node-selector | oc apply -f -
The workload should now be entirely on the Intel side.
With many of these applications, there are architecture specific alternatives. You can run without NodeSelectors to get the workload scheduled where there is support.
To switch to using Node selectors across Power/Intel:
oc project sock-shop
❯ kustomize build manifests/overlays/multi-no-ns | oc apply -f -
❯ oc get pods -owide
We’re going to move one of the application’s dependencies, rabbitmq. The IBM team has created a ppc64le port of the rabbitmq-exporter (link).
Replace image: kbudde/rabbitmq-exporter on line 32 with icr.io/ppc64le-oss/rabbitmq-exporter-ppc64le:1.0.0-RC19, and remove the kubernetes.io/arch: amd64 limitation on line 39. Then run:
kustomize build manifests/overlays/multi-no-ns | oc apply -f -
❯ oc get pod -l name=rabbitmq -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rabbitmq-65c75db8db-9jqbd 2/2 Running 0 96s 10.130.2.31 mac-01a7-worker-1 <none> <none>
The pod should now start on the Power node.
You’ve taken advantage of these ported containers, and you can take advantage of other open source container images for Power: https://community.ibm.com/community/user/powerdeveloper/blogs/priya-seth/2023/04/05/open-source-containers-for-power-in-icr
Taints and Tolerations provide a way to keep pods off of a node unless they explicitly tolerate the taint.
oc get nodes -l kubernetes.io/arch=ppc64le | grep worker
oc adm taint nodes node1 kubernetes.io/arch=ppc64le:NoSchedule
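To verify the taint landed on the node, a quick sketch (node1 is the placeholder node name from above):
❯ oc get node node1 -o jsonpath='{.spec.taints}'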
Also note, in this overlay the taints are flipped (the Intel node is tainted with the Power taint).
Apply the updated deployment from manifests/overlays/multi-taint-front-end/09-front-end-dep.yaml:
oc apply -f manifests/overlays/multi-taint-front-end/09-front-end-dep.yaml
❯ oc get pods -o wide -l name=front-end
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
front-end-69c64bf86f-98nkc 0/1 Running 0 9s 10.128.3.99 rdr-mac-cust-el-tmwmg-worker-2-5hbxg <none> <none>
front-end-7f4f4844c8-x79zn 1/1 Running 0 103s 10.130.2.33 mac-01a7-worker-1 <none> <none>
You might have to give it a few minutes before the workload shifts.
❯ oc get pods -o wide -l name=front-end
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
front-end-69c64bf86f-98nkc 1/1 Running 0 35s 10.128.3.99 rdr-mac-cust-el-tmwmg-worker-2-5hbxg <none> <none>
Ref: OpenShift 4.14: Understanding taints and tolerations
These are different techniques to help schedule/control workload placement and help you explore Multi-Arch Compute.
My colleague, Punith, and I have also posted two documents on further controlling workload placement.
Here are my notes for setting up the SIG’s nfs-provisioner. You should follow these directions to set up the nfs-provisioner from kubernetes-sigs/nfs-subdir-external-provisioner.
a. Create the namespace
oc new-project nfs-provisioner
b. Label the namespace with elevated privileges so we can create NFS mounts
# oc label namespace/nfs-provisioner security.openshift.io/scc.podSecurityLabelSync=false --overwrite=true
namespace/nfs-provisioner labeled
# oc label namespace/nfs-provisioner pod-security.kubernetes.io/enforce=privileged --overwrite=true
namespace/nfs-provisioner labeled
# oc label namespace/nfs-provisioner pod-security.kubernetes.io/enforce-version=v1.24 --overwrite=true
namespace/nfs-provisioner labeled
# oc label namespace/nfs-provisioner pod-security.kubernetes.io/audit=privileged --overwrite=true
namespace/nfs-provisioner labeled
# oc label namespace/nfs-provisioner pod-security.kubernetes.io/warn=privileged --overwrite=true
namespace/nfs-provisioner labeled
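You can confirm the labels took effect (a quick check):
# oc get namespace/nfs-provisioner --show-labels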
# curl -O -L https://github.com/IBM/ocp4-power-workload-tools/manifests/storage/storage-class-nfs-template.yaml
oc adm policy add-scc-to-user hostmount-anyuid system:serviceaccount:nfs-provisioner:nfs-client-provisioner
# oc process -f storage-class-nfs-template.yaml -p NFS_PATH=/data -p NFS_SERVER=10.17.2.138 | oc apply -f -
deployment.apps/nfs-client-provisioner created
serviceaccount/nfs-client-provisioner created
clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created
role.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
rolebinding.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
storageclass.storage.k8s.io/nfs-client created
oc get pods
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner-b8764c6bb-mjnq9 1/1 Running 0 36s
❯ oc get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client k8s-sigs.io/nfs-subdir-external-provisioner Delete Immediate false 3m27s
If you see more than the nfs-client listed, you may have to change which storage class is the default.
oc patch storageclass <storageclass-name> -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
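Conversely, to make nfs-client the default storage class (a sketch):
oc patch storageclass nfs-client -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'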
Here are some updates for April 2024.
FYI: I was made aware of kubernetes-sigs/kube-scheduler-simulator and the release simulator/v0.2.0. From the project: “That’s why we are developing a simulator for kube-scheduler — you can try out the behavior of the scheduler while checking which plugin made what decision for which Node.”
https://github.com/kubernetes-sigs/kube-scheduler-simulator/tree/simulator/v0.2.0
The Linux on Power team added new Power-supported containers (https://community.ibm.com/community/user/powerdeveloper/blogs/priya-seth/2023/04/05/open-source-containers-for-power-in-icr):
cassandra | 4.1.3 | docker pull icr.io/ppc64le-oss/cassandra-ppc64le:4.1.3 | April 2, 2024 |
milvus | v2.3.3 | docker pull icr.io/ppc64le-oss/milvus-ppc64le:v2.3.3 | April 2, 2024 |
rust | 1.66.1 | docker pull icr.io/ppc64le-oss/rust-ppc64le:1.66.1 | April 2, 2024 |
mongodb | 5.0.26 | docker pull icr.io/ppc64le-oss/mongodb-ppc64le:5.0.26 | April 9, 2024 |
mongodb | 6.0.13 | docker pull icr.io/ppc64le-oss/mongodb-ppc64le:6.0.13 | April 9, 2024 |
logstash | 8.11.3 | docker pull icr.io/ppc64le-oss/logstash-ppc64le:8.11.3 | April 9, 2024 |
Added a new fix for imagestream set schedule
https://gist.github.com/prb112/838d8c2ae908b496f5d5480411a7d692
An article worth rekindling in our memories…
Optimal LPAR placement for a Red Hat OpenShift cluster within IBM PowerVM
Optimal logical partition (LPAR) placement can be important to improve the performance of workloads as this can favor efficient use of the memory and CPU resources on the system. However, for certain configuration and settings such as I/O devices allocation to the partition, amount of memory allocation, CPU entitlement to the partition, and so on we might not get a desired LPAR placement. In such situations, the technique described in this blog can enable you to place the LPAR in a desired optimal configuration.
https://community.ibm.com/community/user/powerdeveloper/blogs/mel-bakhshi/2022/08/11/openshift-lpar-placement-powervm
There is an updated list of Red Hat products supporting IBM Power.
https://community.ibm.com/community/user/powerdeveloper/blogs/ashwini-sule/2024/04/05/red-hat-products-mar-2024
Enhancing container security with Aqua Trivy on IBM Power
… IBM Power development team found that Trivy is as effective as other open source scanners in detecting vulnerabilities. Not only does Trivy prove to be suitable for container security in IBM Power clients’ DevSecOps pipelines, but the scanning process is simple. IBM Power’s support for Aqua Trivy underscores its industry recognition for its efficacy as an open source scanner.
https://community.ibm.com/community/user/powerdeveloper/blogs/jenna-murillo/2024/04/08/enhanced-container-security-with-trivy-on-power
Podman 5.0 is released
https://blog.podman.io/2024/03/podman-5-0-has-been-released/