FYI: google/go-containerregistry has a new release v0.19.2. This adds a new feature we care about:
crane mutate myimage --set-platform linux/arm64
This release also supports using podman’s authfile via the REGISTRY_AUTH_FILE environment variable.
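A minimal usage sketch, assuming podman has already logged in and written its authfile (the image reference is a placeholder):
export REGISTRY_AUTH_FILE=${XDG_RUNTIME_DIR}/containers/auth.json
crane mutate registry.example.com/myimage:latest --set-platform linux/arm64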
I found a cool article on Cert Manager with IPI PowerVS
Simplify certificate management on OpenShift across multiple architectures
Chirag Kyal, a Software Engineer at Red Hat, has authored an article about deploying IPI PowerVS and cert-manager on IBM Cloud.
Check out the article for efficient certificate management techniques on Red Hat OpenShift using the cert-manager Operator for OpenShift’s multi-architecture support.
The new information for the end of April is:
The IBM Linux on Power team released more images to the IBM Container Registry (ICR). Here are the new ones:
Image | Version | Pull command | Date
milvus | v2.3.3 | docker pull icr.io/ppc64le-oss/milvus-ppc64le:v2.3.3 | April 2, 2024
rust | 1.66.1 | docker pull icr.io/ppc64le-oss/rust-ppc64le:1.66.1 | April 2, 2024
opensearch | 2.12.0 | docker pull icr.io/ppc64le-oss/opensearch-ppc64le:2.12.0 | April 16, 2024
Original post was at https://community.ibm.com/community/user/powerdeveloper/blogs/paul-bastide/2024/04/26/getting-started-with-a-sock-shop-a-sample-multi-ar?CommunityKey=daf9dca2-95e4-4b2c-8722-03cd2275ab63
I’ve developed the following script to help you get started deploying multi-architecture applications and to elaborate on the techniques for controlling multi-arch compute. This script uses the sock-shop application, which is available at https://github.com/ocp-power-demos/sock-shop-demo . This series of instructions for sock-shop-demo requires kustomize
and following the readme.md in the repository to set up the username and password for mongodb.
You do not need to do every step that follows; feel free to install/use what you’d like. I recommend the kustomize install with multi-no-ns (a sketch follows below), and then playing with the features you find interesting. Note, multi-no-ns requires no namespace.
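A minimal sketch of that recommended install, assuming the env.secret files are already in place per the repository readme.md and you have selected a target project:
❯ kustomize build manifests/overlays/multi-no-ns | oc apply -f -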
The layout of the application is described in this diagram:
This deployment shows the exec format errors and pod scheduling errors that are encountered when Intel-only Pods are scheduled on Power.
For these steps, you are going to clone the ocp-power-demos/sock-shop-demo repository and then experiment to resolve errors so the application is up and running.
I’d recommend running this from a bastion.
git clone https://github.com/ocp-power-demos/sock-shop-demo
Change into the sock-shop-demo folder.
Install kustomize – this tool enables an ordered layout of the resources. You’ll also need oc installed.
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
Ref: https://kubectl.docs.kubernetes.io/installation/kustomize/binaries/
kustomize is used because of the sort order feature in the binary.
Create the manifests/overlays/single/env.secret file with a username and password for mongodb; openssl rand -hex 10 is a good way to generate a random password. You’ll need to copy this env.secret into each overlays/ folder that is used in the demo (a hedged sketch of creating it follows the apply command below). Then build and apply the single overlay:
❯ kustomize build manifests/overlays/single | oc apply -f -
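For the env.secret step above, a minimal sketch; the exact variable names come from the repository’s readme.md, so treat these as placeholders:
cat > manifests/overlays/single/env.secret <<EOF
MONGO_USER=mongouser
MONGO_PASS=$(openssl rand -hex 10)
EOF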
The kustomize build above creates the full application within the OpenShift project.
To see the layout of the application, refer to the third diagram in the repository README (note these manifests use Intel-only images): https://github.com/ocp-power-demos/sock-shop-demo/blob/main/README.md#diagrams
❯ oc get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
carts-585dc6c878-wq6jg 0/1 Error 6 (2m56s ago) 6m21s 10.129.2.24 mac-01a7-worker-0 <none> <none>
carts-db-78f756b87c-r4pl9 1/1 Running 0 6m19s 10.131.0.32 rdr-mac-cust-el-tmwmg-worker-1-6g97b <none> <none>
catalogue-77d7c444bb-wnltt 0/1 CrashLoopBackOff 6 (8s ago) 6m17s 10.130.2.21 mac-01a7-worker-1 <none> <none>
catalogue-db-5bc97c6b98-v9rdp 1/1 Running 0 6m16s 10.131.0.33 rdr-mac-cust-el-tmwmg-worker-1-6g97b <none> <none>
front-end-648fdf6957-bjk9m 0/1 CrashLoopBackOff 5 (2m44s ago) 6m14s 10.129.2.25 mac-01a7-worker-0 <none> <none>
orders-5dbf8994df-whb9r 0/1 CrashLoopBackOff 5 (2m47s ago) 6m13s 10.130.2.22 mac-01a7-worker-1 <none> <none>
orders-db-7544dc7fd9-w9zh7 1/1 Running 0 6m11s 10.128.3.83 rdr-mac-cust-el-tmwmg-worker-2-5hbxg <none> <none>
payment-6cdff467b9-n2dql 0/1 Error 6 (2m53s ago) 6m10s 10.130.2.23 mac-01a7-worker-1 <none> <none>
queue-master-c9dcf8f87-c8drl 0/1 CrashLoopBackOff 5 (2m41s ago) 6m8s 10.129.2.26 mac-01a7-worker-0 <none> <none>
rabbitmq-54689956b9-rt5fb 2/2 Running 0 6m7s 10.131.0.34 rdr-mac-cust-el-tmwmg-worker-1-6g97b <none> <none>
session-db-7d4cc56465-dcx9f 1/1 Running 0 6m5s 10.130.2.24 mac-01a7-worker-1 <none> <none>
shipping-5ff5f44465-tbjv7 0/1 Error 6 (2m51s ago) 6m4s 10.130.2.25 mac-01a7-worker-1 <none> <none>
user-64dd65b5b7-49cbd 0/1 CrashLoopBackOff 5 (2m25s ago) 6m3s 10.129.2.27 mac-01a7-worker-0 <none> <none>
user-db-7f864c9f5f-jchf6 1/1 Running 0 6m1s 10.131.0.35 rdr-mac-cust-el-tmwmg-worker-1-6g97b <none> <none>
You might be lucky enough for the scheduler to assign these to Intel-only nodes.
If they are all Running with no restarts at this point, the application is up.
❯ oc get routes
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
sock-shop sock-shop-test-user-4.apps.rdr-mac-cust-d.rdr-xyz.net front-end 8079 edge/Redirect None
It failed for me.
The purpose is to cordon the Power Nodes and delete the existing pod so you get the Pod running on the architecture you want. This is only recommended on a dev/test system and on the worker nodes.
oc get nodes -l kubernetes.io/arch=ppc64le | grep worker
oc adm cordon node/<worker>
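A hedged sketch for cordoning every Power worker in one pass (node names will differ in your cluster):
for node in $(oc get nodes -l kubernetes.io/arch=ppc64le -o name | grep worker); do oc adm cordon "$node"; done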
❯ oc get pods -l name=front-end
NAME READY STATUS RESTARTS AGE
front-end-648fdf6957-bjk9m 0/1 CrashLoopBackOff 13 (26s ago) 42m
Delete the failing front-end pod so it gets rescheduled:
oc delete pod/front-end-648fdf6957-bjk9m
The app should be running correctly at this point.
Demonstrate how to use node selector to put the workload on the right nodes.
These microservices use Deployments. We can modify the deployment to use NodeSelectors.
Edit manifests/overlays/single/09-front-end-dep.yaml or run oc edit deployment/front-end.
Find the nodeSelector field and add an architecture limitation using Node labels:
nodeSelector:
  node.openshift.io/os_id: rhcos
  kubernetes.io/arch: amd64
Then apply the change:
oc apply -f manifests/overlays/single/09-front-end-dep.yaml
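For context, a hedged sketch of where that block sits inside the Deployment manifest (surrounding fields elided):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: front-end
spec:
  template:
    spec:
      nodeSelector:
        node.openshift.io/os_id: rhcos
        kubernetes.io/arch: amd64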
❯ oc get pods -l name=front-end
NAME READY STATUS RESTARTS AGE
front-end-648fdf6957-bjk9m 0/1 CrashLoopBackOff 14 (2m49s ago) 50m
front-end-7bd476764-t974g 0/1 ContainerCreating 0 40s
Delete the old front-end pod that is stuck on the Power node:
oc delete pod/front-end-648fdf6957-bjk9m
Note, you can run the following to redeploy using the overlay that already includes nodeSelectors.
❯ kustomize build manifests/overlays/single-node-selector | oc delete -f -
❯ kustomize build manifests/overlays/single-node-selector | oc apply -f -
Are the pods running on the Intel node?
With the nodeSelector now started, you can uncordon the Power nodes. This is only recommended on a dev/test system and on the worker nodes.
oc get nodes -l kubernetes.io/arch=ppc64le | grep worker
oc adm uncordon node/<worker>
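A hedged sketch for uncordoning all of the Power workers at once, mirroring the cordon loop above:
for node in $(oc get nodes -l kubernetes.io/arch=ppc64le -o name | grep worker); do oc adm uncordon "$node"; done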
❯ oc get pods -l name=front-end
NAME READY STATUS RESTARTS AGE
front-end-6944957cd6-qmhhg 1/1 Running 0 19s
The application should be running. If not, please use:
❯ kustomize build manifests/overlays/single-node-selector | oc delete -f -
❯ kustomize build manifests/overlays/single-node-selector | oc apply -f -
The workload should now all be on the Intel side.
With many of these applications, there are architecture-specific alternatives. You can run without NodeSelectors to get the workload scheduled wherever there is support.
To switch to using node selectors across Power/Intel:
oc project sock-shop
❯ kustomize build manifests/overlays/multi-no-ns | oc apply -f -
❯ oc get pods -owide
We’re going to move one of the application’s dependencies, rabbitmq. The IBM team has published a ppc64le port of the RabbitMQ exporter to ICR (link).
In the rabbitmq Deployment manifest, replace image: kbudde/rabbitmq-exporter on line 32 with icr.io/ppc64le-oss/rabbitmq-exporter-ppc64le:1.0.0-RC19, and remove the kubernetes.io/arch: amd64 limitation on line 39. Then reapply:
kustomize build manifests/overlays/multi-no-ns | oc apply -f -
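A hedged sketch of the resulting exporter container entry in the rabbitmq Deployment; the container name and surrounding fields are assumptions, only the image reference comes from above:
      containers:
      - name: rabbitmq-exporter
        image: icr.io/ppc64le-oss/rabbitmq-exporter-ppc64le:1.0.0-RC19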
❯ oc get pod -l name=rabbitmq -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rabbitmq-65c75db8db-9jqbd 2/2 Running 0 96s 10.130.2.31 mac-01a7-worker-1 <none> <none>
The pod should now start on the Power node.
You’ve taken advantage of these containers, and you can take advantage of other open source container images for Power: https://community.ibm.com/community/user/powerdeveloper/blogs/priya-seth/2023/04/05/open-source-containers-for-power-in-icr
Taints and Tolerations provide a way to keep workloads off specific nodes unless their Pods explicitly tolerate the taint.
oc get nodes -l kubernetes.io/arch=ppc64le | grep worker
oc adm taint nodes node1 kubernetes.io/arch=ppc64le:NoSchedule
Also note, the taints are flipped (Intel is tainted with the Power taint).
manifests/overlays/multi-taint-front-end/09-front-end-dep.yaml
oc apply -f manifests/overlays/multi-taint-front-end/09-front-end-dep.yaml
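For reference, a hedged sketch of the tolerations block such an overlay would add to the front-end pod spec so it can land on the tainted nodes (the actual overlay file may differ):
      tolerations:
      - key: kubernetes.io/arch
        operator: Equal
        value: ppc64le
        effect: NoSchedule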
❯ oc get pods -o wide -l name=front-end
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
front-end-69c64bf86f-98nkc 0/1 Running 0 9s 10.128.3.99 rdr-mac-cust-el-tmwmg-worker-2-5hbxg <none> <none>
front-end-7f4f4844c8-x79zn 1/1 Running 0 103s 10.130.2.33 mac-01a7-worker-1 <none> <none>
You might have to wait a few minutes before the workload shifts.
❯ oc get pods -o wide -l name=front-end
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
front-end-69c64bf86f-98nkc 1/1 Running 0 35s 10.128.3.99 rdr-mac-cust-el-tmwmg-worker-2-5hbxg <none> <none>
Ref: OpenShift 4.14: Understanding taints and tolerations
These are different techniques to help schedule/control workload placement and help you explore Multi-Arch Compute.
My colleague, Punith, and I have also posted two documents on further controlling workload placement:
Here are my notes for setting up the SIG’s NFS provisioner. You should follow the directions in kubernetes-sigs/nfs-subdir-external-provisioner to set up the nfs-provisioner.
a. Create the namespace
oc new-project nfs-provisioner
b. Label the namespace with elevated privileges so we can create NFS mounts
# oc label namespace/nfs-provisioner security.openshift.io/scc.podSecurityLabelSync=false --overwrite=true
namespace/nfs-provisioner labeled
# oc label namespace/nfs-provisioner pod-security.kubernetes.io/enforce=privileged --overwrite=true
namespace/nfs-provisioner labeled
# oc label namespace/nfs-provisioner pod-security.kubernetes.io/enforce-version=v1.24 --overwrite=true
namespace/nfs-provisioner labeled
# oc label namespace/nfs-provisioner pod-security.kubernetes.io/audit=privileged --overwrite=true
namespace/nfs-provisioner labeled
# oc label namespace/nfs-provisioner pod-security.kubernetes.io/warn=privileged --overwrite=true
namespace/nfs-provisioner labeled
c. Download the storage class template
# curl -O -L https://github.com/IBM/ocp4-power-workload-tools/manifests/storage/storage-class-nfs-template.yaml
d. Grant the hostmount-anyuid SCC to the provisioner’s service account
oc adm policy add-scc-to-user hostmount-anyuid system:serviceaccount:nfs-provisioner:nfs-client-provisioner
e. Process and apply the template
# oc process -f storage-class-nfs-template.yaml -p NFS_PATH=/data -p NFS_SERVER=10.17.2.138 | oc apply -f -
deployment.apps/nfs-client-provisioner created
serviceaccount/nfs-client-provisioner created
clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created
role.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
rolebinding.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
storageclass.storage.k8s.io/nfs-client created
oc get pods
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner-b8764c6bb-mjnq9 1/1 Running 0 36s
❯ oc get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client k8s-sigs.io/nfs-subdir-external-provisioner Delete Immediate false 3m27s
If you see more than the nfs-client storage class listed, you may have to change the default:
oc patch storageclass storageclass-name -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
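Conversely, if you want nfs-client to be the default storage class, a hedged sketch:
oc patch storageclass nfs-client -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'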
Here are some updates for April 2024.
FYI: I was made aware of kubernetes-sigs/kube-scheduler-simulator and the release simulator/v0.2.0. From the project: “That’s why we are developing a simulator for kube-scheduler — you can try out the behavior of the scheduler while checking which plugin made what decision for which Node.”
https://github.com/kubernetes-sigs/kube-scheduler-simulator/tree/simulator/v0.2.0
The Linux on Power team added more Power-supported containers.
https://community.ibm.com/community/user/powerdeveloper/blogs/priya-seth/2023/04/05/open-source-containers-for-power-in-icr
Image | Version | Pull command | Date
cassandra | 4.1.3 | docker pull icr.io/ppc64le-oss/cassandra-ppc64le:4.1.3 | April 2, 2024
milvus | v2.3.3 | docker pull icr.io/ppc64le-oss/milvus-ppc64le:v2.3.3 | April 2, 2024
rust | 1.66.1 | docker pull icr.io/ppc64le-oss/rust-ppc64le:1.66.1 | April 2, 2024
mongodb | 5.0.26 | docker pull icr.io/ppc64le-oss/mongodb-ppc64le:5.0.26 | April 9, 2024
mongodb | 6.0.13 | docker pull icr.io/ppc64le-oss/mongodb-ppc64le:6.0.13 | April 9, 2024
logstash | 8.11.3 | docker pull icr.io/ppc64le-oss/logstash-ppc64le:8.11.3 | April 9, 2024
Added a new fix for setting the imagestream import schedule:
https://gist.github.com/prb112/838d8c2ae908b496f5d5480411a7d692
An article worth rekindling in our memories…
Optimal LPAR placement for a Red Hat OpenShift cluster within IBM PowerVM
Optimal logical partition (LPAR) placement can be important to improve the performance of workloads as this can favor efficient use of the memory and CPU resources on the system. However, for certain configuration and settings such as I/O devices allocation to the partition, amount of memory allocation, CPU entitlement to the partition, and so on we might not get a desired LPAR placement. In such situations, the technique described in this blog can enable you to place the LPAR in a desired optimal configuration.
https://community.ibm.com/community/user/powerdeveloper/blogs/mel-bakhshi/2022/08/11/openshift-lpar-placement-powervm
There is an updated list of Red Hat products supporting IBM Power.
https://community.ibm.com/community/user/powerdeveloper/blogs/ashwini-sule/2024/04/05/red-hat-products-mar-2024
Enhancing container security with Aqua Trivy on IBM Power
… IBM Power development team found that Trivy is as effective as other open source scanners in detecting vulnerabilities. Not only does Trivy prove to be suitable for container security in IBM Power clients’ DevSecOps pipelines, but the scanning process is simple. IBM Power’s support for Aqua Trivy underscores its industry recognition for its efficacy as an open source scanner.
https://community.ibm.com/community/user/powerdeveloper/blogs/jenna-murillo/2024/04/08/enhanced-container-security-with-trivy-on-power
Podman 5.0 is released
https://blog.podman.io/2024/03/podman-5-0-has-been-released/
I presented on:
The Red Hat OpenShift Container Platform runs on IBM Power systems, offering a secure and reliable foundation for modernizing applications and running containerized workloads.
Multi-Arch Compute for OpenShift Container Platform lets you use a pair of compute architectures, such as ppc64le and amd64, within a single cluster. This exciting feature opens new possibilities for versatility and optimization for composite solutions that span multiple architectures.
Join Paul Bastide, IBM Senior Software Engineer, as he introduces the background behind Multi-Arch Compute and then gets you started setting up, configuring, and scheduling workloads. After, Paul will take you through a brief demonstration showing common problems and solutions for running multiple architectures in the same cluster.
Go here to see the download https://ibm.webcasts.com/starthere.jsp?ei=1660167&tp_key=ddb6b00dbd&_gl=11snjgp3_gaMjk3MzQzNDU1LjE3MTI4NTQ3NzA._ga_FYECCCS21D*MTcxMjg1NDc2OS4xLjAuMTcxMjg1NDc2OS4wLjAuMA..&_ga=2.141469425.2128302208.1712854770-297343455.1712854770
Shows a Microservices Application running on Red Hat OpenShift Control Plane on IBM Power Systems with an Intel Worker.
Here are some great updates for the first half of April 2024.
Sizing and configuring an LPAR for AI workloads
Sebastian Lehrig has a great introduction into CPU/AI/NUMA on Power10.
https://community.ibm.com/community/user/powerdeveloper/blogs/sebastian-lehrig/2024/03/26/sizing-for-ai
FYI: a new article is published – Improving the User Experience for Multi-Architecture Compute on IBM Power
More and more IBM® Power® clients are modernizing securely with lower risk and faster time to value with cloud-native microservices on Red Hat® OpenShift® running alongside their existing banking and industry applications on AIX, IBM i, and Linux. With the availability of Red Hat OpenShift 4.15 on March 19th, Red Hat and IBM introduced a long-awaited innovation called Multi-Architecture Compute that enables clients to mix Power and x86 worker nodes in a single Red Hat OpenShift cluster. With the release of Red Hat OpenShift 4.15, clients can now run the control plane for a Multi-Architecture Compute cluster natively on Power.
Some tips for setting up a Multi-Arch Compute Cluster
If you are setting up a multi-arch compute cluster manually, rather than using automation, you’ll want to follow this process:
Ensure ICMP/TCP/UDP traffic is flowing in both directions between the networks.
a. Change any MTU between the networks
oc patch Network.operator.openshift.io cluster --type=merge --patch \
'{"spec": { "migration": { "mtu": { "network": { "from": 1400, "to": 1350 } , "machine": { "to" : 9100} } } } }'
b. Limit CSI drivers to a single Arch
oc annotate --kubeconfig /root/.kube/config ns openshift-cluster-csi-drivers \
scheduler.alpha.kubernetes.io/node-selector=kubernetes.io/arch=amd64
c. Disable offloading (I do this in the ignition)
d. Move the imagepruner jobs to the architecture that makes the most sense
oc patch imagepruner/cluster -p '{ "spec" : {"nodeSelector": {"kubernetes.io/arch" : "amd64"}}}' --type merge
e. Move the ingress operator pods to the arch that makes the most sense. If you want the ingress pods to be on Intel then patch the cluster.
oc edit IngressController default -n openshift-ingress-operator
Change ingresscontroller.spec.nodePlacement.nodeSelector to use the kubernetes.io/arch: amd64 label to move the workload to Intel only.
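A hedged sketch of the resulting stanza; in the IngressController API, nodePlacement.nodeSelector is a label selector, so it uses matchLabels:
spec:
  nodePlacement:
    nodeSelector:
      matchLabels:
        kubernetes.io/arch: amd64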
f. Use routing via host
oc patch network.operator/cluster --type merge -p \
'{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true}}}}}'
Wait until the MCP is finished updating and has the latest MTU
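For reference, the OpenShift MTU change procedure is normally finalized, once the MachineConfigPools have settled, by clearing the migration field and setting the new MTU; a hedged sketch using the same example value as above:
oc patch Network.operator.openshift.io cluster --type=merge --patch \
  '{"spec": { "migration": null, "defaultNetwork": { "ovnKubernetesConfig": { "mtu": 1350 } } } }'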
g. Download the ignition file and host it on the local network via HTTP.
{
"ignition": {
"version": "3.4.0",
"config": {
"merge": [
{
"source": "http://${ignition_ip}:8080/ignition/worker.ign"
}
]
}
},
"storage": {
"files": [
{
"group": {},
"path": "/etc/hostname",
"user": {},
"contents": {
"source": "data:text/plain;base64,${name}",
"verification": {}
},
"mode": 420
},
{
"group": {},
"path": "/etc/NetworkManager/dispatcher.d/20-ethtool",
"user": {},
"contents": {
"source": "data:text/plain;base64,aWYgWyAiJDEiID0gImVudjIiIF0gJiYgWyAiJDIiID0gInVwIiBdCnRoZW4KICBlY2hvICJUdXJuaW5nIG9mZiB0eC1jaGVja3N1bW1pbmciCiAgL3NiaW4vZXRodG9vbCAtLW9mZmxvYWQgZW52MiB0eC1jaGVja3N1bW1pbmcgb2ZmCmVsc2UgCiAgZWNobyAibm90IHJ1bm5pbmcgdHgtY2hlY2tzdW1taW5nIG9mZiIKZmkKaWYgc3lzdGVtY3RsIGlzLWZhaWxlZCBOZXR3b3JrTWFuYWdlci13YWl0LW9ubGluZQp0aGVuCnN5c3RlbWN0bCByZXN0YXJ0IE5ldHdvcmtNYW5hZ2VyLXdhaXQtb25saW5lCmZpCg==",
"verification": {}
},
"mode": 420
}
]
}
}
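For reference, the base64 payload in the 20-ethtool dispatcher file above decodes to the following script (the env2 interface name is specific to this environment):
if [ "$1" = "env2" ] && [ "$2" = "up" ]
then
  echo "Turning off tx-checksumming"
  /sbin/ethtool --offload env2 tx-checksumming off
else
  echo "not running tx-checksumming off"
fi
if systemctl is-failed NetworkManager-wait-online
then
systemctl restart NetworkManager-wait-online
fi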
${name} is base64 encoded.
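A hedged one-liner for producing that value (the hostname is a placeholder):
name=$(echo -n "worker-0.example.com" | base64 -w0)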
a. Configure shared storage using the nfs provisioner and limit it to running on the architecture that is hosting the NFS shared volumes.
b. Approve the CSRs for the workers. Do this carefully, as it’s easy to lose count since the list may also include CSRs from Machine updates.
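A hedged sketch of approving all currently pending CSRs in one pass, per the standard OpenShift approach (review the list first on anything other than a dev/test cluster):
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve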
The IBM team posted an update, Bringing OpenShift Container Platform Multiple-Architecture Compute to IBM Power with sme.up, about one of our OpenShift on Power customers who used Multi-Architecture Compute.