Multi-Arch Compute Node Selector

Originally posted at https://community.ibm.com/community/user/powerdeveloper/blogs/paul-bastide/2024/01/09/multi-arch-compute-node-selector?CommunityKey=daf9dca2-95e4-4b2c-8722-03cd2275ab63

The OpenShift Container Platform Multi-Arch Compute feature supports a pair of processor (ISA) architectures, ppc64le and amd64, in a single cluster. With two architectures, there are various permutations when scheduling Pods. Fortunately, the platform has controls over where work is scheduled in the cluster. One of these controls is the node selector. This article outlines how to use node selectors at three levels: Pod, Project/Namespace, and Cluster.

Pod Level

Per OpenShift 4.14: Placing pods on specific nodes using node selectors, a node selector is a map of key/value pairs that determines where work is scheduled. Every key/value pair in the Pod's nodeSelector must match a label on a Node for that Node to be eligible for scheduling. If you need more advanced boolean logic, you may use affinity and anti-affinity rules. See Kubernetes: Affinity and anti-affinity.
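As an illustration of logic that a plain nodeSelector cannot express, the following is a minimal sketch of a node affinity rule that accepts either architecture. The Pod name test-affinity is hypothetical, and the sketch assumes the image is published for both ppc64le and amd64.

```yaml
# Sketch only: allow scheduling on ppc64le OR amd64 nodes.
# A nodeSelector cannot express "either value"; node affinity can.
apiVersion: v1
kind: Pod
metadata:
  name: test-affinity   # hypothetical name, not from the original article
spec:
  affinity:
    nodeAffinity:
      # Hard requirement at scheduling time; running Pods are not evicted
      # if node labels change later.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - ppc64le
                  - amd64
  containers:
    - name: test
      # Assumption: this image has a manifest list covering both architectures.
      image: "ocp-power.xyz/test:v0.0.1"
```

For a single fixed architecture, the simpler nodeSelector form is usually enough.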

Consider the Pod definition for test: its nodeSelector has x: y, so the Pod is matched with a Node whose .metadata.labels includes x: y.

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      image: "ocp-power.xyz/test:v0.0.1"
  nodeSelector:
    x: y

You can use node selectors on pods and labels on nodes to control where the pod is scheduled. With node selectors, OpenShift Container Platform schedules the pods on nodes that contain matching labels. To direct a Pod to a Power node, you could use the kubernetes.io/arch: ppc64le label.

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test
      image: "ocp-power.xyz/test:v0.0.1"
  nodeSelector:
    kubernetes.io/arch: ppc64le

You can see where the Pod is scheduled using oc get pods -owide.

❯ oc get pods -owide
NAME   READY   STATUS    RESTARTS   AGE   IP           NODE                NOMINATED NODE   READINESS GATES
test   1/1     Running   0          24d   10.130.2.9   mac-acca-worker-1   <none>           <none>

You can confirm a node's architecture with oc get nodes mac-acca-worker-1 -owide. The KERNEL-VERSION column ends with ppc64le.

❯ oc get nodes mac-acca-worker-1 -owide
NAME                STATUS   ROLES    AGE   VERSION           INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                  CONTAINER-RUNTIME
mac-acca-worker-1   Ready    worker   25d   v1.28.3+20a5764   192.168.200.11   <none>        Red Hat Enterprise Linux CoreOS 414.92.....   5.14.0-284.41.1.el9_2.ppc64le   cri-o://1.28.2-2.rhaos4.14.gite7be4e1.el9

The same approach applies to higher-level Kubernetes abstractions such as ReplicaSets, Deployments, or DaemonSets, where the nodeSelector is set in the Pod template.
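For example, a Deployment carries the nodeSelector inside its Pod template. The following is a minimal sketch; the Deployment name, the app: test label, and the replica count are hypothetical values chosen for illustration.

```yaml
# Sketch only: a Deployment whose Pods are all directed to ppc64le nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment   # hypothetical name
spec:
  replicas: 2             # hypothetical replica count
  selector:
    matchLabels:
      app: test           # hypothetical label tying the Deployment to its Pods
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - name: test
          image: "ocp-power.xyz/test:v0.0.1"
      # The nodeSelector lives in the Pod template, not on the Deployment itself.
      nodeSelector:
        kubernetes.io/arch: ppc64le
```

Every Pod created from this template inherits the same nodeSelector.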

Project / Namespace Level

Per OpenShift 4.14: Creating project-wide node selectors, you may not have control over how Pods are created in a Project or Namespace. Without control over the Pod definition, you cannot set a nodeSelector on the Pod itself. Kubernetes and OpenShift each provide a way to control Pod placement at the namespace level in that situation.

Kubernetes enables this feature through the Namespace annotation scheduler.alpha.kubernetes.io/node-selector. You can read more about its internal behavior in the Kubernetes documentation for the PodNodeSelector admission controller.

You can annotate the namespace:

oc annotate ns example scheduler.alpha.kubernetes.io/node-selector=kubernetes.io/arch=ppc64le

OpenShift enables this feature through the openshift.io/node-selector Namespace annotation:

oc annotate ns example openshift.io/node-selector=kubernetes.io/arch=ppc64le

Either annotation directs Pods created in the namespace to nodes of the selected architecture.
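If you manage namespaces declaratively, the same setting can be written into the Namespace manifest rather than applied with oc annotate. This is a minimal sketch for the namespace example used above.

```yaml
# Sketch only: the OpenShift project node selector expressed as a manifest.
apiVersion: v1
kind: Namespace
metadata:
  name: example
  annotations:
    # Same key and value as the oc annotate command; quoted so the value
    # is read as a single string.
    openshift.io/node-selector: "kubernetes.io/arch=ppc64le"
```

Pods created in the namespace after the annotation is in place pick up the selector.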

Cluster Level

Per OpenShift 4.14: Creating default cluster-wide node selectors, you can set a cluster-wide default node selector for cases where you do not control Pod creation or where the cluster needs a default placement. The cluster administrator then controls Pod placement through this default setting.

To configure the cluster-wide default, patch the Scheduler Operator custom resource (CR).

oc patch Scheduler cluster --type=merge --patch '{"spec": { "defaultNodeSelector": "kubernetes.io/arch=ppc64le" } }'
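After the patch, the relevant portion of the Scheduler resource should look roughly like the sketch below. It shows only the field set by the patch; the live object on a cluster contains other fields that are omitted here.

```yaml
# Sketch only: the part of the Scheduler CR affected by the patch above.
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  # Default selector applied cluster-wide; other spec fields are omitted.
  defaultNodeSelector: kubernetes.io/arch=ppc64le
```

You can inspect the live resource with oc get scheduler cluster -o yaml.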

To schedule work onto the other architecture in the pair, you MUST define a node selector that overrides this default. In OpenShift, a project-level node selector takes precedence over the cluster-wide default.

Summary

You have seen how to control the distribution of work and how to schedule work with multiple architectures.

In a future blog, I’ll cover the Multiarch Manager Operator, which aims to address problems and usability issues encountered when working with OpenShift clusters with multi-architecture compute nodes.

