Tag: openshift

Setting up nfs-provisioner on OpenShift on Power Systems

Here are my notes for setting up the SIG’s nfs-provisioner, kubernetes-sigs/nfs-subdir-external-provisioner. Follow these directions to set it up.

Clone the nfs-subdir-external-provisioner

git clone https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner.git

If you haven’t already, you may need to create the nfs-provisioner namespace.

a. Create the ns.yaml

apiVersion: v1
kind: Namespace
metadata:
  labels:
    kubernetes.io/metadata.name: nfs-provisioner
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/enforce-version: v1.24
  name: nfs-provisioner

b. Create the namespace

oc apply -f ns.yaml

c. Label the namespace

oc label namespace/nfs-provisioner security.openshift.io/scc.podSecurityLabelSync=false --overwrite=true
oc label namespace/nfs-provisioner pod-security.kubernetes.io/enforce=privileged --overwrite=true
oc label namespace/nfs-provisioner pod-security.kubernetes.io/audit=privileged --overwrite=true
oc label namespace/nfs-provisioner pod-security.kubernetes.io/warn=privileged --overwrite=true

Change to the deploy/ directory

cd nfs-subdir-external-provisioner/deploy

In deployment.yaml, update the namespace from default to nfs-provisioner.

On the Bastion server, look at ocp4-helpernode/helpernode_vars.yaml for the helper.ipaddr value.

helper:
  networkifacename: env3
  name: "bastion-0"
  ipaddr: "193.168.200.15"

Update the deployment with the NFS_SERVER using the helper.ipaddr and the NFS_PATH /export. It should look like the following:

    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: 193.168.200.15
            - name: NFS_PATH
              value: /export
      volumes:
        - name: nfs-client-root
          nfs:
            server: 193.168.200.15
            path: /export

v4.0.2 supports ppc64le.

Be sure to remove the namespace: default

Create the deployment

oc apply -f deployment.yaml
deployment.apps/nfs-client-provisioner created

Get the pods

oc get pods
NAME                                     READY   STATUS    RESTARTS   AGE
nfs-client-provisioner-b8764c6bb-mjnq9   1/1     Running   0          36s

Setup Authorization

NAMESPACE=$(oc project -q)
sed -i'' "s/namespace:.*/namespace: $NAMESPACE/g" ./rbac.yaml
oc create -f rbac.yaml
oc adm policy add-scc-to-user hostmount-anyuid system:serviceaccount:$NAMESPACE:nfs-client-provisioner

Create the storage class file sc.yml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner # or choose another name; must match the deployment's env PROVISIONER_NAME
parameters:
  pathPattern: "${.PVC.namespace}/${.PVC.annotations.nfs.io/storage-path}" # waits for nfs.io/storage-path annotation, if not specified will accept as empty string.
  onDelete: delete

Apply the StorageClass

oc apply -f sc.yml

Then you can deploy the PV and PVC, for example files/6_EvictPodsWithPVC_dp.yml.
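
A claim against this StorageClass might look like the following sketch. The names test-claim and test-path, the file name pvc.yaml, and the 1Mi size are placeholders rather than values from my setup. The nfs.io/storage-path annotation is only relevant because the pathPattern in the StorageClass references it.

```shell
# Write a sample PVC that requests storage from the nfs-client StorageClass.
# test-claim, test-path, and pvc.yaml are placeholder names.
cat << 'EOF' > pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
  annotations:
    nfs.io/storage-path: "test-path" # optional; consumed by the pathPattern in the StorageClass
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
# Against a cluster: oc apply -f pvc.yaml && oc get pvc test-claim
```

If the provisioner is healthy, the claim should move to Bound and a matching subdirectory should appear under the NFS export.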

References

git repo

2022-10-18

openshift-install-power – quick notes

FYI: openshift-install-power – this is a small recipe for deploying the latest code with the UPI from the master branch of my repo

git clone https://github.com/ocp-power-automation/openshift-install-power.git
cd openshift-install-power
chmod +x openshift-install-powervs
export IBMCLOUD_API_KEY="<<redacted>>"
export RELEASE_VER=latest
export ARTIFACTS_VERSION="master"
export ARTIFACTS_REPO="<<MY REPO>>"
./openshift-install-powervs setup
./openshift-install-powervs create -var-file mon01-20220930.tfvars -flavor small -trace

This also recovers from errors in ocp4-upi-powervs/terraform.

2022-10-17

Topology Manager and OpenShift/Kubernetes
I recently had to work with the Kubernetes Topology Manager and OpenShift. Here is a braindump on Topology Manager:

If the Topology Manager Feature Gate is enabled, then any active HintProviders are registered to the TopologyManager.

If the CPU Manager and its feature gate are enabled, then the CPU Manager can help workloads that are sensitive to CPU throttling, context switches, or cache misses, that require hyperthreads on the same physical CPU core or low latency, or that benefit from shared processor resources. The manager has two policies, none and static, which respectively register a NOP provider or statically lock the container to a set of CPUs.

If the Memory Manager and its feature gate are enabled, then the MemoryManager can operate independently of the CPU Manager, e.g. to allocate HugePages or guaranteed memory.

If Device Plugins are enabled, then they can be used to allocate devices next to NUMA node resources (e.g., SR-IOV NICs). This may be used independently of the typical CPU/Memory management, for GPUs and other machine devices.

Generally, these are all used together to generate a BitMask that admits a pod using a best-effort, restricted, or single-numa-node policy.

An important limitation is the Maximum Number of NUMA nodes is hard-coded to 8. When there are more than eight NUMA nodes, it’ll error out when assigning to the topology. The reason for this is related to state explosion and computational complexity.
1. Check the worker node’s CPU. If NUMA node(s) returns 1, it’s a single NUMA node. If it returns 2 or more, it’s multiple NUMA nodes.
```
sh-4.4# lscpu | grep 'NUMA node(s)'
NUMA node(s):        1
```
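The check above and the eight-node limit can be combined into a small helper. This is only a sketch: the function names numa_node_count and check_numa_limit are my own, and the parsing assumes lscpu prints a line starting with "NUMA node(s):" as in the output shown.

```shell
# Print the NUMA node count parsed from lscpu-style text on stdin.
numa_node_count() {
  grep -F 'NUMA node(s):' | cut -d: -f2 | tr -d '[:space:]'
}

# Compare a count against the Topology Manager limit of 8 NUMA nodes.
check_numa_limit() {
  if [ "$1" -gt 8 ]; then
    echo "unsupported: $1 NUMA nodes (limit is 8)"
  else
    echo "ok: $1 NUMA node(s)"
  fi
}

# On a worker node: check_numa_limit "$(lscpu | numa_node_count)"
# Here the sample line from the output above is fed in instead.
check_numa_limit "$(printf 'NUMA node(s):        1\n' | numa_node_count)"
```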
The kubernetes/enhancements repo contains great detail on the flows and weaknesses of the TopologyManager.

To enable the Topology Manager, one uses Feature Gates. OpenShift prefers the FeatureSet LatencySensitive.
1. Via FeatureGate
```
$ oc patch featuregate cluster -p '{"spec": {"featureSet": "LatencySensitive"}}' --type merge
```
This turns on the basic TopologyManager in /etc/kubernetes/kubelet.conf
```
  "featureGates": {
    "APIPriorityAndFairness": true,
    "CSIMigrationAzureFile": false,
    "CSIMigrationvSphere": false,
    "DownwardAPIHugePages": true,
    "RotateKubeletServerCertificate": true,
    "TopologyManager": true
  },
```
1. Create a custom KubeletConfig. This allows targeted TopologyManager feature enablement.
file: cpumanager-kubeletconfig.yaml
```
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
     cpuManagerPolicy: static 
     cpuManagerReconcilePeriod: 5s 
```
```
$ oc create -f cpumanager-kubeletconfig.yaml
```
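The KubeletConfig above only sets the CPU Manager policy. As a sketch, assuming you also want the kubelet to enforce a Topology Manager policy, the same resource can carry topologyManagerPolicy. The file name topologymanager-kubeletconfig.yaml and the choice of single-numa-node are illustrative assumptions, not something I tested in these notes. The MachineConfigPool also needs the custom-kubelet label for the selector to match.

```shell
# Write a KubeletConfig variant that adds a Topology Manager policy.
# The file name and the single-numa-node policy are illustrative choices.
cat << 'EOF' > topologymanager-kubeletconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
    topologyManagerPolicy: single-numa-node
EOF
# Against a cluster (the worker pool is an assumption):
#   oc label machineconfigpool worker custom-kubelet=cpumanager-enabled
#   oc apply -f topologymanager-kubeletconfig.yaml
```

Applying a KubeletConfig rolls the nodes in the selected pool, so plan for a restart of those workers.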
Net: They can be used independently of each other, but they should be turned on at the same time to maximize the benefits.

There are some examples and test cases out there for Kubernetes and OpenShift
1. Red Hat Sys Engineering Team Test cases for Performance Addon Operator, which is now the Cluster Node Tuning Operator – These are the clearest tests, which apply directly to the Topology Manager.
2. Kube Test Cases
  
  Topology Manager
  
  CPU Manager
  
  Device Plugin: if you already do SRIOV testing, this should be implicitly covered.
  
  Memory manager
  
  Test Cases Matrix from Kubernetes PR #83481
This is one of the best examples: k8stopologyawareschedwg/sample-device-plugin.

Tools to know about
1. GitHub: numalign (amd64) – you can download this from the releases. In the fork prb112/numalign, I added ppc64le to the build.
2. numactl and numastat are superbly helpful to see the topology spread on a node (link to a handy PDF on NUMA). I’ve been starting up a Fedora container with numactl and numastat installed.
Final note: I had written down that Fedora is a great combination with taskset and numactl if you copy in the binaries. I think I used Fedora 35/36 as a container.

I built a Hugepages-hungry container (Hugepages). I also looked at hugepages_tests.go and the test plan.

When it came down to it, I used my hungry container with the example.

I hope this helps others as they start to work with Topology Manager.

References

Red Hat
1. Red Hat Topology Aware Scheduling in Kubernetes Part 1: The High Level Business Case
2. Red Hat Topology Awareness in Kubernetes Part 2: Don’t we already have a Topology Manager?
OpenShift
Kubernetes
Kubernetes Enhancement
1. KEP-693: Node Topology Manager e2e tests: Link
2. KEP-2625: CPU Manager e2e tests: Link
3. KEP-1769: Memory Manager Source: Link PR: Link
2022-10-14
Switching to use Kubernetes with Flannel on RHEL on P10
I needed to switch from Calico to Flannel. Here is the recipe I followed to set up Kubernetes 1.25.2 on a Power 10 using Flannel.
1. Connect to both VMs (in split terminal)
```
ssh root@control-1
ssh root@worker-1
```
1. Run Reset (acknowledge that you want to proceed)
```
kubeadm reset
```
1. Remove Calico
```
rm /etc/cni/net.d/10-calico.conflist 
rm /etc/cni/net.d/calico-kubeconfig
# iptables -F/-X ignore stdin, so filter the saved rules and reload them instead
iptables-save | grep -iv cali | iptables-restore
```
1. Initialize the cluster
```
kubeadm init --cri-socket=unix:///var/run/crio/crio.sock --pod-network-cidr=192.168.0.0/16
```
1. Setup kubeconfig
```
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
1. Add the plugins:
```
curl -O https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-ppc64le-v1.1.1.tgz -L
cp cni-plugins-linux-ppc64le-v1.1.1.tgz /opt/cni/bin
cd /opt/cni/bin
tar xvfz cni-plugins-linux-ppc64le-v1.1.1.tgz 
chmod +x /opt/cni/bin/*
cd ~
systemctl restart crio kubelet
```
1. Download https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
2. Edit the container images to point to the ppc64le manifests, per the notes in the yaml
3. Update net-conf.json so the Network matches the --pod-network-cidr passed to kubeadm init
```
  net-conf.json: |
    {
      "Network": "192.168.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
```
1. Join the Cluster
kubeadm join 1.1.1.1:6443 --token y004bg.sc65cp7fqqm7ladg \
  --discovery-token-ca-cert-hash sha256:1c32dacdf9b934b7bbd6d13fde9312a35709e2f5849008acec8f597eb5a5dad9
1. Add role to the workers
```
kubectl label node worker-01.ocp-power.xyz node-role.kubernetes.io/worker=worker
```
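For the net-conf.json step above, the edit can be scripted rather than done by hand. This is a rough sketch: set_flannel_network is a hypothetical helper of mine, and it assumes the manifest contains a single "Network": "..." entry, as the upstream kube-flannel.yml does.

```shell
# Replace the Network CIDR in a downloaded kube-flannel.yml.
# Usage: set_flannel_network <manifest> <cidr>
set_flannel_network() {
  manifest="$1"
  cidr="$2"
  tmp="$(mktemp)"
  # Rewrite only the quoted value that follows "Network":
  sed -E "s#(\"Network\":[[:space:]]*)\"[^\"]*\"#\\1\"${cidr}\"#" "$manifest" > "$tmp" &&
    cat "$tmp" > "$manifest"
  rm -f "$tmp"
}

# Example: set_flannel_network kube-flannel.yml 192.168.0.0/16
# Then, against the cluster: kubectl apply -f kube-flannel.yml
```

The CIDR passed here should be the same value given to kubeadm init via --pod-network-cidr.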
Ref: https://gist.github.com/rkaramandi/44c7cea91501e735ea99e356e9ae7883
Ref: https://www.buzzwrd.me/index.php/2022/02/16/calico-to-flannel-changing-kubernetes-cni-plugin/
2022-10-07

Operator Doesn’t Install Successfully: How to restart it

You see that there is an issue unpacking your operator in the Operator Hub.

Restart the Job that does the download by deleting the Job and recreating the Subscription.

1. Find the Job (per RH 6459071)

$ oc get job -n openshift-marketplace -o json | jq -r '.items[] | select(.spec.template.spec.containers[].env[].value|contains ("myop")) | .metadata.name'

2. Reset the download Job

for i in $(oc get job -n openshift-marketplace -o json | jq -r '.items[] | select(.spec.template.spec.containers[].env[].value|contains ("myop")) | .metadata.name'); do
  oc delete job $i -n openshift-marketplace; 
  oc delete configmap $i -n openshift-marketplace; 
done

3. Recreate your Subscription and you’ll see more details on the Job’s failure. Keep an eagle eye on the updates as it rolls over quickly.

Message: rpc error: code = Unknown desc = pinging container registry registry.stage.redhat.io: Get "https://xyz/v2/": x509: certificate signed by unknown authority.

You’ve seen how to restart the download/pull through job.

2022-08-26

IBM Cloud cluster-api: building a CAPI image

Per the IBM Cloud Kubernetes cluster-api provider, I followed the raw instructions with some amendments.

Steps

Provision an Ubuntu 20.04 image.
Update the apt repository

$ apt update

Install the dependencies (more than what’s in the instructions)

$ apt install qemu-kvm libvirt-daemon-system libvirt-clients virtinst cpu-checker libguestfs-tools libosinfo-bin make git unzip ansible python3-pip

Clone the image-builder repo

$ git clone https://github.com/kubernetes-sigs/image-builder.git

Change to the capi image directory

$ cd image-builder/images/capi

Make the deps-raw to confirm everything is working.

$ make deps-raw

Create the ubuntu-2004 image.

$ make build-qemu-ubuntu-2004

Once complete you’ll see:

==> qemu: Running post-processor: custom-post-processor (type shell-local)
==> qemu (shell-local): Running local shell script: /tmp/packer-shell078717884
Build 'qemu' finished after 12 minutes 8 seconds.

==> Wait completed after 12 minutes 8 seconds

==> Builds finished. The artifacts of successful builds are:
--> qemu: VM files in directory: ./output/ubuntu-2004-kube-v1.22.9
--> qemu: VM files in directory: ./output/ubuntu-2004-kube-v1.22.9

Append the .qcow2 extension

$ mv ./output/ubuntu-2004-kube-v1.22.9/ubuntu-2004-kube-v1.22.9 ./output/ubuntu-2004-kube-v1.22.9/ubuntu-2004-kube-v1.22.9.qcow2

You can now upload the output to IBM Cloud Object Storage.

A couple of quick tips:

If you see any warnings, you can get advanced details using export PACKER_LOG=1, which turns on the full Packer logging (see Packer).
KVM module not found indicates you are running inside a VM without nested KVM. You’ll have to switch out of the VM and enable nested KVM on the host (Fedora: Docs).
Adding a VM to VPC is documented here: Console: customImage

2022-08-10

IBM Power Developer eXchange – An opportunity to connect likeminds

There is a new IBM Power Developer eXchange where you can connect with the team I’m a part of to discuss OpenShift on Power or Kubernetes on Power. It’s an avenue to talk directly to the Subject Matter Experts in an open arena.

Are you interested in furthering the development of open source applications on IBM Power? JOIN the IBM Power Developer eXchange to access numerous resources and expand your knowledge. https://ibm.biz/power-developer #PDeX #PowerSystems #Linux #OSS

2022-08-09

Downloading pvsadm and getting VIP details

pvsadm is an unsupported tool that helps with Power Virtual Server administration. I needed this detail for my CAPI tests.

Get the latest download_url per StackOverflow

$ curl -s https://api.github.com/repos/ppc64le-cloud/pvsadm/releases/latest | grep browser_download_url | cut -d '"' -f 4
...
https://github.com/ppc64le-cloud/pvsadm/releases/download/v0.1.7/pvsadm-linux-ppc64le
...
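
To pick one architecture out of that list automatically, the same grep and cut pipeline can be narrowed by asset suffix. This is a sketch under assumptions: release_asset_url is a hypothetical helper name, and it relies on the GitHub API output keeping each browser_download_url on its own line, as it does in the output above.

```shell
# Filter GitHub release JSON on stdin down to the download URL whose
# asset name ends with the given suffix (for example linux-ppc64le).
release_asset_url() {
  grep browser_download_url | grep -F -- "$1\"" | cut -d '"' -f 4
}

# Example against the live API:
#   curl -s https://api.github.com/repos/ppc64le-cloud/pvsadm/releases/latest | release_asset_url linux-ppc64le
```

Matching on the suffix followed by the closing quote keeps the filter from also selecting archives or checksum files that share the same prefix.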

Download the pvsadm tool using the url from above.

$ curl -o pvsadm -L https://github.com/ppc64le-cloud/pvsadm/releases/download/v0.1.7/pvsadm-linux-ppc64le
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 21.4M  100 21.4M    0     0  34.9M      0 --:--:-- --:--:-- --:--:-- 34.9M

Make the pvsadm tool executable

$ chmod +x pvsadm

Create the API Key at https://cloud.ibm.com/iam/apikeys
On the terminal, export the IBMCLOUD_API_KEY.

$ export IBMCLOUD_API_KEY=...REDACTED...

Grab the details of your network VIP using your service name and network.

$ ./pvsadm get ports --instance-name demo --network topman-pub-net
I0808 10:41:26.781531  125151 root.go:49] Using an API key from IBMCLOUD_API_KEY environment variable
+-------------+----------------+----------------+-------------------+--------------------------------------+--------+
| DESCRIPTION |   EXTERNALIP   |   IPADDRESS    |    MACADDRESS     |                PORTID                | STATUS |
+-------------+----------------+----------------+-------------------+--------------------------------------+--------+
|             | 1.1.1.1        | 2.2.2.2        | aa:24:7c:5d:cb:bb | aaa-bbb-ccc-ddd-eee                  | ACTIVE |
+-------------+----------------+----------------+-------------------+--------------------------------------+--------+

2022-08-08

PowerVS: Grabbing a VM Instance Console

Create the API Key at https://cloud.ibm.com/iam/apikeys
On the terminal, export the IBMCLOUD_API_KEY.

$ export IBMCLOUD_API_KEY=...REDACTED...

$ ibmcloud login --apikey "${IBMCLOUD_API_KEY}" -r ca-tor
API endpoint: https://cloud.ibm.com
Authenticating...
OK

Targeted account Demo <-> 1012

Targeted region ca-tor

Users of 'ibmcloud login --vpc-cri' need to use this API to login until July 6, 2022: https://cloud.ibm.com/apidocs/vpc-metadata#create-iam-token
                      
API endpoint:      https://cloud.ibm.com   
Region:            ca-tor   
User:              myuser@us.ibm.com   
Account:           Demo <-> 1012   
Resource group:    No resource group targeted, use 'ibmcloud target -g RESOURCE_GROUP'   
CF API endpoint:      
Org:                  
Space:

List your PowerVS services

$ ibmcloud pi sl
Listing services under account Demo as user myuser@us.ibm.com...
ID                                                                                                                   Name   
crn:v1:bluemix:public:power-iaas:mon01:a/999999c1f1c29460e8c2e4bb8888888:ADE123-8232-4a75-a9d4-0e1248fa30c6::     demo-service

Target your PowerVS instance

$ ibmcloud pi st crn:v1:bluemix:public:power-iaas:mon01:a/999999c1f1c29460e8c2e4bb8888888:ADE123-8232-4a75-a9d4-0e1248fa30c6::

List the PowerVS Services’ VMs

$ ibmcloud pi ins                                                  
Listing instances under account Demo as user myuser@us.ibm.com...
ID                                     Name                                   Path   
12345-ae8f-494b-89f3-5678   control-plane-x       /pcloud/v1/cloud-instances/abc-def-ghi-jkl/pvm-instances/12345-ae8f-494b-89f3-5678

Create a Console for the VM instance you want to look at:

$ ibmcloud pi ingc control-plane-x
Getting console for instance control-plane-x under account Demo as user myuser@us.ibm.com...
                 
Name          control-plane-x   
Console URL   https://mon01-console.power-iaas.cloud.ibm.com/console/index.html?path=%3Ftoken%3not-real

Click on the Console URL and view it in your browser. It can be very helpful.

I was able to diagnose that I had the wrong reference image.

2022-08-08

Pause: Use this one, not that one.
The Red Hat Ecosystem Catalog contains a supported version of the pause container. This container is based on ubi8. This is the best version of the Pause container to use for multiarch purposes.

Don’t use docker.io/ibmcom/pause-ppc64le:3.1 when you have a multi-architecture version available.

Steps
1. Create a Pod yaml pointing to the Red Hat registry.
```
$ cat << EOF > pod.yaml 
kind: Pod
apiVersion: v1
metadata:
  name: demopod-1
  labels:
    demo: foo
spec:
  containers:
  - name: pause
    image: registry.access.redhat.com/ubi8/pause:latest
EOF
```
1. Create the Pod
```
$ oc apply -f pod.yaml 
pod/demopod-1 created
```
1. Check the Pod is running.
```
$ oc get pods -l demo=foo
NAME        READY   STATUS    RESTARTS   AGE
demopod-1   1/1     Running   0          89s
```
You have a Pause container running in OpenShift.
2022-08-03
