Blog

  • Proof-of-Concept: OpenShift on Power: Configuring an OpenID Connect identity provider

    This document outlines the installation of OpenShift on Power, the installation of the Red Hat Single Sign-On Operator, and the configuration of the two to work together on OCP.

    Thanks to Zhimin Wen, whose great work helped me set up the OIDC configuration.

    Steps

    1. Set up OpenShift Container Platform (OCP) 4.x on IBM® Power Systems™ Virtual Server on IBM Cloud using the Terraform-based automation code and the documentation provided. You’ll need to update var.tfvars to match your environment and PowerVS Service settings.
    terraform init --var-file=var.tfvars
    terraform apply --var-file=var.tfvars
    
    1. At the end of the deployment, you’ll see output pointing to the Bastion Server.
    bastion_private_ip = "192.168.*.*"
    bastion_public_ip = "158.*.*.*"
    bastion_ssh_command = "ssh -i data/id_rsa root@158.*.*.*"
    bootstrap_ip = "192.168.*.*"
    cluster_authentication_details = "Cluster authentication details are available in 158.*.*.* under ~/openstack-upi/auth"
    cluster_id = "ocp-oidc-test-cb68"
    install_status = "COMPLETED"
    master_ips = [
      "192.168.*.*",
      "192.168.*.*",
      "192.168.*.*",
    ]
    oc_server_url = "https://api.ocp-oidc-test-cb68.*.*.*.*.xip.io:6443"
    storageclass_name = "nfs-storage-provisioner"
    web_console_url = "https://console-openshift-console.apps.ocp-oidc-test-cb68.*.*.*.*.xip.io"
    worker_ips = [
      "192.168.*.*",
      "192.168.*.*",
    ]
    
    1. Add a hosts entry to your local /etc/hosts
    127.0.0.1 console-openshift-console.apps.ocp-oidc-test-cb68.*.xip.io api.ocp-oidc-test-cb68.*.xip.io oauth-openshift.apps.ocp-oidc-test-cb68.*.xip.io
    
    1. Connect via SSH
    sudo ssh -i data/id_rsa -L 5900:localhost:5901 -L443:localhost:443 -L6443:localhost:6443 -L8443:localhost:8443 root@*
    

    You connect on the command line with these ports forwarded because not all ports are open on the Bastion Server.

    1. Find the OpenShift kubeadmin password in openstack-upi/auth/kubeadmin-password
    cat openstack-upi/auth/kubeadmin-password
    eZ2Hq-JUNK-JUNKB4-JUNKZN
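
    If you prefer the command line, a minimal sketch of logging in from the Bastion Server with these credentials (the API URL comes from the oc_server_url output above; adjust it to your cluster) is:
    oc login -u kubeadmin -p "$(cat openstack-upi/auth/kubeadmin-password)" \
      https://api.ocp-oidc-test-cb68.*.*.*.*.xip.io:6443
    # or simply reuse the generated kubeconfig in the same auth directory
    export KUBECONFIG=~/openstack-upi/auth/kubeconfig
    oc whoami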
    
    1. Log in at the web_console_url by navigating to https://console-openshift-console.apps.ocp-oidc-test-cb68.*.xip.io/

    If prompted, accept Security Warnings

    1. Login with the Kubeadmin credentials when prompted
    2. Click OperatorHub
    3. Search for Keycloak
    4. Select Red Hat Single Sign-On Operator
    5. Click Install
    6. On the Install Operator Screen:
      1. Select alpha channel
      2. Select namespace default (if you prefer an alternative namespace, that’s fine; this is just a demo)
      3. Click Install
    7. Click on Installed Operators
    8. Watch rhsso-operator for a completed installation; the status should show Succeeded
    9. Once ready, click on the Operator > Red Hat Single Sign-On Operator
    10. Click on Keycloak, then click Create Keycloak
    11. Enter the following YAML:
    apiVersion: keycloak.org/v1alpha1
    kind: Keycloak
    metadata:
      name: example-keycloak
      labels:
        app: sso
    spec:
      instances: 1
      externalAccess:
        enabled: true
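
    If you’d rather drive this from the Bastion Server, a hedged CLI sketch is to save the YAML above as keycloak.yaml, apply it, and read the external URL back out of the status once the operator has reconciled:
    oc apply -n default -f keycloak.yaml
    oc -n default get keycloak example-keycloak -o jsonpath='{.status.externalURL}'
    oc -n default get pods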
    
    1. Once it’s deployed, click on example-keycloak > YAML. Look for status.externalURL.
    status:
      credentialSecret: credential-example-keycloak
      externalURL: 'https://keycloak-default.apps.ocp-oidc-test-cb68.*.xip.io'
    
    1. Update the /etc/hosts with
    127.0.0.1 keycloak-default.apps.ocp-oidc-test-cb68.*.xip.io 
    
    1. Click Workloads > Secrets
    2. Click on credential-example-keycloak
    3. Click Reveal values
    U: admin
    P: <<hidden>>
    
    1. For Keycloak, login to https://keycloak-default.apps.ocp-oidc-test-cb68.*.xip.io/auth/admin/master/console/#/realms/master using the revealed secret
    2. Click Add Realm
    3. Enter name test.
    4. Click Create
    5. Click Clients
    6. Click Create
    7. Enter ClientId – test
    8. Select openid-connect
    9. Click Save
    10. Click Keys
    11. Click Generate new keys and certificate
    12. Click Settings > Access Type
    13. Select confidential
    14. Enter Valid Redirect URIs: https://* (we could instead scope this to the OAuth URL, such as https://oauth-openshift.apps.ocp-oidc-test-cb68.*.xip.io/*)
    15. Click Credentials and copy the Secret, such as:
    43f4e544-fa95-JUNK-a298-JUNK
    
    1. Under Generate Private Key…
      1. Select Archive Format JKS
      2. Key Password: password
      3. Store Password: password
      4. Click Generate and Download
    2. On the Bastion server, create the keycloak secret
    oc -n openshift-config create secret generic keycloak-client-secret --from-literal=clientSecret=43f4e544-fa95-JUNK-a298-JUNK
    
    1. Grab the ingress CA
    oc -n openshift-ingress-operator get secret router-ca -o jsonpath="{ .data.tls\.crt }" | base64 -d -i > ca.crt
    
    1. Create the keycloak CA secret
    oc -n openshift-config create cm keycloak-ca --from-file=ca.crt
    configmap/keycloak-ca created
    
    1. Create the openid Auth Provider
    apiVersion: config.openshift.io/v1
    kind: OAuth
    metadata:
      name: cluster
    spec:
      identityProviders:
        - name: keycloak 
          mappingMethod: claim 
          type: OpenID
          openID:
            clientID: test
            clientSecret:
              name: keycloak-client-secret
            ca:
              name: keycloak-ca
            claims: 
              preferredUsername:
              - preferred_username
              name:
              - name
              email:
              - email
            issuer: https://keycloak-default.apps.ocp-oidc-test-cb68.*.xip.io/auth/realms/test
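
    Note that clientID must match the Keycloak client created earlier (test in this walkthrough), and clientSecret and ca reference the secret and config map created above. A hedged sketch of applying it from the Bastion Server and watching the authentication operator roll out:
    # apply the OAuth CR saved as oauth-keycloak.yaml (file name is illustrative)
    oc apply -f oauth-keycloak.yaml
    # the authentication operator restarts the oauth-openshift pods; wait for them to settle
    oc get clusteroperator authentication
    oc -n openshift-authentication get pods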
    
    1. Logout of the Kubeadmin
    2. On Keycloak, under Manage > Users, click Add user and enter an email and password. Click Save
    3. Click Credentials
    4. Enter a new password and confirm
    5. Turn Temporary Password off
    6. Navigate to the web_console_url
    7. Select the new IdP
    8. Login with the new user
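
    After the first successful login, OpenShift creates User and Identity objects for the new IdP; a quick way to verify (and optionally grant a role) from the Bastion Server is:
    oc get users
    oc get identity
    # optionally grant view access; "myuser" is a placeholder for the Keycloak username
    oc adm policy add-cluster-role-to-user view myuser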

    OpenShift ships with clear support for OpenID Connect (OIDC), and this document outlines how to test it with Keycloak.

    A handy link for debugging is the realm’s openid-configuration endpoint.
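
    For the test realm above, a hedged sketch of fetching it (assuming curl and jq are available) is:
    curl -sk https://keycloak-default.apps.ocp-oidc-test-cb68.*.xip.io/auth/realms/test/.well-known/openid-configuration \
      | jq '.issuer, .authorization_endpoint, .token_endpoint'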

    Reference

    Blog: Keycloak OIDC Identity Provider for OpenShift


  • OpenShift RequestHeader Identity Provider with a Test IdP: My GoLang Test

    I built a demonstration using Go, JSON, bcrypt, an HTTP client, and an HTTP server to model an actual IdP. This is a demonstration only; it really helped me set up and understand what’s happening in the RequestHeader flow.

    OpenShift 4.10: Configuring a request header identity provider enables an external service to act as an identity provider, where an X-Remote-User header identifies the user.

    This document outlines the flow using the HAProxy and Apache httpd already installed on the Bastion server as part of the installation process, plus a local Go Test IdP, to demonstrate the feature.

    The rough flow runs between OpenShift, the user, and the Test IdP.

    My Code is available at https://github.com/prb112/openshift-auth-request-header
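
    For orientation, a hedged sketch of the OpenShift-side wiring (names and paths here are placeholders, not the values from my repository):
    # 1) the proxy's client CA goes into openshift-config so OpenShift can verify the proxy
    oc -n openshift-config create configmap request-header-client-ca --from-file=ca.crt=/path/to/proxy-ca.crt
    # 2) add an identityProviders entry of type RequestHeader to the cluster OAuth resource,
    #    pointing at the proxy's loginURL/challengeURL, the CA config map above, and the
    #    trusted headers (e.g. X-Remote-User)
    oc edit oauth cluster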

  • Debugging Network Traffic

    When debugging weird traffic patterns on a Mac, you can use nettop. It shows the actual amount of data transferred by each process. It’s very helpful.

    Commandline

    nettop -m tcp

    Example

    kernel_task.0                                                                                                      1512 MiB        1041 MiB   387 KiB    11 MiB  1823 KiB
       tcp4 1.1.1.30:52104<->1.1.1.29:548                                                     en0   Established        1512 MiB        1041 MiB   387 KiB    11 MiB  1823 KiB 145.12 ms   791 KiB  1545 KiB    BK_SYS
    vpnagentd.88                                                                                                        158 KiB         554 MiB     0 B       0 B      74 B
       tcp4 1.1.1.30:56141<->1.1.1.12:443                                                  en0   Established          26 KiB          12 KiB     0 B       0 B      74 B    77.25 ms   128 KiB    32 KiB        BE
       tcp4 127.0.0.1:29754<->localhost:49229                                                 lo0   Established         131 KiB         554 MiB     0 B       0 B       0 B     1.22 ms   266 KiB   379 KiB        BE
    com.crowdstrike.341                                                                                                 995 KiB        5615 KiB   675 B     279 B      29 KiB
       tcp4 1.1.1.30:51978<->ec2-50-18-194-39.us-west-1.compute.amazonaws.com:443        en0   Established         995 KiB        5615 KiB   675 B     279 B      29 KiB  93.69 ms   128 KiB    55 KiB        RD
    
  • Using OpenShift Plugin for oc

    For those managing OpenShift clusters, the oc tool manages all the OpenShift resources with handy commands for OpenShift and Kubernetes. The OpenShift Client CLI (oc) project is built on top of kubectl adding built-in features to simplify interactions with an OpenShift cluster.

    Much like kubectl, the oc CLI tool provides a feature to Extend the OpenShift CLI with plug-ins. The oc plugin feature is a client-side feature to facilitate interactions with extension commands found in the current user’s PATH. There is an ecosystem of plugins through the community and the Krew Plugin List.

    These plugins include:

    1. cost accesses Kubernetes cost allocation metrics
    2. outdated displays all out-of-date images running in a Kubernetes cluster
    3. pod-lens shows pod-related resource information
    4. k9s is a terminal based UI to interact with your Kubernetes clusters.
    5. sample-cli-plugin which is a simple example to show how to switch namespaces in k8s. I’m not entirely certain that this works with OpenShift.

    These plugins have a wide range of support and code. Some of the plugins are based on python, others are based on go and bash.

    oc expands the plugin filename prefixes in pkg/cli/kubectlwrappers/wrappers.go with plugin.ValidPluginFilenamePrefixes = []string{"oc", "kubectl"}, so OpenShift-specific plugins (prefixed oc-) are supported alongside kubectl plugins. The OpenShift team has also released a number of plugins:

    1. oc-mirror manages OpenShift release, operator catalog, helm charts, and associated container images for mirror registries that support OpenShift environments
    2. oc-compliance facilitates using the OpenShift Compliance operator.

    Many of these extensions/plugins are installed using krew; krew is a plugin manager for kubectl. Some users create a directory .kube/plugins and install their plugins in that folder. The plugins folder is then added to the user’s path.
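
    As a hedged sketch, once krew itself is installed, installing and running a plugin looks roughly like this (the outdated plugin from the list above is used as the example):
    oc krew install outdated            # or: kubectl krew install outdated
    export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
    oc outdated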

    Creating your own Extension

    1. Check to see if any plugins exist:
    $ oc plugin list
    The following compatible plugins are available:
    
    /Users/user/.kube/plugins/oc-test
    

    If none exist, it’ll report that none were found in the PATH, and that you can install plugins from krew.

    1. Create a new file oc-test
    #! /usr/bin/env bash
    
    echo "Execution Time: $(date)"
    
    echo ""
    ps -Sf
    echo ""
    
    echo "Arguments: $@"
    
    echo "Environment Variables: "
    env
    echo ""
    
    oc version --client
    
    1. Make the file executable and add its directory to your PATH.
    chmod +x ~/.kube/plugins/oc-test
    export PATH=~/.kube/plugins:$PATH
    
    1. Execute the plugin as oc test (note the oc- prefix is stripped off when invoking)
    Execution Time: Wed Mar 30 11:22:19 EDT 2022
    
      UID   PID  PPID   C STIME   TTY           TIME CMD
      501  3239  3232   0 15Mar22 ttys000    0:01.39 -zsh
      501 80267  3239   0 17Mar22 ttys000    0:00.03 tmux
      501 54273 11494   0 Tue10AM ttys001    0:00.90 /bin/zsh -l
      501 80319 80269   0 17Mar22 ttys002    0:00.30 -zsh
      501  2430  2429   0 15Mar22 ttys003    0:03.17 -zsh
      501 78925  2430   0 11:22AM ttys003    0:00.09 bash /Users/user/.kube/plugins/oc-test test
      501 80353 80269   0 17Mar22 ttys004    0:02.07 -zsh
      501 91444 11494   0 18Mar22 ttys005    0:01.55 /bin/zsh -l
    
    Arguments: test
    Environment Variables: 
    SHELL=/bin/zsh
    TERM=xterm-256color
    ZSH=/Users/user/.oh-my-zsh
    USER=user
    PATH=/Users/user/.kube/plugins:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin
    PWD=/Users/user/Downloads
    LANG=en_US.UTF-8
    HOME=/Users/user
    LESS=-R
    LOGNAME=user
    SECURITYSESSIONID=user
    _=/usr/bin/env
    
    Client Version: 4.10.6
    

    The above demonstrates a simple plugin.

    Reference

    1. Getting started with the OpenShift CLI
    2. Extending the OpenShift CLI with plug-ins
    3. https://cloud.redhat.com/blog/augmenting-openshift-cli-with-plugins
    4. https://cloudcult.dev/tcpdump-for-openshift-workloads/
  • Learning Resources for Operators – First Two Weeks Notes

    To quote the Kubernetes website, “The Operator pattern captures how you can write code to automate a task beyond what Kubernetes itself provides.” The following is a compendium to use while learning Operators.

    The de facto SDK to use is the Operator SDK, which provides Helm, Ansible, and Go scaffolding to support your implementation of the Operator pattern.

    The following are education classes on the OperatorSDK

    When running through the CO0201EN intermediate operators course, I did hit a case where I had to create a ClusterRole and ClusterRoleBinding for the ServiceAccount; here is a snippet that might help others:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      namespace: memcached-operator-system
      name: service-reader-cr-mc
    rules:
    - apiGroups: ["cache.bastide.org"] # "" indicates the core API group
      resources: ["memcacheds"]
      verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      namespace: memcached-operator-system
      name: ext-role-binding
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: service-reader-cr-mc
    subjects:
    - kind: ServiceAccount
      namespace: memcached-operator-system
      name: memcached-operator-controller-manager

    The reason for the above is that I missed adding a kubebuilder declaration:

    //+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
    //+kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch
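
    After adding the markers, the RBAC manifests need to be regenerated and the operator redeployed; a hedged sketch using the standard Operator SDK scaffolding targets (IMG is a placeholder for your registry/tag) is:
    # regenerate config/rbac from the kubebuilder markers
    make manifests
    # rebuild and redeploy the operator
    make docker-build docker-push IMG=<registry>/memcached-operator:tag
    make deploy IMG=<registry>/memcached-operator:tag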

    Thanks to https://stackoverflow.com/a/60334649/1873438

    The following are articles worth reviewing:

    The following are good Go resources:

    1. Go Code Comments – To write idiomatic Go, you should review the Code Review comments.
    2. Getting to Go: The Journey of Go’s Garbage Collector – The reference for Go and Garbage Collection in go
    3. An overview of memory management in Go – good overview of Go Memory Management
    4. Golang: Cost of using the heap – net: an allocation up to about 1M seems to stay on the stack; beyond that it seems to land on the heap
    5. golangci-lint – The aggregated linters project is worthy of an installation and use. It’ll catch many issues and has a corresponding GitHub Action.
    6. Go in 3 Weeks – a comprehensive training for Go; companion to the GitHub repo
    7. Defensive Coding Guide: The Go Programming Language

    The following are good OpenShift resources:

    1. Create OpenShift Plugins – You must have a CLI plug-in file that begins with oc- or kubectl-. You create a file and put it in /usr/local/bin/
    2. Details on running Code Ready Containers on Linux – The key hack I learned was to ssh -i ~/.crc/machines/crc/id_ecdsa core@<any host in the /etc/hosts>
      1. I ran on VirtualBox Ubuntu 20.04 with Guest Additions Installed
      2. Virtual Box Settings for the Machine – 6 CPU, 18G
        1. System > Processor > Enable PAE/NX and Enable Nested VT-X/AMD-V (which is a must for it to work)
        2. Network > Change Adapter Type to virtio-net and Set Promiscuous Mode to Allow VMs
      3. Install openssh-server so you can login remotely
      4. It will not install without a windowing system, so I have the default windowing environment installed.
      5. Note, I still get a failure on startup complaining about a timeout. I waited about 15 minutes after this, and the command oc get nodes --context admin --cluster crc --kubeconfig .crc/cache/crc_libvirt_4.10.3_amd64/kubeconfig now works.
    3. CRC virsh cheatsheet – If you are running Code Ready Containers and need to debug, you can use the virsh cheatsheet.
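
    A hedged sketch of the virsh commands I reach for when poking at the CRC VM (the libvirt domain is typically named crc):
    sudo virsh list --all        # is the crc domain running?
    sudo virsh dominfo crc       # CPU and memory assigned to the VM
    sudo virsh console crc       # attach to the serial console (Ctrl+] to exit)
    # or SSH straight in, as noted above
    ssh -i ~/.crc/machines/crc/id_ecdsa core@<any host in the /etc/hosts>
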
  • Hack: Fast Forwarding a Video

    I had to watch 19 hours of slow-paced videos for a training on a new software product (at least new to me). I like fast-paced trainings… enter a browser hack.

    In Firefox, Navigate to Tools > Browser Tools > Web Developer Tools

    Click Console

    Type the following snippet to find the first video on the page and change its playback rate, then press Enter.

    document.getElementsByTagName('video').item(0).playbackRate = 4.0

    Note, 4.0 can be unintelligible; you’ll need to tweak the speed to match what you need. I found 2.5 to 3.0 to be very comfortable (you just can’t multitask).

  • The Grit in Processing Unicode Strings with NDJSON

    Unicode is pretty amazing: you can encode strings in single or multibyte characters. Perhaps a smile… 😀 which is U+1F600. It’s pretty cool, so cool you should read If UTF-8 is an 8-bit encoding, why does it need 1-4 bytes? which has four key sequences for UTF8:

       Char. number range  |        UTF-8 octet sequence
          (hexadecimal)    |              (binary)
       --------------------+---------------------------------------------
       0000 0000-0000 007F | 0xxxxxxx
       0000 0080-0000 07FF | 110xxxxx 10xxxxxx
       0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
       0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    

    Until recently, I’ve been working with NDJSON files as part of the HL7 FHIR: Bulk Data Access IG to export healthcare data and the proposed Import specification to import healthcare data. These files store one JSON object per line, delimited with a \n, such as:

    {"resourceType":"Patient"}
    {"resourceType":"Patient"}
    {"resourceType":"Patient"}
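
    A hedged sketch of sanity-checking such a file from the shell (patients.ndjson is a placeholder name; jq reads one JSON document per line and fails loudly on a malformed one):
    wc -l patients.ndjson
    jq -c '.resourceType' patients.ndjson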
    

    The following Java snippet generates a substantial set of lines that can be injected into a stream for testing with Unicode (and are newline delimited).

    StringBuilder line = new StringBuilder();
    for (int codePoint = 32; codePoint <= 0x1F64F; codePoint++) {
        line.append(Character.toChars(codePoint));
        if (codePoint % 64 == 0) {
            line.append("\n");
        }
    }
    System.out.println(line.toString());
    

    This data is processed asynchronously on OpenLiberty JavaBatch as a set of jobs. These jobs process data through a Read(Source)-Checkpoint-Write(Sink) pattern. The pattern ensures enough data is read from the source before a write action on the sink.

    I found that processing the variable data with an unknown unicode set needed a counting stream to keep track of the bytes. The CountingStream acted as a delegate to accumulate bytes, length of the processed values and find the end of a line or end of the file.

    public static class CountingStream extends InputStream {
            private static int LF = '\n';
            private static final long MAX_LENGTH_PER_LINE = 2147483648l;
    
            // 256kb block
            private ByteArrayOutputStream out = new ByteArrayOutputStream(256000);
            private boolean eol = false;
            private long length = 0;
    
            private InputStream delegate;
    
            /**
             * ctor
             * @param in
             */
            public CountingStream(InputStream in) {
                this.delegate = in;
            }
    
            /**
             * reset the line
             */
            public void resetLine() {
                out.reset();
                eol = false;
            }
    
            /**
             * @return the length of the resources returned in the reader
             */
            public long getLength() {
                return length;
            }
    
            /**
             * Gets the String representing the line of bytes.
             * 
             * @return
             * @throws UnsupportedEncodingException
             */
            public String getLine() throws UnsupportedEncodingException {
                String str = new String(out.toByteArray(), "UTF-8");
                if (str.isEmpty()) {
                    str = null;
                }
                return str;
            }
    
            public boolean eol() {
                return eol;
            }
    
            /**
             * Returns the line that is aggregated up until a new line character
             * @return
             * @throws IOException
             */
            public String readLine() throws IOException {
                int r = read();
                while (r != -1) {
                    if (eol()) {
                        eol = false;
                        return getLine();
                    }
                    r = read();
                }
                if (r == -1 && length > 0) {
                    return getLine();
                }
                return getLine();
            }
    
            @Override
            public int read() throws IOException {
                int r = delegate.read();
                if (r == -1) {
                    return -1;
                }
                byte b = (byte) r;
                if (LF == (int) b) {
                    length++;
                    eol = true;
                } else {
                    length++;
                    if (length == MAX_LENGTH_PER_LINE) {
                        throw new IOException("Current Line in NDJSON exceeds limit " + MAX_LENGTH_PER_LINE);
                    }
                    out.write(b);
                }
                return b;
            }
        }
    

    I found one important thing in the delegate, with thanks to a colleague and CERT – you must accumulate the bytes and have a maximum size per line. The CERT article is at STR50-J. Use the appropriate method for counting characters in a string

    The grit here is:

    1. Accumulate: Don’t process one character at a time from int read(); accumulate your bytes and defer to String creation in Java to ensure it’s processed in your project’s encoding.
    2. Set a limit: Don’t infinitely process the data, stop when it violates a set contract.

    If you are doing more complicated processing, say streaming from Azure Blob, Amazon S3, or HTTPS and processing the stream as chunks, you’ll want to do something a bit more complicated.

    The grit here is:

    1. Read Blocks and not the whole stream: Read a block of bytes at a time instead of ‘draining’ the bytes when a sufficient block is retrieved.
    2. Assemble Lines in multiple Block reads.

    The code looks like this:

        public static class CountingStream extends InputStream {
            private static int LF = '\n';
            private static final long MAX_LENGTH_PER_LINE = 2147483648l;
    
            // 256kb block
            private ByteArrayOutputStream out;
            private long length = 0;
    
            private InputStream delegate;
    
            /**
             * 
             * @param out ByteArrayOutputStream caches the data cross reads
             * @param in InputStream is generally the S3InputStream
             */
            public CountingStream(ByteArrayOutputStream out, InputStream in) {
                this.out = out;
                this.delegate = in;
            }
    
            /**
             * Gets the String representing the line of bytes.
             * 
             * @return
             * @throws UnsupportedEncodingException
             */
            public String getLine() throws UnsupportedEncodingException {
                String str = new String(out.toByteArray(), "UTF-8");
                if (str.isEmpty()) {
                    str = null;
                }
                return str;
            }
    
            @Override
            public int read() throws IOException {
                return delegate.read();
            }
    
            /**
             * drains the stream so we don't leave a hanging connection
             * @throws IOException
             */
            public void drain() throws IOException {
                int l = delegate.read();
                while (l != -1) {
                    l = delegate.read();
                }
            }
    
            /**
             * 
             * @param counter
             * @return
             * @throws IOException
             */
            public String readLine() throws IOException {
                int r = read();
                if (r == -1) {
                    return null;
                } else {
                    String result = null;
                    while (r != -1) {
                        byte b = (byte) r;
                        if (LF == (int) b) {
                            length++;
                            r = -1;
                            result = getLine();
                            out.reset();
                        } else {
                            length++;
                            if (length == MAX_LENGTH_PER_LINE) {
                                throw new IOException("Current Line in NDJSON exceeds limit " + MAX_LENGTH_PER_LINE);
                            }
                            out.write(b);
                            r = read();
                        }
                    }
                    return result;
                }
            }
        }
    

    Importantly, the code defers the caching to the EXTERNAL caller, and in this case assembles a window of resources:

        protected void readFromObjectStoreWithLowMaxRange(AmazonS3 c, String b, String workItem) throws FHIRException {
    
            // Don't add tempResources to resources until we're done (we do retry), it's a temporary cache of the Resources
            List<Resource> tempResources = new ArrayList<>();
    
            // number of bytes read.
            long numberOfBytesRead = 0l;
            int totalReads = 0;
            int mux = 0;
    
            // The cached FHIRParserException
            FHIRParserException fpeDownstream = null;
    
            // Closed when the Scope is out. The size is double the read window.
            // The backing array is allocated at creation.
            ByteArrayOutputStream cacheOut = new ByteArrayOutputStream(512000);
            boolean complete = false;
            while (!complete) {
                // Condition: At the end of the file... and it should never be more than the file Size
                // however, in rare circumstances the person may have 'grown' or added to the file
                // while operating on the $import and we want to defensively end rather than an exact match
                // Early exit from the loop...
                long start = this.transientUserData.getCurrentBytes();
                if (this.transientUserData.getImportFileSize() <= start) {
                    complete = true; // NOP
                    break;
                }
    
                // Condition: Window would exceed the maximum File Size
                // Prune the end to -1 off the maximum.
                // The following is 256K window. 256K is used so we only drain a portion of the inputstream.
                // and not the whole file's input stream.
                long end = start + 256000;
                if (end >= this.transientUserData.getImportFileSize()) {
                    end = this.transientUserData.getImportFileSize() - 1;
                    complete = true; // We still need to process the bytes.
                }
    
                // Request the start and end of the S3ObjectInputStream that's going to be retrieved
                GetObjectRequest req = new GetObjectRequest(b, workItem)
                                                .withRange(start, end);
    
                if (LOG.isLoggable(Level.FINE)) {
                    // Useful when debugging edge of the stream problems
                    LOG.fine("S3ObjectInputStream --- " + start + " " + end);
                }
    
                boolean parsedWithIssue = false;
                try (S3Object obj = c.getObject(req);
                        S3ObjectInputStream in = obj.getObjectContent();
                        BufferedInputStream buffer = new BufferedInputStream(in);
                        CountingStream reader = new CountingStream(cacheOut, in)) {
    
                    // The interior block allows a drain operation to be executed finally.
                    // as a best practice we want to drain the remainder of the input
                    // this drain should be at worst 255K (-1 for new line character)
                    try {
                        String resourceStr = reader.readLine();
                        // The first line is a large resource
                        if (resourceStr == null) {
                            this.transientUserData.setCurrentBytes(this.transientUserData.getCurrentBytes() + reader.length);
                            reader.length = 0;
                            mux++;
                        }
    
                        while (resourceStr != null && totalReads < maxRead) {
                            try (StringReader stringReader = new StringReader(resourceStr)) {
                                tempResources.add(FHIRParser.parser(Format.JSON).parse(stringReader));
                            } catch (FHIRParserException fpe) {
                                // Log and skip the invalid FHIR resource.
                                parseFailures++;
                                parsedWithIssue = true;
                                fpeDownstream = fpe;
                            }
    
                            long priorLineLength = reader.length;
                            reader.length = 0;
                            resourceStr = reader.readLine();
    
                            if (!parsedWithIssue) {
                                this.transientUserData.setCurrentBytes(this.transientUserData.getCurrentBytes() + priorLineLength);
                                numberOfBytesRead += reader.length;
                                totalReads++;
                            } else if ((parsedWithIssue && resourceStr != null)
                                    || (parsedWithIssue && 
                                            (this.transientUserData.getImportFileSize() <= this.transientUserData.getCurrentBytes() + priorLineLength))) { 
                                // This is potentially end of bad line
                                // -or-
                                // This is the last line failing to parse
                                long line = this.transientUserData.getNumOfProcessedResources() + totalReads;
                                LOG.log(Level.SEVERE, "readResources: Failed to parse line " + totalReads + " of [" + workItem + "].", fpeDownstream);
                                String msg = "readResources: " + "Failed to parse line " + line + " of [" + workItem + "].";
    
                                ConfigurationAdapter adapter = ConfigurationFactory.getInstance();
                                String out = adapter.getOperationOutcomeProvider(source);
                                boolean collectImportOperationOutcomes = adapter.shouldStorageProviderCollectOperationOutcomes(source)
                                        && !StorageType.HTTPS.equals(adapter.getStorageProviderStorageType(out));
                                if (collectImportOperationOutcomes) {
                                    FHIRGenerator.generator(Format.JSON)
                                        .generate(generateException(line, msg),
                                                transientUserData.getBufferStreamForImportError());
                                    transientUserData.getBufferStreamForImportError().write(NDJSON_LINESEPERATOR);
                                }
                            }
                        }
                    } catch (Exception ex) {
                        LOG.warning("readFhirResourceFromObjectStore: Error proccesing file [" + workItem + "] - " + ex.getMessage());
                        // Throw exception to fail the job, the job can be continued from the current checkpoint after the
                        // problem is solved.
                        throw new FHIRException("Unable to read from S3 during processing", ex);
                    } finally {
                        try {
                            reader.drain();
                        } catch (Exception s3e) {
                            LOG.fine(() -> "Error while draining the stream, this is benign");
                            LOG.throwing("S3Provider", "readFromObjectStoreWithLowMaxRange", s3e);
                        }
                    }
    
                    // Increment if the last line fails
                    if (this.transientUserData.getImportFileSize() <= this.transientUserData.getCurrentBytes()) {
                        parseFailures++;
                    }
                } catch (FHIRException fe) {
                    throw fe;
                } catch (Exception e) {
                    throw new FHIRException("Unable to read from S3 File", e);
                }
    
                // Condition: The optimized block and the number of Resources read
                // exceed the minimum thresholds or the maximum size of a single resource
                if (tempResources.size() >= maxRead) {
                    LOG.fine("TempResourceSize " + tempResources.size());
                    complete = true;
                }
    
                // Condition: The optimized block is exceeded and the number of resources is
                // only one so we want to threshold a maximum number of resources
                // 512K * 5 segments (we don't want to repeat too much work) = 2.6M
                if (numberOfBytesRead > 2621440 && tempResources.size() >= 1) {
                    complete = true;
                }
    
                // Condition: The maximum read block is exceeded and we have at least one Resource
                // 2147483648 / (256*1024*1024) = 8192 Reads
                if (mux == 8193) {
                    throw new FHIRException("Too Long a Line");
                }
    
                // We've read more than one window
                if (mux > 1 && tempResources.size() >=1) {
                    break;
                }
            }
    
            // Condition: There is no complete resource to read.
            if (totalReads == 0) {
                LOG.warning("File grew since the start");
                this.transientUserData.setCurrentBytes(this.transientUserData.getImportFileSize());
            }
    
            // Add the accumulated resources
            this.resources.addAll(tempResources);
        }
    

    The above code was created and licensed as part of the IBM/FHIR project.

    Net: approach Unicode formats carefully, and be careful when reassembling bytes and reading windows from channels.

  • Moving on…

    In 2019, I joined the IBM FHIR Server team, a team tasked with engineering an internal FHIR server (DSTU2) into an updated and upgraded open source HL7 FHIR R4 server. The open sourced code, on GitHub IBM® FHIR® Server – IBM/FHIR, is a product of many contributors since its inception in 2016 (the project history goes back to the DSTU2 days). I contributed over 1,000 commits over my time working on the project, authored over 300 issues, opened-updated-closed 600-plus Pull Requests, and triaged, reviewed, and designed many more.

    Today I’m moving on to IBM Power Systems and working on OpenShift.

    Contributions to the Project – Automation, Search, Hard Erase, Performance, Data Storage, Bulk Data

  • GitHub Actions Braindump

    The following are from a braindump I did for my team (everything here is public knowledge):

    Getting Setup to Building and Developing with the Workflows

    This section outlines setting up your development environment for working with workflows:

    1. Download Visual Studio Code. This tool is best run outside of your environment.
    2. Click Extensions > search for PowerShell and install the PowerShell extension. This also installs PowerShell locally on your system. PowerShell is used in the Windows workflow.
    3. Install ShellCheck. It is used to check your code and make sure you are following best practices when writing the shell scripts.
    4. Add an alias to your terminal settings:

    alias code='/Applications/Visual\ Studio\ Code.app/Contents/Resources/app/bin/code'

    For me, this is in the .zshrc file.

    You should also have Docker installed (or nerdctl aliased to docker). You should also have IBM Java 11 installed (v11.0.14.1).

    You’ll also need access to:

    Debugging GitHub Actions

    If the GitHub Action fails, you should check the following:

    1. Review the PR Workflow:
      1. Click on Details.
      2. Select the Job that is failing.
      3. Click on Settings > View raw logs.
      4. Navigate down to the end of the file to the BUILD FAILURE.
      5. Scroll up from the BUILD FAILURE point (at least in Maven) and look for the failing tests.
      6. If there are not enough details, go back to the failing job, click on Summary, and scroll down to Artifacts.
      7. Download the Artifacts related to the build. These files are kept for 90 days or until we hit a storage limit. Note, we purposefully use the upload-artifacts step.
      8. Extract the files and you should have the Docker console log file, test results, and any other workflow logging.
    2. IBM/FHIR Repository Actions – You can filter through the list of Actions.
      1. Navigate to https://github.com/IBM/FHIR/actions?query=is%3Afailure
      2. You can filter on the failing workflows and get a good big picture.
    3. GitHub Status – You should subscribe to the site’s updates (via email). This is going to be very helpful to figure out what’s going on with GitHub. You should also check this when the transient errors appear systemic or non-deterministic – e.g. not failing in the same spot. At least one person on the team should sign up for the GitHub Status.
    4. GitHub Community: Actions – I rarely go here, but sometimes I find that someone has posted the same question, and it may have an answer. Very rarely, I post directly there.
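
    As a hedged convenience, GitHub Status is backed by a Statuspage instance, so you can also poll it from a terminal:
    curl -s https://www.githubstatus.com/api/v2/status.json | jq '.status'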

    Debugging

    If you encounter anything that looks transient – e.g., network download (Wagon Failure), disk space, filesystem failure – you can choose to rerun the failing workflow.

    1. Navigate to the failing Job-Workflow
    2. Click Re-run all jobs
    3. Repeat for all Workflows that failed

    If you see a failure in a particular GitHub Step, such as actions/checkout or actions/setup-java, you should go search that action’s issues:

    1. If actions/setup-java is failing, navigate to https://github.com/actions/setup-java
    2. Check the Issues (Search for the issue)
    3. Search for settings that may help

    Note, sometimes the action maintainers intentionally fail a GitHub Action workflow to signal that you should upgrade or change.

    How to do Local Development with GitHub Actions

    1. Navigate to https://github.com/ibm/fhir
    2. Click Fork.
      1. If a fork already exists, be sure to Fetch Upstream > Fetch and Merge.
    3. Click Pull Requests and create the ci-skip label:
      1. Click Labels.
      2. Click New Label.
      3. Click Create label.
    4. Clone the fork – git clone https://github.com/prb112/FHIR.git
      1. I typically create a local folder called prb112 then clone into it.
    5. Once the main branch is active, git checkout -b new-ci
    6. Open your code using your alias: code ~/dev/fhir
    7. Edit your automation files in .github and build.
    8. Click Terminal > New Terminal.
    9. Update the .github/workflows/<file> you are changing so the job.<workflow_job>.if condition is invalid:

    jobs:
      e2e-audit:
        runs-on: ubuntu-latest
        if: "!contains(github.event.pull_request.labels.*.name, 'ci-skip')"

    I change ci-skip to ci-skip-ignore so that I can run just that one targeted workflow.

    1. Test the steps locally by executing the steps in the workflow line-by-line in the terminal session.
    2. Once you are comfortable with the changes:
      1. git add <files>
      2. git commit -s -m "My edits for issue #2102010"
    3. Push your new branch – git push -u origin new-ci
    4. Create your new Pull Request targeting the IBM:fhir main branch and add ci-skip.

    The targeted workflow you are building with is the only one that runs. Note, you have a LIMITED number of execution minutes for GitHub Workflows.

    Finding GitHub Actions to use/tips in Use

    There are many folks using GitHub Actions, and many have figured out better patterns (or evolved to have better patterns).

    1. Go here – https://github.com/actions/
    2. Search: <my query> file:yml site:github.com

    Workflow Details

    Each workflow runs in a hosted-runner (or virtual-environment).  These hosted-runners are containers specified declaratively in the workflow:

    Flavor       Virtual Environment
    windows      windows-2019
    All Other    ubuntu-20.04

    These hosted runners have a number of pre-installed libraries and tools – Docker, podman, java-11, jq, perl, python, yarn, et cetera.

    These workflows (except the site, javadocs and release) follow a similar pattern:

    1. Setup Prerequisites
    2. Run the Pre-integration Steps
    3. Execute the Tests
    4. Run the Post Integration Steps
    5. Archive the Results

    This pattern evolved from build.yml and integration.yml as the starting points all the way to the most recent, migration.yml, which is the most sophisticated of the workflow jobs created.

  • Using the IBM FHIR Server and Implementation Guide as Java Modules

    The IBM FHIR Server is an extensible HL7 FHIR server. The IBM FHIR Server supports complicated Implementation Guides (IGs), sets of rules for how a particular problem is solved using FHIR resources. The implementation guides include a set of Profiles, ValueSets, CodeSystems, and supporting resources (Examples, CapabilityStatements).

    The IBM FHIR Server supports loading FHIR NPM packages, stored as package.tgz. You can see the US Core package at https://www.hl7.org/fhir/us/core/package.tgz (one just appends package.tgz to any IG site).
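
    A hedged sketch of pulling a package and peeking inside it (US Core shown; FHIR NPM packages keep their contents under a top-level package/ directory):
    curl -sLO https://www.hl7.org/fhir/us/core/package.tgz
    tar -tzf package.tgz | head
    tar -xOzf package.tgz package/.index.json | jq '.files | length'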

    The IBM FHIR Server includes a number of IGs built-tested-released with each tag.

    The IGs are Java modules which are specially formed to support the resources in the NPM packages. The Java modules use a ServiceLoader (activated at startup when the Java Module is in the classpath).

    The best way to start is to copy an existing fhir-ig, such as fhir-ig-us-core, and modify it as needed (package rename and updated files).

    The service loader uses the META-INF/services/com.ibm.fhir.registry.spi.FHIRRegistryResourceProvider file to activate the list of provider classes in the file.
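
    In other words, the services file is just a newline-separated list of provider classes; a hedged sketch (class names are illustrative, following the USCore400ResourceProvider used in the tests below):
    cat src/main/resources/META-INF/services/com.ibm.fhir.registry.spi.FHIRRegistryResourceProvider
    com.ibm.fhir.ig.us.core.USCore311ResourceProvider
    com.ibm.fhir.ig.us.core.USCore400ResourceProvider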

    Each of the corresponding classes need to be in src/main/java under the corresponding package (com.ibm.fhir.ig.us.core as above).

    The PackageRegistryResourceProvider navigates the src/main/resources to find the folder hl7/fhir/us/core/311 and loads the files referenced in the NPM packages index (.index.json).

    You might not see the .index.json file by default in Eclipse; unhide it in the Enterprise Explorer View > Filters and Customization, deselect .* resources, and click OK.

    When you open .index.json, you should see the index of the packaged resources.

    These are the resources which will be loaded when the fhir-ig-us-core is added to the userlib folder.

    The US Core and CarinBB modules support multiple versions of the same IG – v3.1.1 and v4.0.0 – in the same Java module. To control this behavior, one needs to set the configuration to map to a default version (the default is always the latest version). Cross-IG dependencies are updated to point to the correct version, or to the latest one the IG specifies.

    To make these files viewable, we like to format the contents of these folders as pretty JSON. When the IGs are built and released, the JSON files are compressed, which saves a good chunk of memory.

    We also like to remove Narrative texts:

    "text": {
         "status": "empty",
         "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\">Redacted for size</div>"
    },
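
    A hedged sketch of doing both (pretty-printing and redacting the narrative) across a packaged folder with jq (the folder path is illustrative):
    for f in src/main/resources/hl7/fhir/us/core/311/*.json; do
      jq 'if has("text") then .text = {"status": "empty", "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\">Redacted for size</div>"} else . end' \
        "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done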

    There is an examples folder included with most package.tgz files. You should copy this into the src/test/resources/JSON folder and update the index.txt so you have the latest examples in place; these will be loaded by the ProfileTest (ConformanceTest is used for use-case-specific tests). The below is an example of loading the 400/index.txt and failing when issues exceed a limit or have a severity of error.

    public class ProfileTest {
    
        private static final String INDEX = "./src/test/resources/JSON/400/index.txt";
    
        private String path = null;
    
        public ProfileTest() {
            // No Operation
        }
    
        public ProfileTest(String path) {
            this.path = path;
        }
    
        @Test
        public void testUSCoreValidation() throws Exception {
            try (Reader r = Files.newBufferedReader(Paths.get(path))) {
                Resource resource = FHIRParser.parser(Format.JSON).parse(r);
                List<Issue> issues = FHIRValidator.validator().validate(resource);
                issues.forEach(item -> {
                    if (item.getSeverity().getValue().equals("error")) {
                        System.out.println(path + " " + item);
                    }
                });
                assertEquals(countErrors(issues), 0);
            } catch (Exception e) {
                System.out.println("Exception with " + path);
                fail(e.toString());
            }
        }
    
        @Factory
        public Object[] createInstances() {
            List<Object> result = new ArrayList<>();
    
            try (BufferedReader br = Files.newBufferedReader(Paths.get(INDEX))) {
                String line;
                while ((line = br.readLine()) != null) {
                    result.add(new ProfileTest(line));
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return result.toArray();
        }
    
    }

    The ProviderTest checks that the provider loads successfully, that the number of resources returned is correct, and that one exemplary resource is returned.

    @Test
        public void testRegistry() {
            StructureDefinition definition = FHIRRegistry.getInstance()
                    .getResource("http://hl7.org/fhir/us/core/StructureDefinition/pediatric-bmi-for-age", StructureDefinition.class);
            assertNotNull(definition);
            assertEquals(definition.getVersion().getValue(), "4.0.0");
        }
    
        @Test
        public void testUSCoreResourceProvider() {
            FHIRRegistryResourceProvider provider = new USCore400ResourceProvider();
            assertEquals(provider.getRegistryResources().size(), 148);
        }

    There is a very important test, testConstraintGenerator: any issue in the structure definition will be reported when the constraints are compiled, and you’ll get some really good warnings.

        @Test
        public static void testConstraintGenerator() throws Exception {
            FHIRRegistryResourceProvider provider = new USCore400ResourceProvider();
            for (FHIRRegistryResource registryResource : provider.getRegistryResources()) {
                if (StructureDefinition.class.equals(registryResource.getResourceType())) {
                    assertEquals(registryResource.getVersion().toString(), "4.0.0");
                    String url = registryResource.getUrl() + "|4.0.0";
                    System.out.println(url);
                    Class<?> type = ModelSupport.isResourceType(registryResource.getType()) ? ModelSupport.getResourceType(registryResource.getType()) : Extension.class;
                    for (Constraint constraint : ProfileSupport.getConstraints(url, type)) {
                        System.out.println("    " + constraint);
                        if (!Constraint.LOCATION_BASE.equals(constraint.location())) {
                            compile(constraint.location());
                        }
                        compile(constraint.expression());
                    }
                }
            }
        }

    For example, you might get a pediatric BMI set of constraints:

    http://hl7.org/fhir/us/core/StructureDefinition/pediatric-bmi-for-age|4.0.0
        @com.ibm.fhir.model.annotation.Constraint(id="vs-1", level="Rule", location="Observation.effective", description="if Observation.effective[x] is dateTime and has a value then that value shall be precise to the day", expression="($this as dateTime).toString().length() >= 8", source="http://hl7.org/fhir/StructureDefinition/vitalsigns", modelChecked=false, generated=false, validatorClass=interface com.ibm.fhir.model.annotation.Constraint$FHIRPathConstraintValidator)
        @com.ibm.fhir.model.annotation.Constraint(id="vs-2", level="Rule", location="(base)", description="If there is no component or hasMember element then either a value[x] or a data absent reason must be present.", expression="(component.empty() and hasMember.empty()) implies (dataAbsentReason.exists() or value.exists())", source="http://hl7.org/fhir/StructureDefinition/vitalsigns", modelChecked=false, generated=false, validatorClass=interface com.ibm.fhir.model.annotation.Constraint$FHIRPathConstraintValidator)

    The above constraints can be tested using the FHIRPath expressions against a failing test resource to confirm the validity of the StructureDefinition.

    I hope this document helps you build your own IGs for the IBM FHIR Server.