$ ibmcloud login --apikey "${IBMCLOUD_API_KEY}" -r ca-tor
API endpoint: https://cloud.ibm.com
Authenticating...
OK
Targeted account Demo <-> 1012
Targeted region ca-tor
Users of 'ibmcloud login --vpc-cri' need to use this API to login until July 6, 2022: https://cloud.ibm.com/apidocs/vpc-metadata#create-iam-token
API endpoint: https://cloud.ibm.com
Region: ca-tor
User: myuser@us.ibm.com
Account: Demo <-> 1012
Resource group: No resource group targeted, use 'ibmcloud target -g RESOURCE_GROUP'
CF API endpoint:
Org:
Space:
List your PowerVS services
$ ibmcloud pi sl
Listing services under account Demo as user myuser@us.ibm.com...
ID Name
crn:v1:bluemix:public:power-iaas:mon01:a/999999c1f1c29460e8c2e4bb8888888:ADE123-8232-4a75-a9d4-0e1248fa30c6:: demo-service
Target your PowerVS instance
$ ibmcloud pi st crn:v1:bluemix:public:power-iaas:mon01:a/999999c1f1c29460e8c2e4bb8888888:ADE123-8232-4a75-a9d4-0e1248fa30c6::
List the PowerVS service's VMs
$ ibmcloud pi ins
Listing instances under account Demo as user myuser@us.ibm.com...
ID Name Path
12345-ae8f-494b-89f3-5678 control-plane-x /pcloud/v1/cloud-instances/abc-def-ghi-jkl/pvm-instances/12345-ae8f-494b-89f3-5678
Create a Console for the VM instance you want to look at:
$ ibmcloud pi ingc control-plane-x
Getting console for instance control-plane-x under account Demo as user myuser@us.ibm.com...
Name control-plane-x
Console URL https://mon01-console.power-iaas.cloud.ibm.com/console/index.html?path=%3Ftoken%3not-real
Click on the Console URL and view it in your browser; it can be very helpful. I was able to diagnose that I had the wrong reference image.
$ oc get nodes -l node-role.kubernetes.io/worker
NAME STATUS ROLES AGE VERSION
worker0 Ready master,worker 28h v1.23.5+3afdacb
worker1 Ready master,worker 28h v1.23.5+3afdacb
2. Launch a debug pod on node/worker0, chroot into the host, and run curl to confirm it times out.
$ oc debug node/worker0
Starting pod/1024204-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.242.0.4
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
curl google.com -v -k
* About to connect() to google.com port 80 (#0)
* Trying 216.58.212.238...
If the curl command never completes, then you probably don't have the VPC set up for egress.
3. Navigate to https://cloud.ibm.com/vpc-ext/network/subnet/
4. Find your subnet and attach a Public Gateway; this provides egress for the subnet.
5. Retry accessing your Console (you can also retry from the command line with oc debug). You should now see the dashboard (note the pod may need to recover from a CrashLoopBackOff, so it may take a few minutes).
Appendix: Checking your Console URL
If you don’t know your external console URL, you can retrieve it from oc.
$ oc -n openshift-config-managed get cm console-public -o jsonpath='{.data.consoleURL}'
https://console-openshift-console.hidden.eu-gb.containers.appdomain.cloud
Appendix: Checking Access Tokens
If you are using OAuthAccessTokens in your environment and you closed your display, you can always get a view (as kubeadmin) of the current access tokens using the OpenShift command line.
$ oc get oauthaccesstokens -A
NAME USER NAME CLIENT NAME CREATED EXPIRES REDIRECT URI SCOPES
sha256~-m IAM#yyy@ibm.com openshift-browser-client 12m 2022-06-22 15:00:38 +0000 UTC https://hiddene.eu-gb.containers.cloud.ibm.com:31871/oauth/token/display user:full
sha256~x IAM#g@ibm.com openshift-browser-client 10m 2022-06-22 15:02:24 +0000 UTC https://hiddene.eu-gb.containers.cloud.ibm.com:31871/oauth/token/display user:full
sha256~z IAM#x@us.ibm.com openshift-browser-client 171m 2022-06-22 12:21:30 +0000 UTC https://hiddene.eu-gb.containers.cloud.ibm.com:31871/oauth/token/display user:full
sha256~z IAM#y@ibm.com openshift-browser-client 131m 2022-06-22 13:01:18 +0000 UTC https://hiddene.eu-gb.containers.cloud.ibm.com:31871/oauth/token/display user:full
sha256~y IAM#y@ibm.com openshift-browser-client 84m 2022-06-22 13:48:29 +0000 UTC https://hiddene.eu-gb.containers.cloud.ibm.com:31871/oauth/token/display user:full
sha256~x IAM#y@ibm.com openshift-browser-client 130m 2022-06-22 13:02:25 +0000 UTC https://hiddene.eu-gb.containers.cloud.ibm.com:31871/oauth/token/display user:full
Appendix: Checking the OAuth Well Known
To check the well-known OAuth endpoints, check https://hidden-e.eu-gb.containers.cloud.ibm.com:30603/.well-known/oauth-authorization-server
In OpenShift, the kube-scheduler binds a unit of work (a Pod) to a Node. The scheduler reads the work from a scheduling queue, retrieves the current state of the cluster, scores the work based on the scheduling rules (from the policy) and the cluster's state, and binds the Pod to the highest-scoring Node.
The Pod is placed based on an instantaneous read of the policy and the environment – a best-estimate placement at that point in time. Since clusters constantly change shape and context, there is a need to deschedule and reschedule the Pod anew.
The Descheduler runs on a set interval and re-evaluates the scheduled Pods against the Nodes and the Descheduler Policy, marking a Pod for eviction if it should be removed; the Pod is then removed (unbound).
Thankfully, OpenShift has a Descheduler Operator that facilitates the unbinding of a Pod from a Node based on a cluster-wide configuration of the KubeDescheduler Custom Resource. In a single cluster, there is at most one KubeDescheduler, named cluster (the name is fixed), and it configures one or more Descheduler Profiles:
TopologyAndDuplicates
Spreads pods evenly among nodes based on topology constraints, and evicts duplicate replicas on the same node. This profile cannot be used with SoftTopologyAndDuplicates.
SoftTopologyAndDuplicates
Spreads pods as above, but treats the topology constraints as soft. This profile cannot be used with TopologyAndDuplicates.
LifecycleAndUtilization
Balances pods based on node resource usage. This profile cannot be used with DevPreviewLongLifecycle.
EvictPodsWithLocalStorage
Enables pods with local storage to be evicted by all other profiles.
EvictPodsWithPVC
Prevents pods with PVCs from being evicted by all other profiles.
DevPreviewLongLifecycle
Provides lifecycle management for pods that are 'long running'. This profile cannot be used with LifecycleAndUtilization.
There must be one or more DeschedulerProfiles specified, and there cannot be any duplicate entries. There are two possible mode values – Automatic and Predictive. You have to check the descheduler Pod's output to see what is Predicted (would be evicted) or is Completed (evicted).
The DeschedulerOperator excludes the openshift-*, kube-system and hypershift namespaces.
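A minimal KubeDescheduler resource might look like the following sketch (the interval and profile choices are illustrative, not required values):

```yaml
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster                                  # the name must be cluster
  namespace: openshift-kube-descheduler-operator
spec:
  mode: Predictive                               # dry run; Automatic actually evicts
  deschedulingIntervalSeconds: 3600              # re-evaluate every hour
  profiles:
  - LifecycleAndUtilization
  - EvictPodsWithPVC
```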
2. Create a Pod that indicates it's available for eviction using the annotation descheduler.alpha.kubernetes.io/evict: "true" and is updated with the proper node name.
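Such a Pod might be declared as follows (the pod name follows the logs below; the node name and image are illustrative placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demopod1
  annotations:
    descheduler.alpha.kubernetes.io/evict: "true"     # opt in to descheduler eviction
spec:
  nodeName: worker-0.example.com                      # set to the proper node name
  containers:
  - name: demo
    image: registry.access.redhat.com/ubi8/ubi-minimal  # any long-running image works
    command: ["sleep", "infinity"]
```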
4. Get the Pods in the openshift-kube-descheduler-operator
oc get pods -n openshift-kube-descheduler-operator
NAME READY STATUS RESTARTS AGE
descheduler-f479c5669-5ffxl 1/1 Running 0 2m7s
descheduler-operator-85fc6666cb-5dfr7 1/1 Running 0 27h
5. Check the Logs for the descheduler pod
oc -n openshift-kube-descheduler-operator logs descheduler-f479c5669-5ffxl
I0506 19:59:10.298440 1 pod_lifetime.go:110] "Evicted pod because it exceeded its lifetime" pod="minio-operator/console-7bc65f7dd9-q57lr" maxPodLifeTime=60
I0506 19:59:10.298500 1 evictions.go:158] "Evicted pod in dry run mode" pod="default/demopod1" reason="PodLifeTime"
I0506 19:59:10.298532 1 pod_lifetime.go:110] "Evicted pod because it exceeded its lifetime" pod="default/demopod1" maxPodLifeTime=60
I0506 19:59:10.298598 1 toomanyrestarts.go:90] "Processing node" node="master-0.rdr-rhop-.sslip.io"
I0506 19:59:10.299118 1 toomanyrestarts.go:90] "Processing node" node="master-1.rdr-rhop.sslip.io"
I0506 19:59:10.299575 1 toomanyrestarts.go:90] "Processing node" node="master-2.rdr-rhop.sslip.io"
I0506 19:59:10.300385 1 toomanyrestarts.go:90] "Processing node" node="worker-0.rdr-rhop.sslip.io"
I0506 19:59:10.300701 1 toomanyrestarts.go:90] "Processing node" node="worker-1.rdr-rhop.sslip.io"
I0506 19:59:10.301097 1 descheduler.go:287] "Number of evicted pods" totalEvicted=5
This article showed a simple case for the Descheduler: running in dry-run (Predictive) mode, it reported that it would evict five pods.
A brief Operator training I gave to my team resulted in these notes. Thanks to many others in the reference section.
An Operator codifies the tasks commonly associated with administrating, operating, and supporting an application. The codified tasks are event-driven responses to changes (create-update-delete-time) in the declared state relative to the actual state of an application, using domain knowledge to reconcile the state and report on the status.
Event (Anomaly) Detection and Response (Remediation)
Scheduling and Tuning
Application Specific Management
Continuous Testing and Chaos Monkey
Helm operators wrap a Helm chart in a simplistic view of the operation, passing through the Helm verbs, so one can install, upgrade, and uninstall an application using an Operator.
There are four actors in the Operator Pattern.
Initiator – The user who creates the Custom Resource
Operator – The Controller that operates on the Operand
Each Operator operates on an Operand using Managed Resources (Kubernetes and OpenShift) to reconcile states. The states are described in a domain specific language (DSL) encapsulated in a Custom Resource to describe the state of the application:
spec – The User communicates to the Operator the desired state (Operator reads)
status – The Operator communicates back to the User (Operator writes)
While not limited to writing spec and status, if we think of spec as initiator-specified and status as operator-written, then we limit the chances of creating an unintended reconciliation loop.
The DSL is specified as Custom Resource Definition:
$ oc get crd machinehealthchecks.machine.openshift.io -o=yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
spec:
  conversion:
    strategy: None
  group: machine.openshift.io
  names:
    kind: MachineHealthCheck
    listKind: MachineHealthCheckList
    plural: machinehealthchecks
    shortNames:
    - mhc
    - mhcs
    singular: machinehealthcheck
  scope: Namespaced
  versions:
  - name: v1beta1
    schema:
      openAPIV3Schema:
        description: 'MachineHealthCheck'
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource'
            type: string
          metadata:
            type: object
          spec:
            description: Specification of machine health check policy
            properties:
              expectedMachines:
                description: total number of machines counted by this machine health check
                minimum: 0
                type: integer
              unhealthyConditions:
                description: UnhealthyConditions contains a list of the conditions.
                items:
                  description: UnhealthyCondition represents a Node.
                  properties:
                    status:
                      minLength: 1
                      type: string
                    timeout:
                      description: Expects an unsigned duration string of decimal
                        numbers each with optional fraction and a unit suffix, eg
                        "300ms", "1.5h" or "2h45m". Valid time units are "ns", "us"
                        (or "µs"), "ms", "s", "m", "h".
                      pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
                      type: string
                    type:
                      minLength: 1
                      type: string
                  type: object
                minItems: 1
                type: array
            type: object
For example, these operators manage the applications by orchestrating operations based on changes to the CustomResource (DSL):
As a developer, we’re going to follow a common development pattern:
Implement the Operator Logic (Reconcile the operational state)
Bake Container Image
Create or regenerate Custom Resource Definition (CRD)
Create or regenerate Role-based Access Control (RBAC)
Role
RoleBinding
Apply Operator YAML
Note, we’re not necessarily writing business logic, rather operational logic.
There are some best practices we follow:
Develop one operator per application
One CRD per Controller. Created and Fit for Purpose. Less Contention.
No Cross Dependencies.
Use Kubernetes Primitives when Possible
Be Backwards Compatible
Compartmentalize features via multiple controllers
Scale = one controller
Backup = one controller
Use asynchronous metaphors with the synchronous reconciliation loop
Error, then immediate return, backoff and check later
Use concurrency to split the processing / state
Prune Kubernetes Resources when not used
Apps Run when Operators are stopped
Document what the operator does and how it does it
Install in a single command
We use the Operator SDK – it's supported by Red Hat and the CNCF.
operator-sdk: Which one? Ansible and Go
Kubernetes is authored in the Go language. Currently, OpenShift uses Go 1.17, and most operators are implemented in Go. The community has built many Go-based operators, so there is much more support on StackOverflow and in the forums.
Go is ideal for concurrency, has strong memory management, and everything is baked into the executable deliverable – it's in memory and ready to go. There are lots of alternatives for coding – NodeJS, Rust, Java, C#, Python. Note that the OpenShift Operators are not necessarily built on the Operator SDK.
Summary
We’ve run through a lot of detail on Operators and learned why we should go with Go operators.
I built a demonstration using Go, JSON, bcrypt, an HTTP client, and an HTTP server to model an actual IdP. This is a demonstration only; it really helped me set up and understand what's happening in the RequestHeader.
This document outlines the flow using the haproxy and Apache Httpd already installed on the Bastion server as part of the installation process and a local Go Test IdP to demonstrate the feature.
The rough flow between OpenShift, the User and the Test IdP is:
For those managing OpenShift clusters, the oc tool manages all the OpenShift resources with handy commands for OpenShift and Kubernetes. The OpenShift Client CLI (oc) project is built on top of kubectl adding built-in features to simplify interactions with an OpenShift cluster.
Much like kubectl, the oc CLI tool provides a feature to extend the OpenShift CLI with plug-ins. The oc plugins feature is a client-side feature to facilitate interactions with extension commands found in the current user's PATH. There is an ecosystem of plugins through the community and the Krew Plugin List.
k9s is a terminal-based UI to interact with your Kubernetes clusters.
sample-cli-plugin, which is a simple example showing how to switch namespaces in k8s. I'm not entirely certain that this works with OpenShift.
These plugins have a wide range of support and code. Some of the plugins are based on python, others are based on go and bash.
oc expands the plugin search prefixes in pkg/cli/kubectlwrappers/wrappers.go with plugin.ValidPluginFilenamePrefixes = []string{"oc", "kubectl"}, so wholly new OpenShift-specific plugins are supported. The OpenShift team has also released a number of plugins:
oc-mirror manages OpenShift release, operator catalog, helm charts, and associated container images for mirror registries that support OpenShift environments
oc-compliance facilitates using the OpenShift Compliance operator.
Many of these extensions/plugins are installed using krew; krew is a plugin manager for kubectl. Some users create a directory .kube/plugins and install their plugins in that folder. The plugins folder is then added to the user’s path.
Creating your own Extension
Check to see if any plugins exist:
$ oc plugin list
The following compatible plugins are available:
/Users/user/.kube/plugins/oc-test
If none exist, it’ll prompt you that none are found in the path, and you can install from krew.
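As a minimal sketch of creating your own extension (the plugin name oc-hello and the ~/.kube/plugins directory are illustrative choices, not requirements), a plugin is just an executable on your PATH whose name begins with oc-:

```shell
# Create a plugins directory and a minimal plugin named oc-hello.
mkdir -p "$HOME/.kube/plugins"
cat > "$HOME/.kube/plugins/oc-hello" <<'EOF'
#!/bin/sh
echo "hello from an oc plugin"
EOF
chmod +x "$HOME/.kube/plugins/oc-hello"

# Add the directory to the PATH so oc can discover the plugin.
export PATH="$PATH:$HOME/.kube/plugins"

# With oc installed, `oc plugin list` would now report it, and
# `oc hello` would dispatch to it; the script also runs directly:
"$HOME/.kube/plugins/oc-hello"
```

Once the file is on the PATH, oc discovers it purely by the oc- filename prefix; no registration step is needed.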
To quote the Kubernetes website, "The Operator pattern captures how you can write code to automate a task beyond what Kubernetes itself provides." The following is a compendium to use while learning Operators.
The de facto SDK to use is the Operator SDK, which provides Helm, Ansible, and Go scaffolding to support your implementation of the Operator pattern.
The following are education classes on the OperatorSDK
When running through the CO0201EN intermediate operators course, I did hit a case where I had to create a ClusterRole and ClusterRoleBinding for the ServiceAccount; here is a snippet that might help others:
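A sketch of that snippet follows – the role name, ServiceAccount name, and namespace here are assumptions for illustration, not the course's exact values:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-reader          # illustrative name
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-reader-binding  # illustrative name
subjects:
- kind: ServiceAccount
  name: my-operator         # illustrative ServiceAccount
  namespace: my-operator-ns # illustrative namespace
roleRef:
  kind: ClusterRole
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```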
Create OpenShift Plugins – You must have a CLI plug-in file that begins with oc- or kubectl-. You create a file and put it in /usr/local/bin/
Details on running Code Ready Containers on Linux – The key hack I learned was to ssh -i ~/.crc/machines/crc/id_ecdsa core@<any host in the /etc/hosts>
I ran on VirtualBox Ubuntu 20.04 with Guest Additions Installed
VirtualBox Settings for the Machine – 6 CPUs, 18 GB memory
System > Processor > Enable PAE/NX and Enable Nested VT-X/AMD-V (which is a must for it to work)
Network > Change Adapter Type to virtio-net and set Promiscuous Mode to Allow VMs
Install openssh-server so you can login remotely
It will not install without a windowing system, so I have the default windowing environment installed.
Note, I still get a failure on startup complaining about a timeout. I waited about 15 minutes after this, and the command oc get nodes --context admin --cluster crc --kubeconfig .crc/cache/crc_libvirt_4.10.3_amd64/kubeconfig now works.
I had to watch 19 hours of slow-paced videos for a training on a new software product (at least new to me). I like fast-paced trainings… enter a browser hack.
In Firefox, Navigate to Tools > Browser Tools > Web Developer Tools
Click Console
Type the following snippet to find the first video on the page and change its playback rate, then press Enter:
document.querySelector("video").playbackRate = 3.0;
Note, 4.0 can be unintelligible; you'll need to tweak the speed to match what you need. I found 2.5 to 3.0 to be very comfortable (you just can't multitask).