Tag: openshift

  • Operator Training – Part 1: Concepts and Why Use Go

    A brief Operator training I gave to my team resulted in these notes. Thanks to many others in the reference section.

    An Operator codifies the tasks commonly associated with administering, operating, and supporting an application. The codified tasks are event-driven responses to changes (create-update-delete-time) in the declared state relative to the actual state of an application, using domain knowledge to reconcile the state and report on the status.
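    The reconciliation at the heart of the pattern can be sketched in a few lines of Go. This is a toy model, not the controller-runtime API: State, Reconcile, and the replica counts are illustrative names only.

```go
package main

import "fmt"

// State captures the declared (spec) and observed (actual) replica count
// for a hypothetical Operand; the names are illustrative, not a real API.
type State struct {
	Desired int
	Actual  int
}

// Reconcile moves the actual state one step toward the desired state and
// reports what it did – the essence of the event-driven Operator pattern.
func Reconcile(s State) (State, string) {
	switch {
	case s.Actual < s.Desired:
		s.Actual++
		return s, "scaled up"
	case s.Actual > s.Desired:
		s.Actual--
		return s, "scaled down"
	default:
		return s, "in sync"
	}
}

func main() {
	s := State{Desired: 3, Actual: 1}
	for {
		var action string
		s, action = Reconcile(s)
		fmt.Printf("desired=%d actual=%d action=%s\n", s.Desired, s.Actual, action)
		if action == "in sync" {
			break
		}
	}
}
```

    A real Operator is driven by watch events from the API server rather than a loop in main, but the shape – observe, compare, act, report – is the same.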

    Figure 1 Operator Pattern

    Operators are used to execute basic and advanced operations:

    Basic (Helm, Go, Ansible)

    1. Installation and Configuration
    2. Uninstall and Destroy
    3. Seamless Upgrades

    Advanced (Go, Ansible)

    1. Application Lifecycle (Backup, Failure Recovery)
    2. Monitoring, Metrics, Alerts, Log Processing, Workload Analysis
    3. Auto-scaling: Horizontal and Vertical
    4. Event (Anomaly) Detection and Response (Remediation)
    5. Scheduling and Tuning
    6. Application Specific Management
    7. Continuous Testing and Chaos Monkey

    Helm operators wrap Helm charts in a simplified view of operations, passing through the Helm verbs so one can install, uninstall, destroy, and upgrade using an Operator.

    There are four actors in the Operator Pattern.

    1. Initiator – The user who creates the Custom Resource
    2. Operator – The Controller that operates on the Operand
    3. Operand – The target application
    4. OpenShift and Kubernetes Environment
    Figure 2 Common Terms

    Each Operator operates on an Operand using Managed Resources (Kubernetes and OpenShift) to reconcile states. The states are described in a domain-specific language (DSL), encapsulated in a Custom Resource:

    1. spec – The User communicates to the Operator the desired state (Operator reads)
    2. status – The Operator communicates back to the User (Operator writes)
    $ oc get authentications cluster -o yaml
    apiVersion: config.openshift.io/v1
    kind: Authentication
    metadata:
      annotations:
        include.release.openshift.io/ibm-cloud-managed: "true"
        include.release.openshift.io/self-managed-high-availability: "true"
        include.release.openshift.io/single-node-developer: "true"
        release.openshift.io/create-only: "true"
    spec:
      oauthMetadata:
        name: ""
      serviceAccountIssuer: ""
      type: ""
      webhookTokenAuthenticator:
        kubeConfig:
          name: webhook-authentication-integrated-oauth
    status:
      integratedOAuthMetadata:
        name: oauth-openshift

    While Operators are not limited to reading spec and writing status, if we treat spec as initiator-specified and status as operator-written, then we limit the chances of creating an unintended reconciliation loop.
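    A minimal sketch of that division of labor, using a hypothetical Memcached resource (the type names are illustrative, not generated code):

```go
package main

import "fmt"

// MemcachedSpec is what the initiator declares; the Operator only reads it.
type MemcachedSpec struct {
	Size int // desired replicas
}

// MemcachedStatus is what the Operator reports back; only the Operator writes it.
type MemcachedStatus struct {
	ReadyReplicas int // observed replicas
}

// Memcached is the hypothetical Custom Resource tying the two together.
type Memcached struct {
	Spec   MemcachedSpec
	Status MemcachedStatus
}

// UpdateStatus is the only place the Operator mutates the resource; it never
// rewrites Spec, which avoids triggering its own watch events in a loop.
func UpdateStatus(m Memcached, observed int) Memcached {
	m.Status.ReadyReplicas = observed
	return m
}

func main() {
	m := Memcached{Spec: MemcachedSpec{Size: 3}}
	m = UpdateStatus(m, 3)
	fmt.Printf("spec.size=%d status.readyReplicas=%d\n", m.Spec.Size, m.Status.ReadyReplicas)
}
```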

    The DSL is specified as a Custom Resource Definition (CRD):

    $ oc get crd machinehealthchecks.machine.openshift.io -o=yaml
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    spec:
      conversion:
        strategy: None
      group: machine.openshift.io
      names:
        kind: MachineHealthCheck
        listKind: MachineHealthCheckList
        plural: machinehealthchecks
        shortNames:
        - mhc
        - mhcs
        singular: machinehealthcheck
      scope: Namespaced
      versions:
      - name: v1beta1
        schema:
          openAPIV3Schema:
            description: 'MachineHealthCheck'
            properties:
              apiVersion:
                description: 'APIVersion defines the versioned schema of this representation'
                type: string
              kind:
                description: 'Kind is a string value representing the REST resource'
                type: string
              metadata:
                type: object
              spec:
                description: Specification of machine health check policy
                properties:
                  expectedMachines:
                    description: total number of machines counted by this machine health
                      check
                    minimum: 0
                    type: integer
                  unhealthyConditions:
                    description: UnhealthyConditions contains a list of the conditions.
                    items:
                      description: UnhealthyCondition represents a Node.
                      properties:
                        status:
                          minLength: 1
                          type: string
                        timeout:
                          description: Expects an unsigned duration string of decimal
                            numbers each with optional fraction and a unit suffix, eg
                            "300ms", "1.5h" or "2h45m". Valid time units are "ns", "us"
                            (or "µs"), "ms", "s", "m", "h".
                          pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
                          type: string
                        type:
                          minLength: 1
                          type: string
                      type: object
                    minItems: 1
                    type: array
                type: object

    For example, these operators manage the applications by orchestrating operations based on changes to the CustomResource (DSL):

    Operator (Type/Language) – What it does – Operations:

    1. cluster-etcd-operator (Go) – Manages etcd in OpenShift – Install, Monitor, Manage
    2. prometheus-operator (Go) – Manages Prometheus monitoring on a Kubernetes cluster – Install, Monitor, Manage, Configure
    3. cluster-authentication-operator (Go) – Manages OpenShift Authentication – Manage, Observe

    As developers, we follow a common development pattern:

    1. Implement the Operator Logic (Reconcile the operational state)
    2. Bake Container Image
    3. Create or regenerate Custom Resource Definition (CRD)
    4. Create or regenerate Role-based Access Control (RBAC)
      1. Role
      2. RoleBinding
    5. Apply Operator YAML

    Note, we’re not necessarily writing business logic, but rather operational logic.

    There are some best practices we follow:

    1. Develop one operator per application
      1. One CRD per Controller. Created and Fit for Purpose. Less Contention.
      2. No Cross Dependencies.
    2. Use Kubernetes Primitives when Possible
    3. Be Backwards Compatible
    4. Compartmentalize features via multiple controllers
      1. Scale = one controller
      2. Backup = one controller
    5. Use asynchronous metaphors with the synchronous reconciliation loop
      1. Error, then immediate return, backoff and check later
      2. Use concurrency to split the processing / state
    6. Prune Kubernetes Resources when not used
    7. Apps Run when Operators are stopped
    8. Document what the operator does and how it does it
    9. Install in a single command

    We use the Operator SDK – it’s supported by Red Hat and the CNCF.

    operator-sdk: Which one? Ansible and Go

    Kubernetes is authored in the Go language. Currently, OpenShift uses Go 1.17, and most operators are implemented in Go. The community has built many Go-based operators, so there is much more support on StackOverflow and in the forums.

    Ansible vs Go:

    1. Kubernetes Support – Ansible: cached clients; Go: a solid, complete, and rich Kubernetes client
    2. Language Type – Ansible: declarative (describe the end state); Go: imperative (describe how to get to the end state)
    3. Operator Type – Ansible: indirect, wrapped in the Ansible-Operator; Go: direct
    4. Style – Ansible: systems administration; Go: systems programming
    5. Performance – Ansible: Link; Go: ~4M at startup, single-layer scratch image
    6. Security – Ansible: expanded surface area; Go: limited surface area

    Go is ideal for concurrency and strong memory management, and everything is baked into the executable deliverable – it’s in memory and ready to go. There are lots of alternative languages to code in: NodeJS, Rust, Java, C#, Python. Note that the OpenShift Operators are not necessarily built on the Operator SDK.

    Summary

    We’ve run through a lot of detail on Operators and learned why we should go with Go operators.

    Reference

    1. CNCF Operator White Paper https://github.com/cncf/tag-app-delivery/blob/main/operator-wg/whitepaper/Operator-WhitePaper_v1-0.md
    2. Operator pattern https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
    3. Operator SDK Framework https://sdk.operatorframework.io/docs/overview/
    4. Kubernetes Operators 101, Part 2: How operators work https://developers.redhat.com/articles/2021/06/22/kubernetes-operators-101-part-2-how-operators-work?source=sso#
    5. Build Your Kubernetes Operator with the Right Tool https://cloud.redhat.com/blog/build-your-kubernetes-operator-with-the-right-tool
    6. Build Your Kubernetes Operator with the Right Tool (Hazelcast) https://hazelcast.com/blog/build-your-kubernetes-operator-with-the-right-tool/
    7. Operator SDK Best Practices https://sdk.operatorframework.io/docs/best-practices/
    8. Google Best practices for building Kubernetes Operators and stateful apps https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-building-kubernetes-operators-and-stateful-apps
    9. Kubernetes Operator Patterns and Best Practises https://github.com/IBM/operator-sample-go
    10. Fast vs Easy: Benchmarking Ansible Operators for Kubernetes https://www.ansible.com/blog/fast-vs-easy-benchmarking-ansible-operators-for-kubernetes
    11. Debugging a Kubernetes Operator https://www.youtube.com/watch?v=8hlx6F4wLAA&t=21s
    12. Contributing to the Image Registry Operator https://github.com/openshift/cluster-image-registry-operator/blob/master/CONTRIBUTING.md
    13. Leszko’s OperatorCon Presentation
      1. YouTube https://www.youtube.com/watch?v=hTapESrAmLc
      2. GitHub Repo for Session: https://github.com/leszko/build-your-operator
  • Proof-of-Concept: OpenShift on Power: Configuring an OpenID Connect identity provider

    This document outlines the installation of OpenShift on Power, the installation of the Red Hat Single Sign-On Operator, and the configuration of the two to work together on OCP.

    Thanks to Zhimin Wen who helped in my setup of the OIDC with his great work.

    Steps

    1. Set up OpenShift Container Platform (OCP) 4.x on IBM® Power Systems™ Virtual Server on IBM Cloud using the Terraform-based automation code and the documentation provided. You’ll need to update var.tfvars to match your environment and PowerVS Service settings.
    terraform init --var-file=var.tfvars
    terraform apply --var-file=var.tfvars
    
    1. At the end of the deployment, you see an output pointing to the Bastion Server.
    bastion_private_ip = "192.168.*.*"
    bastion_public_ip = "158.*.*.*"
    bastion_ssh_command = "ssh -i data/id_rsa root@158.*.*.*"
    bootstrap_ip = "192.168.*.*"
    cluster_authentication_details = "Cluster authentication details are available in 158.*.*.* under ~/openstack-upi/auth"
    cluster_id = "ocp-oidc-test-cb68"
    install_status = "COMPLETED"
    master_ips = [
      "192.168.*.*",
      "192.168.*.*",
      "192.168.*.*",
    ]
    oc_server_url = "https://api.ocp-oidc-test-cb68.*.*.*.*.xip.io:6443"
    storageclass_name = "nfs-storage-provisioner"
    web_console_url = "https://console-openshift-console.apps.ocp-oidc-test-cb68.*.*.*.*.xip.io"
    worker_ips = [
      "192.168.*.*",
      "192.168.*.*",
    ]
    
    1. Add Hosts Entry
    127.0.0.1 console-openshift-console.apps.ocp-oidc-test-cb68.*.xip.io api.ocp-oidc-test-cb68.*.xip.io oauth-openshift.apps.ocp-oidc-test-cb68.*.xip.io
    
    1. Connect via SSH
    sudo ssh -i data/id_rsa -L 5900:localhost:5901 -L443:localhost:443 -L6443:localhost:6443 -L8443:localhost:8443 root@*
    

    You’re connecting on the command line with ports forwarded because not all ports are open on the Bastion Server.

    1. Find the OpenShift kubeadmin password in openstack-upi/auth/kubeadmin-password
    cat openstack-upi/auth/kubeadmin-password
    eZ2Hq-JUNK-JUNKB4-JUNKZN
    
    1. Login to the web_console_url by navigating to https://console-openshift-console.apps.ocp-oidc-test-cb68.*.xip.io/

    If prompted, accept Security Warnings

    1. Login with the Kubeadmin credentials when prompted
    2. Click OperatorHub
    3. Search for Keycloak
    4. Select Red Hat Single Sign-On Operator
    5. Click Install
    6. On the Install Operator Screen:
      1. Select alpha channel
      2. Select namespace default (if you prefer an alternative namespace, that’s fine; this is just a demo)
      3. Click Install
    7. Click on Installed Operators
    8. Watch rhsso-operator for a completed installation, the status should show Succeeded
    9. Once ready, click on the Operator > Red Hat Single Sign-On Operator
    10. Click on Keycloak, create Keycloak
    11. Enter the following YAML:
    apiVersion: keycloak.org/v1alpha1
    kind: Keycloak
    metadata:
      name: example-keycloak
      labels:
        app: sso
    spec:
      instances: 1
      externalAccess:
        enabled: true
    
    1. Once it’s deployed, click on example-keycloak > YAML. Look for status.externalURL.
    status:
      credentialSecret: credential-example-keycloak
      externalURL: 'https://keycloak-default.apps.ocp-oidc-test-cb68.*.xip.io'
    
    1. Update the /etc/hosts with
    127.0.0.1 keycloak-default.apps.ocp-oidc-test-cb68.*.xip.io 
    
    1. Click Workloads > Secrets
    2. Click on credential-example-keycloak
    3. Click Reveal values
    U: admin
    P: <<hidden>>
    
    1. For Keycloak, login to https://keycloak-default.apps.ocp-oidc-test-cb68.*.xip.io/auth/admin/master/console/#/realms/master using the revealed secret
    2. Click Add Realm
    3. Enter name test.
    4. Click Create
    5. Click Client
    6. Click Create
    7. Enter ClientId – test
    8. Select openid-connect
    9. Click Save
    10. Click Keys
    11. Click Generate new keys and certificate
    12. Click Settings > Access Type
    13. Select confidential
    14. Enter Valid Redirect URIs: https://* (we could restrict this to the OAuth URL, such as https://oauth-openshift.apps.ocp-oidc-test-cb68.*.xip.io/*)
    15. Click Credentials (Copy the Secret), such as:
    43f4e544-fa95-JUNK-a298-JUNK
    
    1. Under Generate Private Key…
      1. Select Archive Format JKS
      2. Key Password: password
      3. Store Password: password
      4. Click Generate and Download
    2. On the Bastion server, create the keycloak secret
    oc -n openshift-config create secret generic keycloak-client-secret --from-literal=clientSecret=43f4e544-fa95-JUNK-a298-JUNK
    secret/keycloak-client-secret created
    
    1. Grab the ingress CA
    oc -n openshift-ingress-operator get secret router-ca -o jsonpath="{ .data.tls\.crt }" | base64 -d -i > ca.crt
    
    1. Create the keycloak CA secret
    oc -n openshift-config create cm keycloak-ca --from-file=ca.crt
    configmap/keycloak-ca created
    
    1. Create the openid Auth Provider
    apiVersion: config.openshift.io/v1
    kind: OAuth
    metadata:
      name: cluster
    spec:
      identityProviders:
        - name: keycloak 
          mappingMethod: claim 
          type: OpenID
          openID:
            clientID: test
            clientSecret:
              name: keycloak-client-secret
            ca:
              name: keycloak-ca
            claims: 
              preferredUsername:
              - preferred_username
              name:
              - name
              email:
              - email
            issuer: https://keycloak-default.apps.ocp-oidc-test-cb68.*.xip.io/auth/realms/test
    
    1. Log out of the Kubeadmin account
    2. On Keycloak, under Manage > Users, click Add user, enter an email and password, and click Save
    3. Click Credentials
    4. Enter a new password and confirm
    5. Turn Temporary Password off
    6. Navigate to the web_console_url
    7. Select the new IdP
    8. Login with the new user

    There is clear support for OpenID Connect already enabled on OpenShift, and this document outlines how to test it with Keycloak.

    A handy link for debugging is the openid-configuration endpoint (/.well-known/openid-configuration).

    Reference

    Blog: Keycloak OIDC Identity Provider for OpenShift


  • OpenShift RequestHeader Identity Provider with a Test IdP: My GoLang Test

    I built a demonstration using GoLang, JSON, bcrypt, an HTTP client, and an HTTP server to model an actual IdP. This is a demonstration only; it really helped me set up and understand what’s happening in the RequestHeader flow.

    OpenShift 4.10: Configuring a request header identity provider enables an external service to act as an identity provider, where an X-Remote-User header identifies the user.

    This document outlines the flow using the haproxy and Apache Httpd already installed on the Bastion server as part of the installation process and a local Go Test IdP to demonstrate the feature.

    The rough flow between OpenShift, the User, and the Test IdP is captured in the repository referenced below.

    My Code is available at https://github.com/prb112/openshift-auth-request-header

  • Using OpenShift Plugin for oc

    For those managing OpenShift clusters, the oc tool manages all the OpenShift resources with handy commands for OpenShift and Kubernetes. The OpenShift Client CLI (oc) project is built on top of kubectl adding built-in features to simplify interactions with an OpenShift cluster.

    Much like kubectl, the oc CLI tool provides a feature to extend the OpenShift CLI with plug-ins. The oc plugins feature is a client-side feature to facilitate interactions with extension commands found in the current user’s path. There is an ecosystem of plugins through the community and the Krew Plugin List.

    These plugins include:

    1. cost accesses Kubernetes cost allocation metrics
    2. outdated displays all out-of-date images running in a Kubernetes cluster
    3. pod-lens shows pod-related resource information
    4. k9s is a terminal based UI to interact with your Kubernetes clusters.
    5. sample-cli-plugin which is a simple example to show how to switch namespaces in k8s. I’m not entirely certain that this works with OpenShift.

    These plugins have a wide range of support and code. Some of the plugins are based on Python; others are based on Go or Bash.

    oc expands the plugin search prefixes in pkg/cli/kubectlwrappers/wrappers.go with plugin.ValidPluginFilenamePrefixes = []string{"oc", "kubectl"}, so wholly new OpenShift-specific plugins are supported. The OpenShift team has also released a number of plugins:
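    To make the lookup concrete, here's a rough Go sketch of the candidate binary names the CLI might try for oc test foo. This loosely mimics the behavior (longest argument match first, both prefixes) and is not the actual oc source:

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames loosely mimics how the CLI maps `oc test foo` onto plugin
// binary names, trying the longest argument match first and both of the
// prefixes OpenShift registers in plugin.ValidPluginFilenamePrefixes.
func candidateNames(args []string) []string {
	var names []string
	for i := len(args); i > 0; i-- {
		base := strings.Join(args[:i], "-")
		for _, prefix := range []string{"oc", "kubectl"} {
			names = append(names, prefix+"-"+base)
		}
	}
	return names
}

func main() {
	// The first candidate found on PATH wins.
	for _, n := range candidateNames([]string{"test", "foo"}) {
		fmt.Println(n)
	}
}
```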

    1. oc-mirror manages OpenShift release, operator catalog, helm charts, and associated container images for mirror registries that support OpenShift environments
    2. oc-compliance facilitates using the OpenShift Compliance operator.

    Many of these extensions/plugins are installed using krew; krew is a plugin manager for kubectl. Some users create a directory .kube/plugins and install their plugins in that folder. The plugins folder is then added to the user’s path.

    Creating your own Extension

    1. Check to see if any plugins exist:
    $ oc plugin list
    The following compatible plugins are available:
    
    /Users/user/.kube/plugins/oc-test
    

    If none exist, it’ll prompt you that none are found in the path, and you can install from krew.

    1. Create a new file oc-test
    #! /usr/bin/env bash
    
    echo "Execution Time: $(date)"
    
    echo ""
    ps -Sf
    echo ""
    
    echo "Arguments: $@"
    
    echo "Environment Variables: "
    env
    echo ""
    
    oc version --client
    
    1. Add the file to the path.
    export PATH=~/.kube/plugins:$PATH
    
    1. Execute the oc plugin, e.g. oc test test (note the oc is stripped off before the plugin receives its arguments)
    Execution Time: Wed Mar 30 11:22:19 EDT 2022
    
      UID   PID  PPID   C STIME   TTY           TIME CMD
      501  3239  3232   0 15Mar22 ttys000    0:01.39 -zsh
      501 80267  3239   0 17Mar22 ttys000    0:00.03 tmux
      501 54273 11494   0 Tue10AM ttys001    0:00.90 /bin/zsh -l
      501 80319 80269   0 17Mar22 ttys002    0:00.30 -zsh
      501  2430  2429   0 15Mar22 ttys003    0:03.17 -zsh
      501 78925  2430   0 11:22AM ttys003    0:00.09 bash /Users/user/.kube/plugins/oc-test test
      501 80353 80269   0 17Mar22 ttys004    0:02.07 -zsh
      501 91444 11494   0 18Mar22 ttys005    0:01.55 /bin/zsh -l
    
    Arguments: test
    Environment Variables: 
    SHELL=/bin/zsh
    TERM=xterm-256color
    ZSH=/Users/user/.oh-my-zsh
    USER=user
    PATH=/Users/user/.kube/plugins:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin
    PWD=/Users/user/Downloads
    LANG=en_US.UTF-8
    HOME=/Users/user
    LESS=-R
    LOGNAME=user
    SECURITYSESSIONID=user
    _=/usr/bin/env
    
    Client Version: 4.10.6
    

    The above demonstrates a simple plugin.

    Reference

    1. Getting started with the OpenShift CLI
    2. Extending the OpenShift CLI with plug-ins
    3. https://cloud.redhat.com/blog/augmenting-openshift-cli-with-plugins
    4. https://cloudcult.dev/tcpdump-for-openshift-workloads/
  • Learning Resources for Operators – First Two Weeks Notes

    To quote the Kubernetes website, “The Operator pattern captures how you can write code to automate a task beyond what Kubernetes itself provides.” The following is a compendium to use while learning Operators.

    The defacto SDK to use is the Operator SDK which provides HELM, Ansible and GO scaffolding to support your implementation of the Operator pattern.

    The following are education classes on the OperatorSDK

    When running through the CO0201EN intermediate operators course, I hit the case where I had to create a ClusterRole and ClusterRoleBinding for the ServiceAccount; here is a snippet that might help others:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      namespace: memcached-operator-system
      name: service-reader-cr-mc
    rules:
    - apiGroups: ["cache.bastide.org"] # "" indicates the core API group
      resources: ["memcacheds"]
      verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      namespace: memcached-operator-system
      name: ext-role-binding
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: service-reader-cr-mc
    subjects:
    - kind: ServiceAccount
      namespace: memcached-operator-system
      name: memcached-operator-controller-manager

    The reason for the above: I had missed adding a kubebuilder declaration:

    //+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
    //+kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch

    Thanks to https://stackoverflow.com/a/60334649/1873438

    The following are articles worth reviewing:

    The following are good Go resources:

    1. Go Code Comments – To write idiomatic Go, you should review the Code Review comments.
    2. Getting to Go: The Journey of Go’s Garbage Collector – The reference for Go and garbage collection in Go
    3. An overview of memory management in Go – A good overview of Go memory management
    4. Golang: Cost of using the heap – Allocations up to about 1M appear to stay on the stack; beyond that, they appear to land on the heap
    5. golangci-lint – The aggregated linters project is worthy of installation and use. It’ll catch many issues and has a corresponding GitHub Action.
    6. Go in 3 Weeks – A comprehensive training for Go; companion GitHub Repo
    7. Defensive Coding Guide: The Go Programming Language
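    On item 4, the stack-versus-heap split can be poked at directly with the compiler's escape analysis; a small sketch (the exact thresholds vary by compiler version, so treat the behavior as indicative):

```go
package main

import "fmt"

// newOnHeap returns a pointer to a local value, so the value must escape to
// the heap; build with `go build -gcflags=-m` to see the compiler's escape
// analysis report the move.
func newOnHeap() *[4]int {
	v := [4]int{1, 2, 3, 4}
	return &v
}

// sumOnStack keeps its array local, so it can stay on the stack.
func sumOnStack() int {
	v := [4]int{1, 2, 3, 4}
	s := 0
	for _, x := range v {
		s += x
	}
	return s
}

func main() {
	fmt.Println(*newOnHeap(), sumOnStack())
}
```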

    The following are good OpenShift resources:

    1. Create OpenShift Plugins – You must have a CLI plug-in file that begins with oc- or kubectl-. You create a file and put it in /usr/local/bin/
    2. Details on running Code Ready Containers on Linux – The key hack I learned was to ssh -i ~/.crc/machines/crc/id_ecdsa core@<any host in the /etc/hosts>
      1. I ran on VirtualBox Ubuntu 20.04 with Guest Additions Installed
      2. Virtual Box Settings for the Machine – 6 CPU, 18G
        1. System > Processor > Enable PAE/NX and Enable Nested VT-X/AMD-V (which is a must for it to work)
        2. Network > Change Adapter Type to virtio-net and Set Promiscuous Mode to Allow VMs
      3. Install openssh-server so you can login remotely
      4. It will not install without a windowing system, so I have the default windowing environment installed.
      5. Note, I still get a failure on startup complaining about a timeout. I waited about 15 minutes after this, and the command oc get nodes --context admin --cluster crc --kubeconfig .crc/cache/crc_libvirt_4.10.3_amd64/kubeconfig then worked.
    3. CRC virsh cheatsheet – If you are running Code Ready Containers and need to debug, you can use the virsh cheatsheet.