Blog

  • DNS Resolver Hangs with OpenVPN

    Running multiple OpenVPN connections on my Mac, my DNS sometimes hangs and I can’t reach hosts over the VPNs. I use this hack to get around it.

    ❯ sudo networksetup -setdnsservers Wi-Fi "Empty"
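
    To confirm the resolver list was cleared, a quick check (assuming your network service is named Wi-Fi):

    ❯ networksetup -getdnsservers Wi-Fi
    There aren't any DNS Servers set on Wi-Fi.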
  • Help… My Ingress is telling me OAuthServerRouteEndpointAccessibleControllerDegraded

    My teammate hit an issue with Ingress Certificates not being valid:

    oc get co ingress -oyaml
        message: |-
          OAuthServerRouteEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.mycluster.local/healthz": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2025-04-02T17:58:35Z is after 2025-02-13T20:04:16Z
          RouterCertsDegraded: secret/v4-0-config-system-router-certs.spec.data[apps.mycluster.local] -n openshift-authentication: certificate could not validate route hostname oauth-openshift.apps.mycluster.local: x509: certificate has expired or is not yet valid: current time 2025-04-02T17:58:33Z is after 2025-02-13T20:04:16Z
    

    The Red Hat docs and tech articles are great. I found the article “How to redeploy/renew an expired default ingress certificate in RHOCP4?”

    I ran the following on a non-production cluster:

    1. Renewed the ingress CA:
    oc get secret router-ca -o yaml -n openshift-ingress-operator > router-ca-2025-04-02.yaml
    oc delete secret router-ca -n openshift-ingress-operator
    oc delete pod --all -n openshift-ingress-operator
    sleep 30
    oc get secret router-ca -n openshift-ingress-operator
    oc get po -n openshift-ingress-operator
    
    2. Recreated the wildcard ingress certificate using the new ingress CA:
    oc get secret router-certs-default -o yaml -n openshift-ingress > router-certs-default-2025-04-02.yaml
    oc delete secret router-certs-default -n openshift-ingress
    oc delete pod --all -n openshift-ingress
    sleep 30
    oc get secret router-certs-default -n openshift-ingress 
    oc get po -n openshift-ingress 
    
    3. Checked the ingress:
    curl -v https://oauth-openshift.apps.mycluster.local/healthz -k
    *  subject: CN=*.apps.mycluster.local
    *  start date: Apr  2 19:08:33 2025 GMT
    *  expire date: Apr  2 19:08:34 2027 GMT
    
    4. Updated the CA trust:
    oc -n openshift-ingress-operator get secret router-ca -o jsonpath="{ .data.tls\.crt }" | base64 -d -i > ingress-ca-2025-04-02.crt
    cp /root/ingress-ca-2025-04-02.crt /etc/pki/ca-trust/source/anchors/
    update-ca-trust 
    
    5. Verified login now works:
    oc login -u kubeadmin -p YOUR_PASSWORD https://api.mycluster.local:6443
    

    You’ve seen how to recreate the certificates manually.

    For production clusters, you should use the cert-manager Operator for Red Hat OpenShift to automate certificate renewal.
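
    As a starting point, here is a minimal sketch of subscribing to that operator via OLM. The namespace, channel, and package names below are assumptions; check the current Red Hat docs for your cluster version:

    cat << EOF | oc apply -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: cert-manager-operator
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: cert-manager-operator
      namespace: cert-manager-operator
    spec:
      targetNamespaces:
      - cert-manager-operator
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: openshift-cert-manager-operator
      namespace: cert-manager-operator
    spec:
      channel: stable-v1                       # assumed channel name
      name: openshift-cert-manager-operator    # assumed package name
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF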

  • Multi-Arch Tuning Operator 1.1.0 Released

    The Red Hat team has released a new version of the Multi-Arch Tuning Operator.

    In Multi-Arch Compute clusters, the Multiarch Tuning Operator influences the scheduling of Pods, so applications run on a supported architecture.

    You can learn more about it at https://catalog.redhat.com/software/containers/multiarch-tuning/multiarch-tuning-operator-bundle/661659e9c5bced223a7f7244

    Addendum

    My colleague, Punith, worked with the Red Hat team to add NodeAffinityScoring and plugin support to the Multi-Arch Tuning Operator and ClusterPodPlacementConfig. This feature allows users to define cluster-wide preferences for specific architectures, influencing how the Kubernetes scheduler places pods. It helps optimize workload distribution based on preferred node architecture.

    spec:
      plugins:
        nodeAffinityScoring:
          enabled: true
          platforms:
          - architecture: ppc64le
            weight: 100
          - architecture: amd64
            weight: 50
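
    For context, a minimal sketch of a complete resource using this plugin. I’m assuming the multiarch.openshift.io/v1beta1 apiVersion and the requirement that the resource be named cluster; verify both against your installed operator version:

    cat << EOF | oc apply -f -
    apiVersion: multiarch.openshift.io/v1beta1
    kind: ClusterPodPlacementConfig
    metadata:
      name: cluster
    spec:
      plugins:
        nodeAffinityScoring:
          enabled: true
          platforms:
          - architecture: ppc64le
            weight: 100
          - architecture: amd64
            weight: 50
    EOF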
  • FIPS support in Go 1.24

    Kudos to the Red Hat team. link

    The benefits of native FIPS support in Go 1.24

    The introduction of the FIPS Cryptographic Module in Go 1.24 marks a watershed moment for the language’s security capabilities. This new module provides FIPS 140-3-compliant implementations of cryptographic algorithms, seamlessly integrated into the standard library. What makes this particularly noteworthy is its transparent implementation. Existing Go applications can leverage FIPS-compliant cryptography without requiring code changes.

    Build-time configuration is available through the GOFIPS140 environment variable, allowing developers to select specific versions of the Go Cryptographic Module.

    GOFIPS140=latest go build

    Runtime control is available via the fips140 GODEBUG setting, enabling dynamic FIPS mode activation.

    GODEBUG=fips140=on

    Keep these in your toolbox along with GOARCH=ppc64le.
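
    For example, a sketch that combines them to cross-compile a FIPS-capable binary for Linux on Power (the output name and package path are placeholders):

    # Build with the FIPS module selected, targeting Linux on ppc64le
    GOFIPS140=latest GOOS=linux GOARCH=ppc64le go build -o myapp .
    # Enforce FIPS mode at runtime
    GODEBUG=fips140=on ./myapp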

  • Updates to Open Source Container images for Power on IBM Container Registry

    The IBM Linux on Power team pushed new images to their public open source container image catalog in the IBM Container Registry (ICR). This should assure end users that IBM has authentically built these containers in a secure environment.

    The new container images are:

    Image Name                   | Tag Name                   | Project Licenses | Image Pull Command                                                                      | Last Published
    fluentd-kubernetes-daemonset | v1.14.3-debian-forward-1.0 | Apache-2.0       | podman pull icr.io/ppc64le-oss/fluentd-kubernetes-daemonset:v1.14.3-debian-forward-1.0 | March 17, 2025
    cloudnative-pg/pgbouncer     | 1.23.0                     | Apache-2.0       | podman pull icr.io/ppc64le-oss/cloudnative-pg/pgbouncer:1.23.0                          | March 17, 2025
  • Red Hat OpenShift Container Platform 4.18 Now Available on IBM Power

    Red Hat® OpenShift® 4.18 has been released and adds improvements and new capabilities to OpenShift Container Platform components. Based on Kubernetes 1.31 and CRI-O 1.31, Red Hat OpenShift 4.18 focuses on core improvements with enhanced network flexibility.

    You can download 4.18.1 from the mirror at https://mirror.openshift.com/pub/openshift-v4/multi/clients/ocp/4.18.1/ppc64le/

  • Nest Accelerator and Urandom… I think

    The NX accelerator has random number generation capabilities.

    What happens if the random-number entropy pool runs out of numbers? If you are reading from the /dev/random device, your application will block waiting for new numbers to be generated. Alternatively, the /dev/urandom device is non-blocking and will create random numbers on the fly, re-using some of the entropy in the pool. This can lead to numbers that are less random than required for some use cases.

    Well, the Power9 and Power10 servers use the nest accelerator to generate the pseudo-random numbers and maintain the pool.

    Each processor chip in a Power9 and Power10 server has an on-chip “nest” accelerator called the NX unit that provides specialized functions for general data compression, gzip compression, encryption, and random number generation. These accelerators are used transparently across the systems software stack to speed up operations related to Live Partition Migration, IPSec, JFS2 Encrypted File Systems, PKCS11 encryption, and random number generation through /dev/random and /dev/urandom.

    Kind of cool, I’ll have to find some more details to verify it and use it.
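
    A quick way to start verifying it on Linux (a sketch; these sysfs paths can vary by distro and kernel):

    # Which hardware RNGs the kernel sees, and which one is active
    cat /sys/devices/virtual/misc/hw_random/rng_available
    cat /sys/devices/virtual/misc/hw_random/rng_current
    # How much entropy the kernel pool currently reports
    cat /proc/sys/kernel/random/entropy_avail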

  • Kernel Stack Trace

    Quick hack to find a kernel stack trace.

    Look in /proc:

    find /proc -name stack

    You can see the last kernel stack for a process, for example /proc/479260/stack:

    [<0>] hrtimer_nanosleep+0x89/0x120
    [<0>] __x64_sys_nanosleep+0x96/0xd0
    [<0>] do_syscall_64+0x5b/0x1a0
    [<0>] entry_SYSCALL_64_after_hwframe+0x66/0xcb
    

    It’s superb for figuring out a real-time hang and its pattern.
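
    To capture the kernel stack of every thread in a process, a small sketch (run as root; the PID is the example one from above):

    for t in /proc/479260/task/*/stack; do echo "== $t"; cat "$t"; done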

    Nice article on name, version, and references from Red Hat

    A reference can contain a domain (quay.io) pointing to the container registry, one or more repositories (also referred to as namespaces) on the registry (fedora), and an image (fedora-bootc) followed by a tag (41) and/or digest (sha256). Note that images can be referenced by tag, digest, or both at the same time.
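
    To make that concrete, a sketch pulling the article’s example image by tag, by digest, and by both (the digest value is a placeholder, not a real one):

    # domain / repository (namespace) / image : tag
    podman pull quay.io/fedora/fedora-bootc:41
    # Pinned by digest
    podman pull quay.io/fedora/fedora-bootc@sha256:<digest>
    # Tag and digest together; the digest is what gets resolved
    podman pull quay.io/fedora/fedora-bootc:41@sha256:<digest>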

    [Image: container image versioning]

    Reference: How to name, version, and reference container images

  • OpenShift Container Platform and CGroups: Notes

    My notes from OCP/Cgroups debugging and usage.

    What is attaching the BPF program to my cgroup?

    When you create a Pod, the API Server persists the resource, and the Kube Scheduler assigns it to a Node. On the Node, the Kubelet converts the Pod to the OCI specification, enriches the container with host-device-specific resources, and dispatches it to cri-o. cri-o, using the default container runtime launcher (runc or crun) and its configuration, launches and manages the container with systemd, and attaches an eBPF program to the container’s cgroup that controls device access.
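
    You can check what is attached from a node shell, assuming bpftool is available there (e.g. oc debug node/<node>, then chroot /host):

    # List eBPF programs attached throughout the cgroup hierarchy;
    # look for a "device" (BPF_CGROUP_DEVICE) program on each container's cgroup
    bpftool cgroup tree /sys/fs/cgroup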

    If you are seeing EPERM errors when accessing a device, you may not have the right access set at the Pod level; a Device Plugin may be able to help.

    Options for adding Devices

    You have a few things to look at:

    1. volumeDevices
    2. io.kubernetes.cri-o.Devices (see the sketch after this list)
    3. cri-o config drop-in
    4. crun or runc with DeviceAllow: https://github.com/containers/crun and https://github.com/containers/crun/blob/017b5fddcb0a29938295d9a28fdc901164c77d74/contrib/seccomp-notify-plugin-rust/src/mknod.rs#L9
    5. A custom device plugin like https://github.com/IBM/power-device-plugin
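
    A minimal Pod sketch for option 2, the CRI-O devices annotation. Assumptions: the node’s CRI-O config allows this annotation for the workload, and /dev/fuse is just an example device:

    cat << EOF | oc apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: device-example
      annotations:
        io.kubernetes.cri-o.Devices: /dev/fuse
    spec:
      containers:
      - name: app
        image: registry.access.redhat.com/ubi9/ubi-minimal
        command: ["sleep", "infinity"]
    EOF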

    Note: these approaches give R/W access to the full device.

    Requires selinux-relabeling to be disabled

    You may need to stop SELinux from relabeling the files when you run with randomized UIDs. The Cloud Pak documentation describes an excellent way to disable SELinux relabeling: https://www.ibm.com/docs/en/cloud-paks/cp-data/5.0.x?topic=1-disabling-selinux-relabeling

    You can confirm the file details using:

    sh-5.1$ ls -alZ /mnt/example/myfile.log
    -rw-r--r--. 1 xuser wheel system_u:object_r:container_file_t:s0 1053201 Dec 11 19:45 /mnt/example/myfile.log

    Switching Container Runtime Launchers

    You can switch your Container Runtime from runc to crun using:

    cat << EOF | oc apply -f -
    apiVersion: machineconfiguration.openshift.io/v1
    kind: ContainerRuntimeConfig
    metadata:
      name: container-crun
    spec:
      machineConfigPoolSelector:
        matchLabels:
          pools.operator.machineconfiguration.openshift.io/worker: ''
      containerRuntimeConfig:
        logLevel: debug
        overlaySize: 1G
        defaultRuntime: "crun"
    EOF
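
    The change rolls out through the MachineConfigPool; a quick way to watch it complete (assuming the worker pool):

    oc wait mcp/worker --for condition=Updated --timeout=30m
    oc get mcp worker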
    

    container_use_devices

    Allows containers to use any device volumes mounted into the container; see https://github.com/containers/container-selinux/blob/main/container.te#L39

    $ getsebool -a | grep container_use_devices
    container_use_devices --> off
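
    If you need it, you can flip the boolean persistently on the node (run as root):

    setsebool -P container_use_devices on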
    

    More details on creating a MachineConfig are at https://docs.openshift.com/container-platform/4.16/networking/multiple_networks/configuring-additional-network.html

    blktrace

    blktrace is a superb tool. You’ll just have to make sure the kernel debug filesystem (debugfs) is mounted.

    blktrace -d /dev/sdf
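
    A common pattern streams the trace through blkparse for readable output (/dev/sdf is the example device from above; stop with Ctrl-C):

    blktrace -d /dev/sdf -o - | blkparse -i -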

    We also built a cri-o config script.

    https://www.redhat.com/en/blog/open-container-initiative-hooks-admission-control-podman

    https://www.redhat.com/en/blog/extending-the-runtime-functionality