These notes came out of a brief Operator training I gave to my team. Thanks to the many authors listed in the Reference section.

An Operator codifies the tasks commonly associated with administering, operating, and supporting an application. The codified tasks are event-driven responses to changes (create, update, delete, or time-based) in the declared state relative to the actual state of an application. The Operator uses domain knowledge to reconcile the state and report on the status.

Figure 1 Operator Pattern
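
To make the reconcile idea concrete, here is a minimal sketch in plain Go. It assumes a made-up application whose whole state is a replica count; the names State, Action, and Reconcile are hypothetical and are not part of the Operator SDK or any Kubernetes library.

```go
package main

import "fmt"

// State is a hypothetical, simplified view of an application:
// here it is only a replica count.
type State struct {
	Replicas int
}

// Action names one operational step the Operator would take.
type Action string

// Reconcile compares the declared state with the actual state and returns
// the actions needed to converge, together with the resulting actual state.
func Reconcile(declared, actual State) ([]Action, State) {
	var actions []Action
	for actual.Replicas < declared.Replicas {
		actual.Replicas++
		actions = append(actions, "create-replica")
	}
	for actual.Replicas > declared.Replicas {
		actual.Replicas--
		actions = append(actions, "delete-replica")
	}
	return actions, actual
}

func main() {
	declared := State{Replicas: 3}
	actual := State{Replicas: 1}

	// First pass: the actual state lags behind the declared state.
	actions, actual := Reconcile(declared, actual)
	fmt.Println(actions, actual.Replicas)

	// Second pass: the states already agree, so there is nothing to do.
	actions, actual = Reconcile(declared, actual)
	fmt.Println(len(actions), actual.Replicas)
}
```

The point of the sketch is that the function is driven by the difference between the two states, not by the event that triggered it, so running it again after convergence is harmless.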

Operators are used to execute basic and advanced operations:

Basic (Helm, Go, Ansible)

Installation and Configuration
Uninstall and Destroy
Seamless Upgrades

Advanced (Go, Ansible)

Application Lifecycle (Backup, Failure Recovery)
Monitoring, Metrics, Alerts, Log Processing, Workload Analysis
Auto-scaling: Horizontal and Vertical
Event (Anomaly) Detection and Response (Remediation)
Scheduling and Tuning
Application Specific Management
Continuous Testing and Chaos Monkey

Helm operators wrap Helm charts. In a simplified view, the Operator passes the Helm verbs through, so one can install, uninstall, destroy, and upgrade using an Operator.

There are four actors in the Operator Pattern.

Initiator – The user who creates the Custom Resource
Operator – The Controller that operates on the Operand
Operand – The target application
Environment – The OpenShift or Kubernetes cluster

Figure 2 Common Terms

Each Operator operates on an Operand using Managed Resources (Kubernetes and OpenShift) to reconcile states. The state of the application is described in a domain-specific language (DSL) encapsulated in a Custom Resource, with two key sections:

spec – The User communicates to the Operator the desired state (Operator reads)
status – The Operator communicates back to the User (Operator writes)

$ oc get authentications cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Authentication
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
spec:
  oauthMetadata:
    name: ""
  serviceAccountIssuer: ""
  type: ""
  webhookTokenAuthenticator:
    kubeConfig:
      name: webhook-authentication-integrated-oauth
status:
  integratedOAuthMetadata:
    name: oauth-openshift

An Operator is not limited to reading spec and writing status. However, if we treat spec as initiator-specified and status as operator-written, we limit the chances of creating an unintended reconciliation loop.
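
A rough sketch of that split in plain Go follows. The Widget type and its fields are hypothetical; the Generation and ObservedGeneration fields are loosely modeled on the Kubernetes metadata.generation and observedGeneration convention, simplified here so the example runs on its own.

```go
package main

import "fmt"

// WidgetSpec is written by the initiator (hypothetical fields).
type WidgetSpec struct {
	Size int
}

// WidgetStatus is written only by the operator.
type WidgetStatus struct {
	ObservedGeneration int64
	Ready              bool
}

// Widget is a hypothetical custom resource.
type Widget struct {
	Generation int64 // assumed to be bumped whenever Spec changes
	Spec       WidgetSpec
	Status     WidgetStatus
}

// Reconcile reads Spec and writes only Status.
// It reports whether it had any work to do.
func Reconcile(w *Widget) bool {
	if w.Status.ObservedGeneration == w.Generation {
		// This spec was already handled, so an event caused by our own
		// status write does not trigger more work.
		return false
	}
	// A real operator would operate on the operand here, using w.Spec.
	w.Status.Ready = w.Spec.Size > 0
	w.Status.ObservedGeneration = w.Generation
	return true
}

func main() {
	w := &Widget{Generation: 1, Spec: WidgetSpec{Size: 2}}
	fmt.Println(Reconcile(w), w.Status.Ready) // first pass does the work
	fmt.Println(Reconcile(w))                 // status write alone causes no new work

	// The initiator edits the spec, which bumps the generation.
	w.Spec.Size = 0
	w.Generation++
	fmt.Println(Reconcile(w), w.Status.Ready)
}
```

Because the operator never edits the spec, its own writes cannot look like a new request from the initiator.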

The DSL is specified as a Custom Resource Definition (CRD):

$ oc get crd machinehealthchecks.machine.openshift.io -o=yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
spec:
  conversion:
    strategy: None
  group: machine.openshift.io
  names:
    kind: MachineHealthCheck
    listKind: MachineHealthCheckList
    plural: machinehealthchecks
    shortNames:
    - mhc
    - mhcs
    singular: machinehealthcheck
  scope: Namespaced
  versions:
  - name: v1beta1
    schema:
      openAPIV3Schema:
        description: 'MachineHealthCheck'
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource'
            type: string
          metadata:
            type: object
          spec:
            description: Specification of machine health check policy
            properties:
              expectedMachines:
                description: total number of machines counted by this machine health
                  check
                minimum: 0
                type: integer
              unhealthyConditions:
                description: UnhealthyConditions contains a list of the conditions.
                items:
                  description: UnhealthyCondition represents a Node.
                  properties:
                    status:
                      minLength: 1
                      type: string
                    timeout:
                      description: Expects an unsigned duration string of decimal
                        numbers each with optional fraction and a unit suffix, eg
                        "300ms", "1.5h" or "2h45m". Valid time units are "ns", "us"
                        (or "µs"), "ms", "s", "m", "h".
                      pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
                      type: string
                    type:
                      minLength: 1
                      type: string
                  type: object
                minItems: 1
                type: array
            type: object
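
To see what the timeout pattern in the schema above accepts, here is a small sketch in Go that applies the same regular expression. The ValidTimeout helper is a hypothetical name used only for illustration; in a cluster the schema itself is what enforces the pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// timeoutPattern is copied from the "pattern" field of the CRD schema.
var timeoutPattern = regexp.MustCompile(`^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$`)

// ValidTimeout reports whether s is an unsigned duration string such as
// "300ms", "1.5h" or "2h45m". It is a hypothetical helper for illustration.
func ValidTimeout(s string) bool {
	return timeoutPattern.MatchString(s)
}

func main() {
	for _, s := range []string{"300ms", "1.5h", "2h45m", "45", "-5s", ""} {
		fmt.Printf("%q valid=%v\n", s, ValidTimeout(s))
	}
}
```

The last three inputs fail for different reasons: a missing unit, a sign, and an empty string.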

For example, these operators manage the applications by orchestrating operations based on changes to the CustomResource (DSL):

Operator	Type/Language	What it does	Operations
cluster-etcd-operator	Go	Manages etcd in OpenShift	Install, Monitor, Manage
prometheus-operator	Go	Manages Prometheus monitoring on a Kubernetes cluster	Install, Monitor, Manage, Configure
cluster-authentication-operator	Go	Manages OpenShift Authentication	Manage, Observe

As developers, we follow a common development pattern:

Implement the Operator Logic (Reconcile the operational state)
Bake Container Image
Create or regenerate Custom Resource Definition (CRD)
Create or regenerate Role-based Access Control (RBAC)
1. Role
1. RoleBinding
Apply Operator YAML

Note that we’re not necessarily writing business logic, but rather operational logic.

There are some best practices we follow:

Develop one operator per application
1. One CRD per Controller. Created and Fit for Purpose. Less Contention.
1. No Cross Dependencies.
Use Kubernetes Primitives when Possible
Be Backwards Compatible
Compartmentalize features via multiple controllers
1. Scale = one controller
1. Backup = one controller
Use asynchronous metaphors with the synchronous reconciliation loop
1. Error, then immediate return, backoff and check later
1. Use concurrency to split the processing / state
Prune Kubernetes Resources when not used
Apps Keep Running when Operators are Stopped
Document what the operator does and how it does it
Install in a single command
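
The "error, then immediate return, backoff and check later" practice above can be sketched in plain Go as follows. The Result type is hypothetical and only loosely modeled on the requeue idea; it is not the controller-runtime API, and the one-second base and thirty-second limit are arbitrary assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result tells the caller when to try again (hypothetical type).
type Result struct {
	RequeueAfter time.Duration
}

// Backoff returns a delay that doubles with each attempt, capped at limit.
func Backoff(attempt int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < limit; i++ {
		d *= 2
	}
	if d > limit {
		return limit
	}
	return d
}

var errNotReady = errors.New("operand not ready")

// Reconcile does one short pass. On error it returns immediately with a
// backoff delay instead of blocking and polling inside the loop.
func Reconcile(attempt int, check func() error) (Result, error) {
	if err := check(); err != nil {
		return Result{RequeueAfter: Backoff(attempt, time.Second, 30*time.Second)}, err
	}
	return Result{}, nil
}

func main() {
	calls := 0
	// Simulated operand that becomes ready on the third check.
	check := func() error {
		calls++
		if calls < 3 {
			return errNotReady
		}
		return nil
	}

	for attempt := 0; ; attempt++ {
		res, err := Reconcile(attempt, check)
		if err == nil {
			fmt.Println("reconciled after", calls, "checks")
			return
		}
		// A real controller would hand this delay to its work queue
		// rather than retrying right away as this demo does.
		fmt.Println("error:", err, "- requeue after", res.RequeueAfter)
	}
}
```

Returning early keeps each reconcile pass short, which leaves the worker free to handle other resources while this one waits.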

We use the Operator SDK, which is supported by Red Hat and the CNCF.

operator-sdk: Which one? Ansible and Go

Kubernetes is authored in the Go language. Currently, OpenShift uses Go 1.17, and most operators are implemented in Go. Because the community has built many Go-based operators, there is much more support available on StackOverflow and in forums.

	Ansible	Go
Kubernetes Support	Cached Clients	Solid, Complete and Rich Kubernetes Client
Language Type	Declarative – describe the end state	Imperative – describe how to get to the end state
Operator Type	Indirect Wrapped in the Ansible-Operator	Direct
Style	Systems Administration	Systems Programming
Performance	Link	~4M at startup Single layer scratch image
Security	Expanded Surface Area	Limited Surface Area

Go is ideal for concurrency and has strong memory management. Everything is baked into the executable deliverable, so it’s in memory and ready to go. There are lots of alternative languages to code in: NodeJS, Rust, Java, C#, and Python. Note that the OpenShift Operators are not necessarily built on the Operator SDK.

Summary

We’ve run through a lot of detail on Operators and learned why we should go with Go operators.

Reference

CNCF Operator White Paper https://github.com/cncf/tag-app-delivery/blob/main/operator-wg/whitepaper/Operator-WhitePaper_v1-0.md
Operator pattern https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
Operator SDK Framework https://sdk.operatorframework.io/docs/overview/
Kubernetes Operators 101, Part 2: How operators work https://developers.redhat.com/articles/2021/06/22/kubernetes-operators-101-part-2-how-operators-work?source=sso#
Build Your Kubernetes Operator with the Right Tool https://cloud.redhat.com/blog/build-your-kubernetes-operator-with-the-right-tool https://hazelcast.com/blog/build-your-kubernetes-operator-with-the-right-tool/
Operator SDK Best Practices https://sdk.operatorframework.io/docs/best-practices/
Google Best practices for building Kubernetes Operators and stateful apps https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-building-kubernetes-operators-and-stateful-apps
Kubernetes Operator Patterns and Best Practises https://github.com/IBM/operator-sample-go
Fast vs Easy: Benchmarking Ansible Operators for Kubernetes https://www.ansible.com/blog/fast-vs-easy-benchmarking-ansible-operators-for-kubernetes
Debugging a Kubernetes Operator https://www.youtube.com/watch?v=8hlx6F4wLAA&t=21s
Contributing to the Image Registry Operator https://github.com/openshift/cluster-image-registry-operator/blob/master/CONTRIBUTING.md
Leszko’s OperatorCon Presentation
1. YouTube https://www.youtube.com/watch?v=hTapESrAmLc
1. GitHub Repo for Session: https://github.com/leszko/build-your-operator

Operator Training – Part 1: Concepts and Why Use Go
