A brief Operator training I gave to my team resulted in these notes. Thanks to many others in the reference section.
An Operator codifies the tasks commonly associated with administrating, operating, and supporting an application. The codified tasks are event-driven responses to changes (create-update-delete-time) in the declared state relative to the actual state of an application, using domain knowledge to reconcile the state and report on the status.
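The reconcile idea described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the `Spec`, `Status`, and `Reconcile` names and the `Replicas` field are hypothetical and do not come from any real operator or from the Operator SDK. The sketch compares declared state with actual state, decides what to do, and reports a status.

```go
package main

import "fmt"

// Spec is the declared (desired) state. The Replicas field is hypothetical.
type Spec struct {
	Replicas int
}

// Status is what the operator reports back after observing the application.
type Status struct {
	ReadyReplicas int
	Phase         string
}

// Reconcile compares the declared state with the actual state and returns
// the actions needed to converge, plus the status to report.
func Reconcile(spec Spec, actualReplicas int) ([]string, Status) {
	var actions []string
	switch {
	case actualReplicas < spec.Replicas:
		actions = append(actions, fmt.Sprintf("scale up by %d", spec.Replicas-actualReplicas))
	case actualReplicas > spec.Replicas:
		actions = append(actions, fmt.Sprintf("scale down by %d", actualReplicas-spec.Replicas))
	}
	phase := "Ready"
	if len(actions) > 0 {
		phase = "Reconciling"
	}
	return actions, Status{ReadyReplicas: actualReplicas, Phase: phase}
}

func main() {
	// Declared state asks for 3 replicas, but only 1 is running.
	actions, status := Reconcile(Spec{Replicas: 3}, 1)
	fmt.Println(actions, status.Phase)
}
```

A real operator would receive the declared state from a Custom Resource and query the cluster for the actual state; here both are passed in directly to keep the sketch self-contained.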

Operators are used to execute basic and advanced operations:
Basic (Helm, Go, Ansible)
- Installation and Configuration
- Uninstall and Destroy
- Seamless Upgrades
Advanced (Go, Ansible)
- Application Lifecycle (Backup, Failure Recovery)
- Monitoring, Metrics, Alerts, Log Processing, Workload Analysis
- Auto-scaling: Horizontal and Vertical
- Event (Anomaly) Detection and Response (Remediation)
- Scheduling and Tuning
- Application Specific Management
- Continuous Testing and Chaos Monkey
Helm operators wrap Helm charts and offer a simplified view of operations by passing through the Helm verbs, so one can install, uninstall, destroy, and upgrade using an Operator.
There are four actors in the Operator Pattern.
- Initiator – The user who creates the Custom Resource
- Operator – The Controller that operates on the Operand
- Operand – The target application
- OpenShift and Kubernetes Environment

Each Operator operates on an Operand using Managed Resources (Kubernetes and OpenShift) to reconcile states. The states are described in a domain specific language (DSL) encapsulated in a Custom Resource to describe the state of the application:
- spec – The User communicates to the Operator the desired state (Operator reads)
- status – The Operator communicates back to the User (Operator writes)
$ oc get authentications cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Authentication
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
spec:
  oauthMetadata:
    name: ""
  serviceAccountIssuer: ""
  type: ""
  webhookTokenAuthenticator:
    kubeConfig:
      name: webhook-authentication-integrated-oauth
status:
  integratedOAuthMetadata:
    name: oauth-openshift
An Operator is not technically limited to reading spec and writing status. However, if we treat spec as specified only by the Initiator and status as written only by the Operator, we limit the chances of creating an unintended reconciliation loop.
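One common way to keep the Operator's own status writes from retriggering work is to compare a generation counter with the last generation the Operator acted on. The sketch below assumes the usual Kubernetes convention that `metadata.generation` changes when spec changes (with the status subresource enabled) and that the Operator records an observed generation in status. The `Resource` type, its fields, and `ReconcileOnce` are hypothetical simplifications, not a real client API.

```go
package main

import "fmt"

// Resource is a simplified, hypothetical stand-in for a Custom Resource.
type Resource struct {
	Generation         int64  // changes when the Initiator edits spec
	SpecSize           int    // hypothetical spec field (Initiator writes)
	ObservedGeneration int64  // status field (Operator writes)
	StatusMessage      string // status field (Operator writes)
}

// NeedsReconcile reports whether spec changed since the Operator last acted.
func NeedsReconcile(r Resource) bool {
	return r.Generation != r.ObservedGeneration
}

// ReconcileOnce reads only spec and writes only status. It returns true
// when it did work and false when the event was caused by a status write.
func ReconcileOnce(r *Resource) bool {
	if !NeedsReconcile(*r) {
		return false
	}
	r.StatusMessage = fmt.Sprintf("size set to %d", r.SpecSize)
	r.ObservedGeneration = r.Generation
	return true
}

func main() {
	r := &Resource{Generation: 1, SpecSize: 3}
	fmt.Println(ReconcileOnce(r)) // spec is new, so work is done
	fmt.Println(ReconcileOnce(r)) // only status changed, so nothing to do
}
```

Because the second call sees no spec change, the Operator's own status update does not cause another round of work.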
The DSL is specified as a Custom Resource Definition (CRD):
$ oc get crd machinehealthchecks.machine.openshift.io -o=yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
spec:
  conversion:
    strategy: None
  group: machine.openshift.io
  names:
    kind: MachineHealthCheck
    listKind: MachineHealthCheckList
    plural: machinehealthchecks
    shortNames:
    - mhc
    - mhcs
    singular: machinehealthcheck
  scope: Namespaced
  versions:
  - name: v1beta1
    schema:
      openAPIV3Schema:
        description: 'MachineHealthCheck'
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource'
            type: string
          metadata:
            type: object
          spec:
            description: Specification of machine health check policy
            properties:
              expectedMachines:
                description: total number of machines counted by this machine health
                  check
                minimum: 0
                type: integer
              unhealthyConditions:
                description: UnhealthyConditions contains a list of the conditions.
                items:
                  description: UnhealthyCondition represents a Node.
                  properties:
                    status:
                      minLength: 1
                      type: string
                    timeout:
                      description: Expects an unsigned duration string of decimal
                        numbers each with optional fraction and a unit suffix, eg
                        "300ms", "1.5h" or "2h45m". Valid time units are "ns", "us"
                        (or "µs"), "ms", "s", "m", "h".
                      pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
                      type: string
                    type:
                      minLength: 1
                      type: string
                  type: object
                minItems: 1
                type: array
            type: object
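To see how the schema constrains the DSL, the `pattern` for the `timeout` field above can be exercised directly. The following is a small sketch using Go's standard `regexp` package; the pattern is copied from the CRD, while the `ValidTimeout` helper name is hypothetical. In a cluster, the API server performs this validation, so the code is only meant to show which values the pattern accepts.

```go
package main

import (
	"fmt"
	"regexp"
)

// timeoutPattern is copied from the timeout field of the CRD schema.
var timeoutPattern = regexp.MustCompile(`^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$`)

// ValidTimeout reports whether s is an unsigned duration string that the
// schema pattern accepts, such as "300ms" or "2h45m".
func ValidTimeout(s string) bool {
	return timeoutPattern.MatchString(s)
}

func main() {
	candidates := []string{"300ms", "1.5h", "2h45m", "-5m", "10 minutes", ""}
	for _, s := range candidates {
		fmt.Printf("%q valid=%t\n", s, ValidTimeout(s))
	}
}
```

The three examples from the field description are accepted, while a signed value, a value with a spelled-out unit, and an empty string are rejected.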
For example, these operators manage the applications by orchestrating operations based on changes to the CustomResource (DSL):
| Operator | Type/Language | What it does | Operations |
|---|---|---|---|
| cluster-etcd-operator | Go | Manages etcd in OpenShift | Install, Monitor, Manage |
| prometheus-operator | Go | Manages Prometheus monitoring on a Kubernetes cluster | Install, Monitor, Manage, Configure |
| cluster-authentication-operator | Go | Manages OpenShift Authentication | Manage, Observe |
As developers, we follow a common development pattern:
- Implement the Operator Logic (Reconcile the operational state)
- Bake Container Image
- Create or regenerate Custom Resource Definition (CRD)
- Create or regenerate Role-based Access Control (RBAC)
  - Role
  - RoleBinding
- Apply Operator YAML
Note that we are not necessarily writing business logic, but rather operational logic.
There are some best practices we follow:
- Develop one operator per application
- One CRD per Controller. Created and Fit for Purpose. Less Contention.
- No Cross Dependencies.
- Use Kubernetes Primitives when Possible
- Be Backwards Compatible
- Compartmentalize features via multiple controllers
  - Scale = one controller
  - Backup = one controller
- Use asynchronous metaphors with the synchronous reconciliation loop
  - Error, then immediate return, backoff and check later
  - Use concurrency to split the processing / state
- Prune Kubernetes Resources when not used
- Apps Run when Operators are stopped
- Document what the operator does and how it does it
- Install in a single command
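The practice of "error, then immediate return, backoff and check later" from the list above can be sketched as follows. This is an illustration under assumptions: the `Result` type, `Backoff`, `ReconcileStep`, and the 1 second base and 60 second cap are all hypothetical choices, not values or APIs taken from the Operator SDK. The point is that a reconcile pass never sleeps or retries in place; it returns at once and tells the caller when to try again.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result tells the caller when to run the reconciliation again.
// The type is hypothetical and only illustrates the requeue idea.
type Result struct {
	RequeueAfter time.Duration
}

// Assumed limits for this sketch.
const (
	baseDelay = 1 * time.Second
	maxDelay  = 60 * time.Second
)

// errNotReady is a sample error used by the demo in main.
var errNotReady = errors.New("operand not ready")

// Backoff returns the delay before the next attempt after the given number
// of consecutive failures, doubling each time and capped at maxDelay.
func Backoff(failures int) time.Duration {
	if failures < 1 {
		return 0
	}
	d := baseDelay
	for i := 1; i < failures; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// ReconcileStep runs one non-blocking pass. On error it returns immediately
// with a backoff delay; on success it resets the failure count.
func ReconcileStep(check func() error, failures int) (Result, int) {
	if err := check(); err != nil {
		failures++
		return Result{RequeueAfter: Backoff(failures)}, failures
	}
	return Result{}, 0
}

func main() {
	failing := func() error { return errNotReady }
	failures := 0
	var res Result
	for i := 0; i < 4; i++ {
		res, failures = ReconcileStep(failing, failures)
		fmt.Println("failures:", failures, "retry after:", res.RequeueAfter)
	}
}
```

Keeping the failure count outside the step function means the loop stays synchronous and simple, while the waiting happens elsewhere.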
We use the Operator SDK; for one, it is supported by Red Hat and the CNCF.
operator-sdk: Which one? Ansible and Go
Kubernetes is written in Go. Currently, OpenShift uses Go 1.17, and most operators are implemented in Go. Because the community has built many Go-based operators, there is much more support available on StackOverflow and a forum.
| | Ansible | Go |
|---|---|---|
| Kubernetes Support | Cached Clients | Solid, Complete and Rich Kubernetes Client |
| Language Type | Declarative: describe the end state | Imperative: describe how to get to the end state |
| Operator Type | Indirect, wrapped in the Ansible-Operator | Direct |
| Style | Systems Administration | Systems Programming |
| Performance | Link | ~4M at startup; single-layer scratch image |
| Security | Expanded Surface Area | Limited Surface Area |
Go is well suited to concurrency and has strong memory management, and everything is baked into a single executable deliverable, so it is in memory and ready to go. There are many alternative languages for writing operators, such as NodeJS, Rust, Java, C#, and Python. Note that the OpenShift Operators are not necessarily built on the Operator SDK.
Summary
We’ve run through a lot of detail on Operators and learned why we should go with Go operators.
Reference
- CNCF Operator White Paper https://github.com/cncf/tag-app-delivery/blob/main/operator-wg/whitepaper/Operator-WhitePaper_v1-0.md
- Operator pattern https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
- Operator SDK Framework https://sdk.operatorframework.io/docs/overview/
- Kubernetes Operators 101, Part 2: How operators work https://developers.redhat.com/articles/2021/06/22/kubernetes-operators-101-part-2-how-operators-work?source=sso#
- Build Your Kubernetes Operator with the Right Tool https://cloud.redhat.com/blog/build-your-kubernetes-operator-with-the-right-tool
- Build Your Kubernetes Operator with the Right Tool https://hazelcast.com/blog/build-your-kubernetes-operator-with-the-right-tool/
- Operator SDK Best Practices https://sdk.operatorframework.io/docs/best-practices/
- Google Best practices for building Kubernetes Operators and stateful apps https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-building-kubernetes-operators-and-stateful-apps
- Kubernetes Operator Patterns and Best Practises https://github.com/IBM/operator-sample-go
- Fast vs Easy: Benchmarking Ansible Operators for Kubernetes https://www.ansible.com/blog/fast-vs-easy-benchmarking-ansible-operators-for-kubernetes
- Debugging a Kubernetes Operator https://www.youtube.com/watch?v=8hlx6F4wLAA&t=21s
- Contributing to the Image Registry Operator https://github.com/openshift/cluster-image-registry-operator/blob/master/CONTRIBUTING.md
- Leszko’s OperatorCon Presentation
- GitHub Repo for Session: https://github.com/leszko/build-your-operator