I recently had to work with the Kubernetes Topology Manager and OpenShift. Here is a braindump on Topology Manager:
If the TopologyManager feature gate is enabled, then any active HintProviders are registered with the Topology Manager.
If the CPU Manager and its feature gate are enabled, then the CPU Manager can be used to help workloads that are sensitive to CPU throttling, context switches, and cache misses, that require hyperthreads on the same physical CPU core, that need low latency, or that benefit from shared processor resources. The manager has two policies: none, which registers a NOP hint provider, and static, which locks the container to an exclusive set of CPUs.
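Under the static policy, a container only receives exclusive CPUs when its pod is in the Guaranteed QoS class and requests whole CPUs; a minimal sketch (the name and image are placeholders):
file: cpu-pinned-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-pinned-example
spec:
  containers:
  - name: worker
    image: registry.example.com/worker:latest   # placeholder image
    resources:
      requests:
        cpu: "2"        # whole CPUs with requests == limits -> Guaranteed QoS
        memory: 1Gi
      limits:
        cpu: "2"        # the static CPU Manager pins this container to two exclusive CPUs
        memory: 1Gi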
If the Memory Manager and its feature gate are enabled, then the Memory Manager can operate independently of the CPU Manager, e.g., to allocate HugePages or guaranteed memory.
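A sketch of enabling it, assuming a single NUMA node and that the reserved size lines up with the node's systemReserved, kubeReserved, and hard-eviction memory settings:
file: memorymanager-fragment.yaml
# fragment of a KubeletConfiguration (or the kubeletConfig stanza of an OpenShift KubeletConfig)
memoryManagerPolicy: Static
reservedMemory:
- numaNode: 0
  limits:
    memory: 1100Mi   # assumption: equals systemReserved + kubeReserved + the memory eviction threshold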
If Device Plugins are enabled, then the Device Manager can allocate devices alongside NUMA node resources (e.g., SR-IOV NICs). This may be used independently of the typical CPU/memory management for GPUs and other machine devices.
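Devices reach a container as extended resource requests; the resource name below is a hypothetical example of what a device plugin (or an SR-IOV policy) might advertise, and the pod name and image are placeholders:
file: device-consumer-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: device-consumer
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
    resources:
      requests:
        openshift.io/sriovnic: "1"   # hypothetical resource name advertised by the device plugin
        cpu: "2"
        memory: 1Gi
      limits:
        openshift.io/sriovnic: "1"
        cpu: "2"
        memory: 1Gi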
Generally, these are all used together to generate a bitmask of NUMA node affinities, which the Topology Manager uses to admit the pod under a best-effort, restricted, or single-numa-node policy. The policy determines how strictly the hint providers' NUMA affinities must align before the pod is admitted.
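Where the policy lands in configuration: on plain Kubernetes it is a kubelet setting, and on OpenShift it goes into the kubeletConfig stanza of a KubeletConfig like the one later in this post. A sketch of the relevant fragment:
file: topologymanager-fragment.yaml
# fragment of a KubeletConfiguration / OpenShift kubeletConfig stanza
cpuManagerPolicy: static                  # exclusive CPUs are what make the NUMA hints meaningful
topologyManagerPolicy: single-numa-node   # or best-effort / restricted
topologyManagerScope: container           # pod scope is also available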
An important limitation is that the maximum number of NUMA nodes is hard-coded to 8. When there are more than eight NUMA nodes, the Topology Manager errors out rather than attempting an assignment. The reason is state explosion: the number of possible NUMA affinity bitmasks grows exponentially with the node count.
- Check the worker node's CPU layout: if NUMA node(s) returns 1, it's a single NUMA node; if it returns 2 or more, it's a multi-NUMA-node system.
sh-4.4# lscpu | grep 'NUMA node(s)'
NUMA node(s): 1
The kubernetes/enhancements repo contains great detail on the flows and weaknesses of the TopologyManager.
To enable the Topology Manager, one uses Feature Gates:
- Kubernetes: TopologyManager
- Kubernetes: CPUManager
- Kubernetes: MemoryManager
- Kubernetes: DevicePlugins
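On upstream Kubernetes these gates can be set in the kubelet's configuration file; the following is a minimal sketch, keeping in mind that on newer releases most of these gates are on by default or have graduated, so setting them explicitly may be unnecessary:
file: kubelet-config-fragment.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  TopologyManager: true
  CPUManager: true
  MemoryManager: true
  DevicePlugins: true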
OpenShift, meanwhile, prefers the FeatureSet LatencySensitive:
- Via FeatureGate
$ oc patch featuregate cluster -p '{"spec": {"featureSet": "LatencySensitive"}}' --type merge
This turns on the basic TopologyManager feature gate in /etc/kubernetes/kubelet.conf:
"featureGates": {
"APIPriorityAndFairness": true,
"CSIMigrationAzureFile": false,
"CSIMigrationvSphere": false,
"DownwardAPIHugePages": true,
"RotateKubeletServerCertificate": true,
"TopologyManager": true
},
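A quick way to verify what a node actually rendered (a sketch; substitute a real node name) is to read the file through a debug pod:
$ oc debug node/<node-name> -- chroot /host cat /etc/kubernetes/kubelet.conf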
- Create a custom KubeletConfig; this allows targeted TopologyManager feature enablement.
file: cpumanager-kubeletconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
$ oc create -f cpumanager-kubeletconfig.yaml
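Note that the machineConfigPoolSelector above only matches pools carrying that label, so the target MachineConfigPool needs to be labeled as well (assuming the worker pool here):
$ oc label machineconfigpool worker custom-kubelet=cpumanager-enabled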
Net: these managers can be used independently of each other, but they should be turned on at the same time to maximize the benefits.
There are some examples and test cases out there for Kubernetes and OpenShift:
- Red Hat Systems Engineering Team test cases for the Performance Addon Operator (now the Cluster Node Tuning Operator). These are the clearest tests and apply directly to the Topology Manager.
- Kube Test Cases
- Topology Manager
- CPU Manager
- Device Plugin: if you already do SR-IOV testing, this should be implicitly covered.
- Memory Manager
- Test Cases Matrix from Kubernetes PR #83481
One of the best examples is k8stopologyawareschedwg/sample-device-plugin.
Tools to know about
- GitHub: numalign (amd64): you can download it from the releases. In the fork prb112/numalign, I added ppc64le to the build.
- numactl and numastat are superbly helpful for seeing the topology spread on a node (link to a handy PDF on NUMA). I've been starting up a Fedora container with numactl and numastat installed.
Final note: I had written down that Fedora is a great combination with taskset and numactl if you copy in the binaries. I think I used Fedora 35/36 as a container (link). A minimal way to spin one up is sketched below.
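This is a sketch of that container, assuming podman is available; on Fedora, numastat ships in the numactl package, and the tag is whichever Fedora release you prefer:
$ podman run -it --rm --privileged --pid=host fedora:36 bash
# dnf install -y numactl
# numactl --hardware
# numastat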
Yes, HugePages too: I built a HugePages-hungry container (Hugepages). I also looked at hugepages_tests.go and the test plan.
When it came down to it, I used my hungry container with the example below.
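For reference, the consuming pod looks roughly like this sketch, assuming 2Mi pages are pre-allocated on the node; the name and image stand in for my HugePages-hungry container:
file: hugepages-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-example
spec:
  containers:
  - name: hungry
    image: registry.example.com/hugepages-hungry:latest   # placeholder for the HugePages-hungry image
    volumeMounts:
    - mountPath: /hugepages
      name: hugepage
    resources:
      requests:
        hugepages-2Mi: 512Mi
        memory: 1Gi
        cpu: "1"
      limits:
        hugepages-2Mi: 512Mi
        memory: 1Gi
        cpu: "1"
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages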
I hope this helps others as they start to work with Topology Manager.
References
Red Hat
- Red Hat Topology Aware Scheduling in Kubernetes Part 1: The High Level Business Case
- Red Hat Topology Awareness in Kubernetes Part 2: Don’t we already have a Topology Manager?
OpenShift
- OpenShift 4.11: Using the Topology Manager
- OpenShift 4.11: Using device plug-ins to access external resources with pods
- OpenShift 4.11: Using Device Manager to make devices available to nodes
- OpenShift 4.11: About Single Root I/O Virtualization (SR-IOV) hardware networks
- OpenShift 4.11: Adding a pod to an SR-IOV additional network
- OpenShift 4.11: Using CPU Manager
Kubernetes
- Kubernetes: Topology Manager Blog
- Feature Highlight: CPU Manager
- Feature: Utilizing the NUMA-aware Memory Manager
Kubernetes Enhancement
- KEP-693: Node Topology Manager e2e tests: Link
- KEP-2625: CPU Manager e2e tests: Link
- KEP-1769: Memory Manager. Source: Link. PR: Link