Topology Manager and OpenShift/Kubernetes

I recently had to work with the Kubernetes Topology Manager and OpenShift. Here is a braindump on Topology Manager:

If the Topology Manager feature gate is enabled, then any active HintProviders (e.g., the CPU Manager, Memory Manager, and Device Manager) are registered with the TopologyManager.

If the CPU Manager and its feature gate are enabled, the CPU Manager can help workloads that are sensitive to CPU throttling, context switches, or cache misses, that require hyperthreads on the same physical CPU core, that need low latency, or that benefit from shared processor resources such as data and instruction caches. The manager has two policies, none and static: none registers a no-op hint provider, while static locks the container to an exclusive set of CPUs.
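
With the static policy, a container is pinned to exclusive CPUs only when it is in the Guaranteed QoS class and requests a whole number of CPUs. A minimal sketch of such a pod (the name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: cpu-pinned-example        # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
    resources:
      requests:
        cpu: "2"                  # whole CPUs are required for exclusive pinning
        memory: "1Gi"
      limits:
        cpu: "2"                  # limits must equal requests for Guaranteed QoS
        memory: "1Gi"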

If the Memory Manager and its feature gate are enabled, the MemoryManager can operate independently of the CPU Manager, e.g., to allocate HugePages or guaranteed memory from a specific NUMA node.
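
For reference, the Memory Manager is driven by kubelet settings; a minimal sketch of the relevant KubeletConfiguration fields, with made-up sizes:

memoryManagerPolicy: Static       # Static reserves NUMA-local memory for Guaranteed pods
reservedMemory:                   # memory set aside per NUMA node for system use
- numaNode: 0
  limits:
    memory: 1Gi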

If Device Plugins are enabled, the Device Manager can be turned on to allocate devices alongside NUMA node resources (e.g., SR-IOV NICs). This may be used independently of the typical CPU/memory management, for GPUs and other machine devices.
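
A pod consumes such a device as an extended resource advertised by the plugin; the resource name below is hypothetical and depends on the device plugin in use:

apiVersion: v1
kind: Pod
metadata:
  name: sriov-consumer            # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest     # placeholder image
    resources:
      limits:
        example.com/sriov-nic: "1"    # hypothetical extended resource from a device plugin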

Generally, these are all used together: each hint provider returns a NUMA-affinity BitMask, and the Topology Manager merges the masks to admit or reject the pod according to its policy, best-effort, restricted, or single-numa-node (none disables the alignment).
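
On the kubelet side the policy is a single setting; a minimal sketch of the relevant configuration fields, with single-numa-node as the strictest option:

topologyManagerPolicy: single-numa-node   # or: none, best-effort, restricted
cpuManagerPolicy: static                  # static CPU pinning is what makes the CPU hints meaningful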

An important limitation: the maximum number of NUMA nodes is hard-coded to 8. When there are more than eight NUMA nodes, the kubelet errors out rather than computing a topology assignment. The reason is state explosion: the number of NUMA-node combinations the manager must evaluate grows exponentially with the node count.

  1. Check the worker node's NUMA layout with lscpu: if NUMA node(s) returns 1, it's a single NUMA node; 2 or more means multiple NUMA nodes.
sh-4.4# lscpu | grep 'NUMA node(s)'
NUMA node(s):        1

The kubernetes/enhancements repo contains great detail on the flows and weaknesses of the TopologyManager.

To enable the Topology Manager, one uses Feature Gates. OpenShift prefers the FeatureSet LatencySensitive.

  1. Via FeatureGate
$ oc patch featuregate cluster -p '{"spec": {"featureSet": "LatencySensitive"}}' --type merge

This turns on the basic TopologyManager feature gate in /etc/kubernetes/kubelet.conf:

  "featureGates": {
    "APIPriorityAndFairness": true,
    "CSIMigrationAzureFile": false,
    "CSIMigrationvSphere": false,
    "DownwardAPIHugePages": true,
    "RotateKubeletServerCertificate": true,
    "TopologyManager": true
  },
  1. Create a custom KubeletConfig; this allows targeted enablement of the TopologyManager features.

file: cpumanager-kubeletconfig.yaml

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
$ oc create -f cpumanager-kubeletconfig.yaml
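
Note the machineConfigPoolSelector: the target MachineConfigPool must carry the custom-kubelet: cpumanager-enabled label for the config to apply. Once it rolls out, the rendered settings can be checked on a node; a sketch, substituting a real node name:

$ oc debug node/<worker-node> -- chroot /host grep -E 'cpuManager|topologyManager' /etc/kubernetes/kubelet.conf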

Net: they can be used independently of each other, but they should be turned on at the same time to maximize the benefits.

There are some examples and test cases out there for Kubernetes and OpenShift:

  1. Red Hat Sys Engineering Team test cases for the Performance Addon Operator, which is now part of the Cluster Node Tuning Operator. These are the clearest tests, and they apply directly to the Topology Manager.
  2. Kube Test Cases

One of the best examples is k8stopologyawareschedwg/sample-device-plugin.

Tools to know about

  1. GitHub: numalign (amd64) – you can download it from the releases. In the fork prb112/numalign, I added ppc64le to the build.
  2. numactl and numastat are superbly helpful for seeing the topology spread on a node (link to a handy pdf on numa). I've been starting up a Fedora container with numactl and numastat installed; see the commands below.
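
For instance, to inspect a node's layout and per-NUMA counters:

$ numactl --hardware    # NUMA nodes, their CPUs, and memory sizes
$ numastat              # per-node numa_hit / numa_miss counters
$ numastat -p <pid>     # per-process NUMA usage; substitute a real PID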

A final note: I had written down that Fedora is a great combination with taskset and numactl if you copy in the binaries. I think I used Fedora 35/36 as a container. link
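
A minimal sketch of that setup with podman, assuming Fedora 36 and installing the packages instead of copying binaries in:

$ podman run -it --rm fedora:36 bash
# dnf install -y numactl util-linux    # numactl/numastat plus taskset
# numactl --show                       # current CPU and memory binding
# taskset -c 0-1 numastat              # example: run numastat pinned to CPUs 0-1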

Yes, I built a HugePages-hungry container: Hugepages. I also looked at hugepages_tests.go and the test plan.

When it came down to it, I used my HugePages-hungry container with the example.
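
For reference, a HugePages-hungry pod requests pre-allocated pages as a resource; a sketch, assuming 2Mi pages are configured on the node and using a placeholder image:

apiVersion: v1
kind: Pod
metadata:
  name: hugepages-example         # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/hugepages-app:latest   # placeholder image
    resources:
      requests:
        hugepages-2Mi: 100Mi
        memory: 100Mi
        cpu: "1"
      limits:
        hugepages-2Mi: 100Mi      # HugePages requests and limits must be equal
        memory: 100Mi
        cpu: "1"
    volumeMounts:
    - mountPath: /hugepages
      name: hugepage
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages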

I hope this helps others as they start to work with Topology Manager.

References

Red Hat

  1. Red Hat Topology Aware Scheduling in Kubernetes Part 1: The High Level Business Case
  2. Red Hat Topology Awareness in Kubernetes Part 2: Don’t we already have a Topology Manager?

OpenShift

  1. OpenShift 4.11: Using the Topology Manager
  2. OpenShift 4.11: Using device plug-ins to access external resources with pods
  3. OpenShift 4.11: Using Device Manager to make devices available to nodes
  4. OpenShift 4.11: About Single Root I/O Virtualization (SR-IOV) hardware networks
  5. OpenShift 4.11: Adding a pod to an SR-IOV additional network
  6. OpenShift 4.11: Using CPU Manager

Kubernetes

  1. Kubernetes: Topology Manager Blog
  2. Feature Highlight: CPU Manager
  3. Feature: Utilizing the NUMA-aware Memory Manager

Kubernetes Enhancement

  1. KEP-693: Node Topology Manager e2e tests: Link
  2. KEP-2625: CPU Manager e2e tests: Link
  3. KEP-1769: Memory Manager Source: Link PR: Link
