DRAFT: This is not a complete article. I haven’t yet fully tested and vetted the steps below; I will come back and update it.
Kubernetes offers a powerful enhancement to CPU resource management: the ability to distribute exclusive CPUs across NUMA nodes using the distribute-cpus-across-numa CPUManager policy option. This option, introduced by KEP-2902 and available without extra feature gates on Kubernetes v1.30, enables better performance and resource utilization on multi-NUMA systems by spreading workloads instead of concentrating them on a single node.
Non-Uniform Memory Access (NUMA) is a memory design used in modern multi-socket systems where each CPU socket has its own local memory. Accessing local memory is faster than accessing memory attached to another CPU. Therefore, NUMA-aware scheduling is crucial for performance-sensitive workloads.
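On a Linux host you can see the NUMA layout directly from sysfs (the node and CPU counts will of course vary by machine; numactl --hardware and lscpu show the same information, if installed):

```shell
# List the NUMA nodes the kernel exposes.
ls /sys/devices/system/node/ | grep '^node'

# Show which CPUs belong to each node, e.g. "0-3" and "4-7".
cat /sys/devices/system/node/node*/cpulist
```

A single-socket machine typically shows only node0; the distribution behavior discussed below only matters when there are two or more nodes.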
Traditionally, the CPUManager’s static policy allocates exclusive CPUs in a “packed” fashion, filling a single NUMA node first to keep memory accesses local and reduce latency. However, this can lead to resource contention and underutilization on systems with multiple NUMA nodes, particularly for:
- High-throughput applications like databases or analytics engines
- Multi-threaded workloads that benefit from parallelism
- NUMA-aware applications that manage memory locality explicitly
The new distribute-cpus-across-numa option spreads CPU allocations evenly across NUMA nodes, improving parallelism and overall system throughput for these workloads.
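The difference between the two strategies can be sketched with a toy model (plain bash, nothing Kubernetes-specific; the node layout and allocation logic here are simplified assumptions, not the CPUManager’s actual algorithm):

```shell
#!/usr/bin/env bash
# Toy illustration: satisfying a 4-CPU request on a machine
# with two NUMA nodes of 4 CPUs each.
node0=(0 1 2 3)   # CPU IDs on NUMA node 0
node1=(4 5 6 7)   # CPU IDs on NUMA node 1
request=4

# Packed: fill from node 0 first.
packed=("${node0[@]:0:request}")

# Distributed: take CPUs from each node in turn.
distributed=()
for ((i = 0; i < request / 2; i++)); do
  distributed+=("${node0[$i]}" "${node1[$i]}")
done

echo "packed:      ${packed[*]}"       # all four CPUs on node 0
echo "distributed: ${distributed[*]}"  # two CPUs per node
```

The packed allocation leaves node 1 idle, while the distributed allocation gives each node half of the work.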
Here is a step-by-step guide to enabling and using this option on an OpenShift cluster running Kubernetes v1.30+ (the commands use the oc CLI):
- Label the MachineConfigPool that contains the nodes on which the CPUManager should be enabled:
oc label machineconfigpool worker custom-kubelet=cpumanager-enabled
- Create a custom KubeletConfig that enables the static cpuManagerPolicy together with the distribute-cpus-across-numa policy option:
cat << EOF | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerPolicyOptions:
      distribute-cpus-across-numa: "true"
    cpuManagerReconcilePeriod: 5s
EOF
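For reference, on a cluster without OpenShift’s Machine Config Operator, the equivalent settings would go directly into the kubelet’s own configuration file (a sketch; the file path and the reserved-CPU choice below are assumptions to adapt to your environment):

```yaml
# Fragment of the kubelet's KubeletConfiguration
# (commonly /var/lib/kubelet/config.yaml).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  distribute-cpus-across-numa: "true"
# The static policy requires reserving at least one CPU for system daemons.
reservedSystemCPUs: "0"
```

Note that changing cpuManagerPolicy on a node that previously ran with a different policy requires draining the node, deleting /var/lib/kubelet/cpu_manager_state, and restarting the kubelet.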
- Wait for the MachineConfigPool to finish rolling out the new configuration (oc get mcp) and for the kubelet on the affected nodes to restart.
- Create a Pod in the Guaranteed QoS class by specifying equal requests and limits for both CPU and memory (exclusive CPUs are only assigned to Guaranteed pods that request whole CPUs):
apiVersion: v1
kind: Pod
metadata:
  name: numa-aware-pod
spec:
  containers:
  - name: workload
    image: your-image
    resources:
      requests:
        cpu: "4"
        memory: "1Gi"
      limits:
        cpu: "4"
        memory: "1Gi"
The kubelet will now distribute the 4 exclusive CPUs across the available NUMA nodes instead of packing them onto one.
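To verify the pinning, every Linux process exposes its allowed CPU list in /proc, so you can check the assignment from inside the container (the snippet runs in any Linux shell; in the pod you would run it via oc exec numa-aware-pod -- grep Cpus_allowed_list /proc/1/status):

```shell
# Prints the CPUs this process may run on,
# e.g. "Cpus_allowed_list:  0-1,4-5" for a distributed 4-CPU allocation.
grep Cpus_allowed_list /proc/self/status
```

With the packed behavior you would instead expect a contiguous range from a single node, such as 0-3.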
To visualize the difference between the two behaviors, here is a conceptual illustration of a 4-CPU request on a machine with two NUMA nodes of 4 CPUs each:
Packed (default static policy):
NUMA Node 0: [CPU0*, CPU1*, CPU2*, CPU3*] ← all 4 CPUs assigned here
NUMA Node 1: [CPU4,  CPU5,  CPU6,  CPU7 ] ← idle
Distributed (distribute-cpus-across-numa):
NUMA Node 0: [CPU0*, CPU1*, CPU2,  CPU3 ]
NUMA Node 1: [CPU4*, CPU5*, CPU6,  CPU7 ] ← balanced: 2 CPUs per node (* = assigned)
This balance reduces memory-bandwidth contention on any single node and lets each group of threads work against its own local memory.
This enhancement gives Kubernetes administrators more control over CPU topology, enabling better performance tuning for complex workloads. It’s a great step forward in making Kubernetes more NUMA-aware and suitable for high-performance computing environments.