Managing NVDIMM Devices with ndctl on OpenShift 4: A Complete Setup Guide

Managing Non-Volatile Memory devices (like /dev/nmem0) using ndctl from within a container on an OpenShift 4 (RHCOS) cluster presents a unique challenge. Because ndctl interacts directly with the kernel’s NVDIMM Firmware Interface Table (NFIT) and requires raw ioctl access to the devices, standard container sandboxing will block it. Furthermore, Red Hat CoreOS (RHCOS) is an immutable operating system, meaning day-two hardware management requires specific architectural considerations.

This guide breaks down the exact permissions, mounts, OpenShift constructs, and operational best practices required to configure and maintain this setup reliably.

Part 1: Bypassing the Sandbox

To run a container that can access host hardware, you must configure high-level host privileges and explicit system mounts.

1. OpenShift Security Context Constraints (SCC)

OpenShift enforces strict security by default, so the Pod’s ServiceAccount must be granted the privileged SCC. A privileged container admitted under this SCC runs with the SELinux type spc_t (Super Privileged Container), which is required to bypass SELinux restrictions when accessing raw host device nodes on RHCOS.

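# Create the ServiceAccount if it does not exist yet
oc create serviceaccount my-ndctl-sa -n my-namespace
# Grant the privileged SCC to your ServiceAccount
oc adm policy add-scc-to-user privileged -z my-ndctl-sa -n my-namespace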

2. Pod Security Context

Within your Pod YAML, the container must explicitly request privileged execution. This grants the container all Linux capabilities, including CAP_SYS_ADMIN (essential for ndctl hardware ioctls), and disables device cgroup filtering.

securityContext:
  privileged: true

3. Required Host Mounts

The ndctl utility relies heavily on sysfs to discover the NVDIMM topology and on /dev to issue commands. You must mount the host’s /dev and /sys directories into the container.

  • /dev: Required to access /dev/nmem0, /dev/ndctl0, etc.
  • /sys: Required because ndctl scans /sys/class/nd/ and /sys/bus/nd/ to build the device tree.
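As a quick sanity check, both mounts can be verified from inside the container before running ndctl. The sketch below is a hypothetical helper: the function name check_nd_paths and its optional root-prefix argument are illustrative and not part of ndctl. It only tests that the paths named in the list above exist.

```shell
#!/bin/sh
# Hypothetical helper: report whether the sysfs and /dev paths that ndctl
# depends on are visible. The optional argument is a root prefix (empty by
# default) so the check can also be pointed at a test directory tree.
check_nd_paths() {
    root="${1:-}"
    missing=0

    # sysfs directories that ndctl scans to build the device tree
    for dir in "$root/sys/class/nd" "$root/sys/bus/nd"; do
        if [ -d "$dir" ]; then
            echo "ok: $dir"
        else
            echo "missing: $dir"
            missing=1
        fi
    done

    # At least one ndctlN control node should be present under /dev
    found=0
    for node in "$root"/dev/ndctl[0-9]*; do
        if [ -e "$node" ]; then
            found=1
        fi
    done
    if [ "$found" -eq 1 ]; then
        echo "ok: $root/dev/ndctl* present"
    else
        echo "missing: $root/dev/ndctl*"
        missing=1
    fi

    return "$missing"
}
```

Calling check_nd_paths with no argument inspects the container's own /sys and /dev; a non-zero return value suggests that a mount is missing or that the NVDIMM kernel modules are not loaded on the host.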

Part 2: Deployment Configuration

The following Pod manifest shows how a CentOS 10 container must be configured to run ndctl successfully on your OpenShift cluster.

apiVersion: v1
kind: Pod
metadata:
  name: ndctl-manager
  namespace: my-namespace
spec:
  serviceAccountName: my-ndctl-sa
  containers:
  - name: centos10-ndctl
    image: quay.io/centos/centos10:latest
    command: ["/bin/sleep", "infinity"]
    securityContext:
      privileged: true
    volumeMounts:
    - name: host-dev
      mountPath: /dev
    - name: host-sys
      mountPath: /sys
  volumes:
  - name: host-dev
    hostPath:
      path: /dev
      type: Directory
  - name: host-sys
    hostPath:
      path: /sys
      type: Directory

The base image used above does not include ndctl. Either install it after the container starts (dnf install -y ndctl) or, preferably, bake it into a custom image so the tool is still there after a pod restart.

Once the pod is running, you can exec into the container (for example, oc exec -it ndctl-manager -n my-namespace -- /bin/bash) and run your ndctl commands. The OS mismatch between the RHCOS 9 host and the CentOS 10 container is fine, provided the kernel ABI for NVDIMMs remains compatible. When running commands (e.g., ndctl create-namespace -f -e namespace0.0 -m fsdax), the utility will now have the necessary hardware visibility and CAP_SYS_ADMIN privileges. Note that reconfiguring an existing namespace with -f -e destroys the data it holds.
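Before changing any namespace, it can help to confirm what ndctl actually sees from inside the container. The following is a minimal sketch: the function name show_nvdimm_inventory is hypothetical, while the flags are standard ndctl list options.

```shell
#!/bin/sh
# Sketch: print the NVDIMM inventory visible to this container.
# show_nvdimm_inventory is an illustrative name, not an ndctl feature.
show_nvdimm_inventory() {
    if ! command -v ndctl >/dev/null 2>&1; then
        echo "ndctl not found; install it first (dnf install -y ndctl)" >&2
        return 127
    fi
    # -B buses, -D DIMMs, -R regions, -N namespaces,
    # -i also include idle (disabled) devices
    ndctl list -B -D -R -N -i
}
```

An empty result usually points back to the earlier steps: a missing /dev or /sys mount, a non-privileged container, or NVDIMM modules that are not loaded on the host.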

Part 3: Operational Best Practices for RHCOS

Keeping this setup working reliably as your cluster scales and upgrades requires adhering to OpenShift’s declarative nature.

Managing Kernel Modules via MachineConfig

The underlying RHCOS 9 host must have the correct NVDIMM kernel modules loaded to expose /dev/nmem0. RHCOS does not always load NVDIMM/PMEM modules by default. Do not rely on a manual modprobe from your container, as it will not survive a node reboot. Instead, create a MachineConfig object that drops a configuration file into /etc/modules-load.d/ on the RHCOS nodes, listing modules such as libnvdimm, nd_pmem, and dax_pmem.
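One way to express this is a MachineConfig that writes a modules-load.d file. The script below is a minimal sketch rather than a tested production manifest: the object name 99-worker-nvdimm-modules, the file name nvdimm.conf, the worker role, and the Ignition spec version 3.2.0 are all assumptions to adjust for your cluster. It only generates the manifest, base64-encoding the module list because Ignition takes file contents as a data URL.

```shell
#!/bin/sh
# Sketch: generate a MachineConfig that loads the NVDIMM modules at boot.
# Assumed values (not from the original setup): the object name, the
# nvdimm.conf file name, the "worker" pool, and Ignition version 3.2.0.

# One module name per line, as modules-load.d expects.
modules='libnvdimm
nd_pmem
dax_pmem'

# Ignition embeds file contents as a data URL. Newlines are stripped from
# the base64 output so the value stays on a single line on any platform.
encoded=$(printf '%s\n' "$modules" | base64 | tr -d '\n')

cat > nvdimm-modules-machineconfig.yaml <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-nvdimm-modules
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: /etc/modules-load.d/nvdimm.conf
        mode: 420   # decimal form of octal 0644
        overwrite: true
        contents:
          source: data:text/plain;charset=utf-8;base64,${encoded}
EOF

echo "wrote nvdimm-modules-machineconfig.yaml"
# Review the file, then apply it with:
#   oc apply -f nvdimm-modules-machineconfig.yaml
```

Keep in mind that applying a MachineConfig normally causes the Machine Config Operator to drain and reboot the nodes in the targeted pool one at a time, so schedule the change accordingly.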

The RHCOS Immutability Rule

Hardware configurations saved to the physical NVDIMM’s label area via ndctl will survive a reboot. However, if your workflow requires changing host-level OS configuration, such as adding a udev rule for permissions or modifying /etc/fstab, do not do this directly from the container (for example, through a hostPath mount of the host filesystem). Deploy host file changes via the Machine Config Operator (MCO) so that they persist across upgrades and are not treated as configuration drift.

Deploying as a DaemonSet with Node Affinity

Running a single standalone Pod is a fragile way to manage hardware in a dynamic cluster. To handle node failures and scaling, use the Node Feature Discovery (NFD) Operator to automatically label nodes that physically contain NVDIMM hardware. Then, wrap your management container in a DaemonSet configured with a nodeSelector or nodeAffinity targeting those NFD labels.
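The DaemonSet approach could look roughly like the sketch below, which generates a manifest reusing the namespace, ServiceAccount, and mounts from the Pod example. Two things are assumptions: the NFD label feature.node.kubernetes.io/memory-nv.present (confirm the exact label on your nodes with oc get nodes --show-labels) and the placeholder image name, which stands in for a custom image with ndctl preinstalled.

```shell
#!/bin/sh
# Sketch: generate a DaemonSet that runs the ndctl container only on nodes
# that NFD has labelled as containing NVDIMMs.
# Assumptions: the NFD label key below, and the placeholder image
# (override it with the NDCTL_IMAGE environment variable).
NDCTL_IMAGE="${NDCTL_IMAGE:-registry.example.com/tools/ndctl:latest}"

cat > ndctl-daemonset.yaml <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ndctl-manager
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      app: ndctl-manager
  template:
    metadata:
      labels:
        app: ndctl-manager
    spec:
      serviceAccountName: my-ndctl-sa
      nodeSelector:
        feature.node.kubernetes.io/memory-nv.present: "true"
      containers:
      - name: ndctl
        image: ${NDCTL_IMAGE}
        command: ["/bin/sleep", "infinity"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: host-dev
          mountPath: /dev
        - name: host-sys
          mountPath: /sys
      volumes:
      - name: host-dev
        hostPath:
          path: /dev
          type: Directory
      - name: host-sys
        hostPath:
          path: /sys
          type: Directory
EOF

echo "wrote ndctl-daemonset.yaml"
```

With a nodeSelector in place, the scheduler starts one management pod on each matching node and adds or removes pods as labelled nodes join or leave the cluster.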

OpenShift Upgrades and ABI Compatibility

You are running a CentOS 10 user space against an RHCOS 9 kernel. While Linux maintains strong backward compatibility for ioctl calls, there is a slight risk of divergence. During major OpenShift upgrades, test your container image in a non-production environment to verify that ndctl commands still enumerate the NVDIMM devices without ABI mismatch errors.

Interaction with Storage Operators

Configuring the memory is only step one. If your goal is to let Kubernetes workloads consume the NVDIMM as persistent storage, you will likely need the Local Storage Operator (LSO) or a CSI driver (like the PMEM-CSI driver) to discover the newly formatted device. Ensure your container provisions the namespaces in the exact mode (fsdax, devdax, or sector) expected by your storage operator.
