The following are notes from a research project I investigated, including what I would do, saved here so others can take advantage of them:
To demonstrate RoCE (RDMA over Converged Ethernet) usage across nodes on Red Hat OpenShift, you need a container image that includes the RDMA userspace libraries (rdma-core, libibverbs) and performance testing tools like perftest (which provides ib_write_bw, ib_send_lat, etc.). The kernel RDMA drivers come from the host node, not from the image.
Based on the Red Hat learning path, here is a Podman/Docker Dockerfile and the configuration needed to run it.
1. The Podman/Docker image
This Dockerfile uses Red Hat Universal Base Image (UBI) 9 and installs the essential RDMA stack and the perftest suite.
# Use RHEL 9 UBI as the base
FROM registry.access.redhat.com/ubi9/ubi:latest
LABEL maintainer="OpenShift RoCE Demo"
# Install RDMA core libraries, drivers, and performance testing tools
# 'perftest' contains the ib_write_bw, ib_read_bw, etc. commands
RUN dnf install -y \
        libibverbs \
        libibverbs-utils \
        rdma-core \
        iproute \
        pciutils \
        ethtool \
        perftest \
    && dnf clean all
# Set working directory
WORKDIR /root
# Default command to keep the container running so you can 'exec' into it
CMD ["sleep", "infinity"]
2. Build and Push the Image
Use Podman to build the image and push it to a registry accessible by your OpenShift cluster (e.g., Quay.io or your internal OpenShift registry).
# Build the image
podman build -t quay.io/<your-username>/roce-test:latest .
# Push the image
podman push quay.io/<your-username>/roce-test:latest
3. Demonstrating Cross-Node Usage (The Test)
To prove RoCE is working across nodes, you must bypass the standard SDN (Software Defined Network) by using Host Networking or a Secondary Network (Multus). For a quick demonstration, we use hostNetwork: true.
Step A: Deploy two Pods on different nodes
Create a file named roce-demo.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: roce-server
  labels:
    app: roce-test
spec:
  hostNetwork: true # Required to access the host's RDMA/RoCE hardware
  containers:
  - name: main
    image: quay.io/<your-username>/roce-test:latest
    securityContext:
      privileged: true # Required for RDMA device access
---
apiVersion: v1
kind: Pod
metadata:
  name: roce-client
  labels:
    app: roce-test
spec:
  hostNetwork: true
  containers:
  - name: main
    image: quay.io/<your-username>/roce-test:latest
    securityContext:
      privileged: true
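The manifest above does not pin the pods to particular nodes, so the scheduler may place both on the same node, in which case the test would not cross the network. As a minimal sketch, the helper below reads each pod's .spec.nodeName and fails if the two match. The function name check_distinct_nodes is hypothetical (it is not part of oc), and the sketch assumes the pod names from roce-demo.yaml and that the manifest was applied with oc apply -f roce-demo.yaml.

```shell
# Hypothetical helper: confirm the two demo pods landed on different nodes.
# Returns 0 if the nodes differ, 1 if they match, 2 if a lookup fails.
check_distinct_nodes() {
    server_node=$(oc get pod roce-server -o jsonpath='{.spec.nodeName}') || return 2
    client_node=$(oc get pod roce-client -o jsonpath='{.spec.nodeName}') || return 2
    echo "roce-server: $server_node"
    echo "roce-client: $client_node"
    # An empty node name means the pod has not been scheduled yet.
    [ -n "$server_node" ] && [ "$server_node" != "$client_node" ]
}
```

Run check_distinct_nodes after both pods are Running. If it fails, one option is to delete the pods and add a nodeName or pod anti-affinity rule to the manifest before re-applying it.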
Step B: Run the Performance Benchmark
- Identify the IP of the Server Node:
oc get pod roce-server -o wide
# Note the IP (since it's hostNetwork, this is the Node's IP)
- Start the Server:
oc exec -it roce-server -- ib_write_bw -d <rdma_device_name> -a
(Note: Use ibv_devinfo inside the pod to find your device name, e.g., mlx5_0)
- Run the Client (from the other pod):
oc exec -it roce-client -- ib_write_bw -d <rdma_device_name> <server_ip> -a
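The ibv_devinfo lookup mentioned in the note above can be scripted. The sketch below filters ibv_devinfo output read from standard input and prints the devices that have an active port with an Ethernet link layer, which is what a RoCE device should report. The function name roce_devices is hypothetical, and the sketch assumes the usual ibv_devinfo layout, in which each port block lists state before link_layer.

```shell
# Hypothetical helper: list RDMA devices with an active Ethernet (RoCE) port.
# Reads `ibv_devinfo` output on stdin and prints one device name per matching port.
roce_devices() {
    awk '
        $1 == "hca_id:"     { dev = $2 }                       # start of a device block
        $1 == "state:"      { active = ($2 == "PORT_ACTIVE") } # port state line
        $1 == "link_layer:" { if ($2 == "Ethernet" && active) print dev }
    '
}
```

For example, oc exec roce-server -- ibv_devinfo | roce_devices prints the candidate names to pass to ib_write_bw -d. A device with two matching ports is printed twice.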
How this demonstrates RoCE:
- Zero-Copy: The ib_write_bw tool performs memory-to-memory transfers that bypass the kernel's TCP/IP stack.
- Performance: If RoCE is correctly configured in your OpenShift cluster (via the Node Network Configuration Policy), you will see bandwidth near the line rate (e.g., ~95Gbps on a 100G link) and, when measured with ib_send_lat, much lower latency than TCP over the same Ethernet link.
- Verification: You can run ethtool -S <interface> on the host while the test is running to see the rdma counters increasing, confirming the traffic is not using standard TCP.
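Two of the points above can be checked with a little arithmetic and text processing. The sketch below is illustrative only, and both function names are hypothetical. roce_goodput estimates the best-case payload rate for RoCE v2, assuming IPv4, no VLAN tag, and ignoring the extra RDMA header on the first packet of each message as well as acknowledgements. rdma_counter_delta compares two saved ethtool -S snapshots and prints the counters with rdma in their name that increased. Exact counter names vary by NIC driver.

```shell
# Best-case RoCE v2 goodput in Gbit/s.
# Usage: roce_goodput <rdma_mtu_bytes> <link_rate_gbps>
roce_goodput() {
    awk -v mtu="$1" -v rate="$2" 'BEGIN {
        # Per-packet bytes: Ethernet 14 + IPv4 20 + UDP 8 + BTH 12 + ICRC 4 + FCS 4,
        # plus preamble 8 and inter-frame gap 12 on the wire (82 in total).
        overhead = 14 + 20 + 8 + 12 + 4 + 4 + 8 + 12
        printf "%.1f\n", rate * mtu / (mtu + overhead)
    }'
}

# Print rdma counters that grew between two `ethtool -S` snapshots.
# Usage: rdma_counter_delta <before_file> <after_file>
rdma_counter_delta() {
    awk '
        /rdma/ {
            name = $1; sub(/:$/, "", name)          # strip the trailing colon
            if (NR == FNR) { before[name] = $2 + 0; next }   # first file: remember values
            if ($2 + 0 > before[name]) printf "%s +%.0f\n", name, $2 - before[name]
        }
    ' "$1" "$2"
}
```

With these assumptions, roce_goodput 4096 100 prints 98.0 and roce_goodput 1024 100 prints 92.6, so the achievable figure depends on the RDMA MTU (1024 bytes is the largest RDMA MTU that fits a default 1500-byte Ethernet MTU). For the counters, save ethtool -S <interface> to one file before the test and another during or after it, then pass both files to rdma_counter_delta.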