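If slabtop --sort c shows high slab memory usage and you suspect a kernel memory leak, you can confirm the culprit by booting with the slub_debug=U kernel argument, which makes SLUB track the caller of every allocation and free.
The following steps enable it on OpenShift master nodes and read the results. (Thanks to ServerFault.)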
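For a non-interactive view of the same ranking, the slab caches can be read straight from /proc/slabinfo. The following is a minimal sketch, not what slabtop does internally: it approximates each cache's size as num_objs × objsize (in KiB), so the figures may differ slightly from slabtop's. The function name top_slab_caches is hypothetical, and reading /proc/slabinfo normally requires root.

```shell
# Print the largest slab caches as "<approx KiB> <cache name>", biggest first.
# $1: slabinfo file (defaults to /proc/slabinfo); $2: number of caches to show (default 5).
top_slab_caches() (
  slabinfo="${1:-/proc/slabinfo}"
  count="${2:-5}"
  # Skip the two header lines; column 3 is num_objs, column 4 is objsize in bytes.
  awk '$1 != "slabinfo" && $1 != "#" { printf "%d %s\n", $3 * $4 / 1024, $1 }' "$slabinfo" |
    sort -rn | head -n "$count"
)
```

Running `top_slab_caches` as root on a node lists the five largest caches. A kmalloc-* cache at the top is a hint that the slub_debug=U steps are worth the reboot.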
- Login to OpenShift
$ oc login
- Check that the 99-master-kargs-slub MachineConfig does not already exist.
$ oc get mc 99-master-kargs-slub
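The check can be wrapped in a small helper so that a script only creates the MachineConfig when it is missing. This is a sketch under assumptions: mc_exists is a hypothetical name, and it relies on oc get exiting non-zero when the object is absent. Note that a failed login or an unreachable API server also produces a non-zero exit, so a "missing" result is only meaningful after a successful oc login.

```shell
# Return success (0) if the named MachineConfig exists, non-zero otherwise.
# $1: MachineConfig name, e.g. 99-master-kargs-slub
mc_exists() {
  oc get mc "$1" >/dev/null 2>&1
}
```

For example, `mc_exists 99-master-kargs-slub || echo "not present yet"` prints the message only when the lookup fails.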
- Create the MachineConfig file that adds the slub_debug=U kernel argument. Note that it is assigned to the master role.
$ cat << EOF > 99-master-kargs-slub.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-master-kargs-slub
spec:
  kernelArguments:
    - slub_debug=U
EOF
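If the same kernel argument is needed on a different pool, the manifest can be generated per role instead of edited by hand. The sketch below is an illustration only: render_kargs_mc is a hypothetical helper, and the worker role in the usage line is an assumption, since this walkthrough only targets master.

```shell
# Print a MachineConfig that adds one kernel argument for one role.
# $1: role (e.g. master); $2: kernel argument (e.g. slub_debug=U)
render_kargs_mc() {
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: $1
  name: 99-$1-kargs-slub
spec:
  kernelArguments:
    - $2
EOF
}
```

For example, `render_kargs_mc worker slub_debug=U > 99-worker-kargs-slub.yaml` would write the equivalent file for a worker pool. Called with master and slub_debug=U, it produces the same manifest as the step above.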
- Apply the kernel arguments MachineConfig.
$ oc apply -f 99-master-kargs-slub.yaml
machineconfig.machineconfiguration.openshift.io/99-master-kargs-slub created
- Wait until the master nodes are updated. The Machine Config Operator reboots each node in turn to apply the kernel argument.
$ oc wait mcp/master --for condition=updated --timeout=25m
machineconfigpool.machineconfiguration.openshift.io/master condition met
- Once the update completes, list the master nodes and confirm that they are Ready.
$ oc get nodes -l node-role.kubernetes.io/master
NAME STATUS ROLES AGE VERSION
lon06-master-0.xip.io Ready master 30d v1.23.5+3afdacb
lon06-master-1.xip.io Ready master 30d v1.23.5+3afdacb
lon06-master-2.xip.io Ready master 30d v1.23.5+3afdacb
- Connect to a master node and switch to the root user.
$ ssh core@lon06-master-0.xip.io
$ sudo su -
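Before reading the slab data, it is worth confirming that the node actually booted with the new argument, because alloc_calls only carries caller information when user tracking is enabled. A minimal sketch, assuming the kernel command line is available at /proc/cmdline. The helper name has_karg is hypothetical.

```shell
# Return success (0) if the kernel command line contains the exact argument.
# $1: kernel argument to look for; $2: cmdline file (defaults to /proc/cmdline)
has_karg() (
  karg="$1"
  cmdline="${2:-/proc/cmdline}"
  # Put each argument on its own line, then match whole lines literally.
  tr ' ' '\n' < "$cmdline" | grep -qxF -- "$karg"
)
```

On the node, `has_karg slub_debug=U && echo "slub_debug=U is active"` prints the message only when the argument is present.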
- List the five call sites with the most kmalloc-32 allocations.
$ cat /sys/kernel/slab/kmalloc-32/alloc_calls | sort -n | tail -n 5
4334 iomap_page_create+0x80/0x190 age=0/654342/2594020 pid=1-39569 cpus=0-7
5655 selinux_sk_alloc_security+0x5c/0xd0 age=916/1870136/2594937 pid=0-39217 cpus=0-7
41908 __kernfs_new_node+0x70/0x2d0 age=406911/2326294/2594938 pid=0-38398 cpus=0-7
9969728 memcg_update_all_list_lrus+0x1bc/0x550 age=2564414/2567167/2594607 pid=1 cpus=0-7
19861376 __list_lru_init+0x2b8/0x480 age=406870/2007921/2594449 pid=1-38406 cpus=0-7
This points to memcg_update_all_list_lrus (together with __list_lru_init) holding millions of kmalloc-32 objects, far more than any other caller. This issue is fixed in a patch to the Linux kernel.
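To put the raw counts in proportion, each caller's share of the listed allocations can be computed from the same file. This is a sketch only: alloc_call_shares is a hypothetical helper, and the percentages are relative to the lines in the file passed in, not to total system memory.

```shell
# Print "<share>% <count> <caller>" for every line of an alloc_calls file,
# largest count first.
# $1: alloc_calls file (defaults to the kmalloc-32 cache)
alloc_call_shares() (
  file="${1:-/sys/kernel/slab/kmalloc-32/alloc_calls}"
  awk '
    # Column 1 is the allocation count, column 2 the calling function.
    { count[NR] = $1; caller[NR] = $2; total += $1 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%.1f%% %d %s\n", 100 * count[i] / total, count[i], caller[i]
    }
  ' "$file" | sort -k2,2nr
)
```

Fed the five lines shown above, the two list_lru callers come out at roughly two thirds and one third of the listed allocations, with every other caller well under one percent.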
References
- ServerFault: Debugging kmalloc-64 slab allocations / memory leak (https://serverfault.com/questions/1020241/debugging-kmalloc-64-slab-allocations-memory-leak)
- Kmalloc Internals: Exploring Linux Kernel Memory Allocation (http://www.jikos.cz/jikos/Kmalloc_Internals.html)
- Stack Overflow: https://stackoverflow.com/questions/20079767/what-is-different-functions-malloc-and-kmalloc
- How I investigated memory leaks in Go using pprof on a large codebase
- Using Go 1.10 new trace features to debug an integration test
- Kernel Memory Leak Detector
- go-slab – slab allocator in go
- Red Hat Customer Support Portal: Interpreting /proc/meminfo and free output for Red Hat Enterprise Linux
- Red Hat Customer Support Portal: Determine how much memory is being used on the system
- Red Hat Customer Support Portal: Determine how much memory and what kind of objects the kernel is allocating