Identifying Kernel Memory Usage Culprits

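If slabtop --sort c shows unusually high slab usage and you suspect a kernel memory leak, you can confirm which call site is responsible by booting the nodes with slub_debug=U, which makes SLUB record the caller of each allocation and free. The steps below apply that kernel argument to the master nodes of an OpenShift cluster. (Thanks to ServerFault.)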
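
As a rough cross-check of the slabtop view, the same ranking can be approximated straight from /proc/slabinfo. This is a minimal sketch, not part of the original procedure: the function name top_slabs is hypothetical, it estimates each cache's footprint as num_objs * objsize (which is not exactly how slabtop computes cache size), and reading /proc/slabinfo normally requires root.

```shell
#!/bin/sh
# Rank slab caches by approximate footprint (num_objs * objsize), largest first.
# Usage: top_slabs [slabinfo-file] [count]
# top_slabs is a hypothetical helper name used only for this sketch.
top_slabs() {
    file="${1:-/proc/slabinfo}"
    count="${2:-5}"
    # The first two lines of slabinfo are headers. Columns used here:
    #   $1 = cache name, $3 = num_objs, $4 = objsize in bytes.
    awk 'NR > 2 { printf "%.0f %s\n", $3 * $4, $1 }' "$file" |
        LC_ALL=C sort -rn | head -n "$count"
}
```

Running top_slabs with no arguments on a node prints the five largest caches as "bytes name" pairs. If a kmalloc-* cache dominates, the per-caller tracking described below is the next step.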

  1. Login to OpenShift
$ oc login
  1. Check that a 99-master-kargs-slub MachineConfig does not already exist.
$ oc get mc 99-master-kargs-slub
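  1. Create a MachineConfig that adds the slub_debug=U kernel argument. Note that it is assigned to the master role.
$ cat << EOF > 99-master-kargs-slub.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-master-kargs-slub
spec:
  kernelArguments:
  - slub_debug=U
EOF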
  1. Apply the kernel arguments MachineConfig.
$ oc apply -f 99-master-kargs-slub.yaml
machineconfig.machineconfiguration.openshift.io/99-master-kargs-slub created
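  1. Wait until the master nodes are updated. Each node reboots to pick up the new kernel argument.
$ oc wait mcp/master --for condition=updated --timeout=25m
machineconfigpool.machineconfiguration.openshift.io/master condition met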
  1. Once the nodes are back up, list the master nodes and confirm that each one is Ready.
$ oc get nodes -l
NAME   STATUS   ROLES    AGE   VERSION
       Ready    master   30d   v1.23.5+3afdacb
       Ready    master   30d   v1.23.5+3afdacb
       Ready    master   30d   v1.23.5+3afdacb
  1. Connect to a master node and switch to the root user.
$ ssh
$ sudo su -
  1. List the call sites with the most kmalloc-32 allocations. The first column is the allocation count.
$ sort -n /sys/kernel/slab/kmalloc-32/alloc_calls | tail -n 5
   4334 iomap_page_create+0x80/0x190 age=0/654342/2594020 pid=1-39569 cpus=0-7
   5655 selinux_sk_alloc_security+0x5c/0xd0 age=916/1870136/2594937 pid=0-39217 cpus=0-7
  41908 __kernfs_new_node+0x70/0x2d0 age=406911/2326294/2594938 pid=0-38398 cpus=0-7
9969728 memcg_update_all_list_lrus+0x1bc/0x550 age=2564414/2567167/2594607 pid=1 cpus=0-7
19861376 __list_lru_init+0x2b8/0x480 age=406870/2007921/2594449 pid=1-38406 cpus=0-7

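The output shows that __list_lru_init and memcg_update_all_list_lrus hold millions of kmalloc-32 allocations each, far more than any other call site. This points to memcg_update_all_list_lrus consuming a large amount of kernel memory, an issue that has since been fixed in a patch to the Linux kernel.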
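
To make the imbalance easier to see, the raw counts can be turned into percentages of the total. The following is a minimal sketch rather than part of the original procedure: alloc_share is a hypothetical helper name, and it assumes the alloc_calls layout shown above, where the first field is the allocation count and the second is the call site.

```shell
#!/bin/sh
# Print each call site's share of all tracked allocations, largest first.
# Usage: alloc_share [alloc_calls-file] [count]
# alloc_share is a hypothetical helper name used only for this sketch.
alloc_share() {
    file="${1:-/sys/kernel/slab/kmalloc-32/alloc_calls}"
    count="${2:-5}"
    # $1 = allocation count, $2 = call site (function+offset/size).
    awk '{ n[$2] += $1; total += $1 }
         END {
             if (total > 0)
                 for (site in n)
                     printf "%.1f%% %s\n", 100 * n[site] / total, site
         }' "$file" |
        LC_ALL=C sort -rn | head -n "$count"
}
```

Pointing alloc_share at a different cache's alloc_calls file, for example another kmalloc-* cache flagged by slabtop, gives the same breakdown for that cache.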

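If alloc_calls does not show per-caller data, it is worth checking that the node really booted with slub_debug=U. A minimal sketch of that check, run as root on the node, follows. The helper name has_karg is hypothetical, and the function only looks for an exact token in /proc/cmdline.

```shell
#!/bin/sh
# Succeed if the kernel command line contains the given argument as a whole token.
# Usage: has_karg <argument> [cmdline-file]
# has_karg is a hypothetical helper name used only for this sketch.
has_karg() {
    arg="$1"
    file="${2:-/proc/cmdline}"
    # Put each space-separated argument on its own line, then match exactly.
    tr ' ' '\n' < "$file" | grep -qxF -- "$arg"
}

# Example usage on a node:
#   has_karg slub_debug=U && echo "slub_debug=U is active"
```

An exact token match is used so that a differently valued argument, such as slub_debug=FZ, is not mistaken for the one applied by the MachineConfig.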

