My teammate was investigating an SSHD config change and hit a stuck MachineConfigPool. Here are some steps we followed to get it unstuck.
Steps
- Verify that the MachineConfigPool is stuck updating
❯ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-0de63bfa1c0db0777031adddb3286fbc   False     True       True       3              0                   0                     3                      9d
worker   rendered-worker-38e4049eaf0b7fca848408378092e607   True      False      False      3              3                   3                     0                      9d
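If you want to script this check, here is a minimal sketch that reads the `oc get mcp` table from stdin and prints the pools that are not fully updated or are degraded. The function name `stuck_pools` is hypothetical, and it assumes the default column order shown above (UPDATED third, DEGRADED fifth). Parsing `-o json` output would be more robust than splitting the text table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of pools that look stuck.
# Reads `oc get mcp` table output on stdin and assumes the default column
# order: NAME CONFIG UPDATED UPDATING DEGRADED ...
stuck_pools() {
  # Skip the header row; flag a pool if UPDATED is not True or DEGRADED is True.
  awk 'NR > 1 && ($3 != "True" || $5 == "True") { print $1 }'
}

# Usage (needs cluster access):
#   oc get mcp | stuck_pools
```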
- Find the pods running on one of the nodes in the stuck pool (for instance, master-0)
❯ oc get pods -n openshift-machine-config-operator --field-selector spec.nodeName=master-0
NAME READY STATUS RESTARTS AGE
machine-config-daemon-t8x8j 2/2 Running 2 35h
machine-config-server-kfx8n 1/1 Running 1 35h
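Rather than reading the pod name off this listing, a small sketch can look it up. `mcd_pod_for_node` is a hypothetical name; it reuses the same `oc get pods --field-selector` call shown above with `-o name`, and assumes the daemon pod's name starts with `machine-config-daemon-`, as it does here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the machine-config-daemon pod on a given node,
# in the "pod/<name>" form that `oc get -o name` produces.
mcd_pod_for_node() {
  local node="$1"
  oc get pods -n openshift-machine-config-operator \
    --field-selector "spec.nodeName=${node}" -o name \
    | grep '^pod/machine-config-daemon-'
}

# Usage (needs cluster access):
#   oc logs "$(mcd_pod_for_node master-0)" -n openshift-machine-config-operator -c machine-config-daemon
```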
- Check the machine-config-daemon logs and note the rendered-master config they mention
❯ oc logs pod/machine-config-daemon-t8x8j -n openshift-machine-config-operator -c machine-config-daemon
...
E0124 07:19:26.746977 780508 on_disk_validation.go:208] content mismatch for file "/etc/ssh/sshd_config" (-want +got):
bytes.Join({
- "\n#\t",
+ "# ",
"$OpenBSD: sshd_config,v 1.103 2018/04/09 20:41:22 tj Exp $\n\n# Th",
"is is the sshd server system-wide configuration file. See\n# ssh",
... // 1437 identical bytes
"keys and .ssh/authorized_keys2\n# but this is overridden so insta",
"llations will only check .ssh/authorized_keys\nAuthorizedKeysFile",
- ` `,
+ " ",
".ssh/authorized_keys\n\n#AuthorizedPrincipalsFile none\n\n#Authorize",
"dKeysCommand none\n#AuthorizedKeysCommandUser nobody\n\n# For this ",
... // 2258 identical bytes
"E LC_MEASUREMENT\nAcceptEnv LC_IDENTIFICATION LC_ALL LANGUAGE\nAcc",
...
+ "\n",
}, "")
E0124 07:19:26.747042 780508 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-master-0de63bfa1c0db0777031adddb3286fbc: content mismatch for file "/etc/ssh/sshd_config"
I0124 07:19:28.973484 780508 daemon.go:1248] Current+desired config: rendered-master-0de63bfa1c0db0777031adddb3286fbc
...
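To pull the rendered config name out of a long daemon log without scrolling, a grep is enough. This is a sketch: `rendered_configs_in_log` is a hypothetical name, and the pattern assumes names of the `rendered-<pool>-<hash>` form seen above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each distinct rendered-* config name mentioned in
# machine-config-daemon log text read from stdin.
rendered_configs_in_log() {
  # -o prints only the matching part of each line; sort -u removes repeats.
  grep -oE 'rendered-[a-z0-9-]+' | sort -u
}

# Usage (needs cluster access):
#   oc logs pod/machine-config-daemon-t8x8j -n openshift-machine-config-operator \
#     -c machine-config-daemon | rendered_configs_in_log
```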
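- This looks like a whitespace problem. Dump the rendered config, URL-decode the sshd_config contents, and inspect the decoded file's whitespace in vim with :set list, which makes tabs and line ends visible.
> oc get mc rendered-master-0de63bfa1c0db0777031adddb3286fbc -o yaml > out.yaml
vim :set list
You may have to fix the whitespace so that the file on disk and the rendered config match.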
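As an alternative to eyeballing the file in vim, a sketch like the following diffs the two copies with whitespace made visible. It assumes you have already saved the URL-decoded rendered contents and a copy of the node's /etc/ssh/sshd_config to local files (the file names in the usage lines are hypothetical, and the `oc debug` line assumes only the file contents reach stdout). It relies on GNU `cat -A`, which prints a tab as `^I` and marks each line end with `$`; `show_whitespace_diff` is also a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: diff two files with whitespace made visible, similar to
# vim's :set list. GNU `cat -A` shows a tab as ^I and ends each line with $.
show_whitespace_diff() {
  local rendered="$1" on_disk="$2"
  # Process substitution (a bash feature) feeds both annotated copies to diff.
  diff <(cat -A "$rendered") <(cat -A "$on_disk")
}

# Usage (hypothetical file names):
#   oc debug node/master-0 -- chroot /host cat /etc/ssh/sshd_config > node-sshd_config
#   show_whitespace_diff rendered-sshd_config node-sshd_config
```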
- If fixing the whitespace doesn't resolve it, check the failure reason each node reports.
> oc describe mcp master
Message:
  Node master-0 is reporting: "unexpected on-disk state validating against rendered-master-0de63bfa1c0db0777031adddb3286fbc: mode mismatch for file: \"/etc/ssh/sshd_config\"; expected: -rw-------/384/0600; received: -rw-r--r--/420/0644",
  Node master-1 is reporting: "unexpected on-disk state validating against rendered-master-0de63bfa1c0db0777031adddb3286fbc: content mismatch for file \"/etc/ssh/sshd_config\"",
  Node master-2 is reporting: "unexpected on-disk state validating against rendered-master-0de63bfa1c0db0777031adddb3286fbc: content mismatch for file \"/etc/ssh/sshd_config\""
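In this case, the files on the nodes had been edited locally while preparing the desired sshd_config, so they no longer matched the rendered config (in content, and on master-0 also in file mode) and needed a forced update.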
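The master-0 message is about permissions rather than content: 0600 was expected and 0644 was found. A minimal sketch for confirming a file's mode, assuming GNU `stat` (whose `-c '%a'` prints the octal mode without a leading zero). `check_mode` is a hypothetical name, and you would run it where the file lives, for example inside an `oc debug node/master-0` session after `chroot /host`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's permission bits with an expected octal
# mode such as 0600. Prints a one-line verdict; returns 1 on a mismatch.
check_mode() {
  local path="$1" expected="${2#0}" actual   # "0600" -> "600"
  actual=$(stat -c '%a' "$path") || return 2 # GNU stat: octal mode, no leading 0
  if [ "$actual" = "$expected" ]; then
    echo "ok: $path has mode $actual"
  else
    echo "mode mismatch: $path expected $expected, got $actual"
    return 1
  fi
}

# Usage, on the node:
#   check_mode /etc/ssh/sshd_config 0600
```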
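- Force the machine-config-daemon to rewrite the files by creating the force file on each affected node (for example, from an oc debug node/master-0 session after running chroot /host).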
> touch /run/machine-config-daemon-force
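When several nodes are affected, as in the describe output above, the touch can be run on each of them from your workstation. This is a sketch: `force_mcd_refresh` is a hypothetical name, and it assumes `oc debug node/<name>` can start a debug pod on each node, with `chroot /host` pointing the command at the node's own filesystem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the MCD force file on every node named.
# `oc debug node/<name>` starts a debug pod on that node; `chroot /host`
# runs touch against the node's root filesystem instead of the pod's.
force_mcd_refresh() {
  local node
  for node in "$@"; do
    echo "forcing machine-config refresh on ${node}"
    oc debug "node/${node}" -- chroot /host touch /run/machine-config-daemon-force
  done
}

# Usage (needs cluster access):
#   force_mcd_refresh master-0 master-1 master-2
```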
- You should see the states change after the node reboots.
Events:
Type     Reason             Age     From                                     Message
----     ------             ----    ----                                     -------
Normal   AnnotationChange   5m19s   machineconfigcontroller-nodecontroller   Node master-0 now has machineconfiguration.openshift.io/state=Done
degradedMachineCount: 2
machineCount: 3
observedGeneration: 500
readyMachineCount: 0
unavailableMachineCount: 2
updatedMachineCount: 0
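Rather than rereading these counts by hand, a small sketch can decide whether the pool has settled. It reads status fields like the ones above from stdin (for example from `oc get mcp master -o yaml`) and succeeds only when every machine is updated and none is degraded. `pool_done` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize MachineConfigPool status counts read from
# stdin and exit 0 only when the pool is fully updated and not degraded.
pool_done() {
  awk -F': *' '
    { sub(/^ +/, "", $1) }               # drop YAML indentation from the key
    $1 == "machineCount"         { total = $2 }
    $1 == "updatedMachineCount"  { updated = $2 }
    $1 == "degradedMachineCount" { degraded = $2 }
    END {
      printf "updated %d/%d, degraded %d\n", updated, total, degraded
      # Exit 0 only if all machines are updated and none are degraded.
      exit !(total > 0 && updated == total && degraded == 0)
    }'
}

# Usage (needs cluster access):
#   oc get mcp master -o yaml | pool_done
```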
If you need to extract a single file from the rendered config, filter the files in its Ignition storage section:
> oc get mc rendered-master-0de63bfa1c0db0777031adddb3286fbc -o yaml | yq -r '.spec.config.storage.files[] | select(.path == "/etc/ssh/sshd_config") | .contents.source'
data:,%0A%23%09$OpenBSD:%20sshd_config%2Cv%201.103
...
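The source printed above is a percent-encoded data: URL. A minimal sketch for turning it back into the plain file, assuming the plain `data:,` form shown here (a base64 source such as `data:;base64,...` would need `base64 -d` instead) and bash's builtin `printf`, whose `%b` expands `\xHH` escapes. `decode_data_url` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a percent-encoded "data:," URL into plain text.
# Caveat: a literal (unencoded) backslash in the input would also be treated
# as an escape by printf %b.
decode_data_url() {
  local src="${1#data:,}"                          # drop the data-URL prefix
  local escaped
  escaped=$(printf '%s' "$src" | sed 's/%/\\x/g')  # %0A -> \x0A, %20 -> \x20
  printf '%b' "$escaped"                           # expand the \xHH escapes
}

# Usage: with the data:, string printed above saved in $src,
#   decode_data_url "$src" > rendered-sshd_config
```

The decoded file can then be opened in vim with :set list, or compared against the node's copy.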