OCP Upgrade Process Flow - Continued

Step 13: Un-Pause the worker MCP(s)

Now you have reached the fun, but sometimes long, part of the upgrade process. Each worker node in the cluster must reboot to upgrade to the new EUS, Y-stream, or Z-stream version.

You will need to decide how many MCPs to upgrade at a time, depending on how many CNF pods can be taken down at once and how your PDBs (Pod Disruption Budgets) and affinity rules are set up.
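One way to gauge how much disruption the workloads can tolerate is to list the PodDisruptionBudgets that currently allow zero disruptions, since those would stall a node drain. This is a minimal sketch, not part of the original procedure: the function name pdbs_blocking_drain is hypothetical, and the custom-columns paths assume the standard policy/v1 PodDisruptionBudget fields.

```shell
# Hypothetical helper: print namespace/name of every PDB that currently
# allows zero voluntary disruptions (such a PDB would block a node drain).
# Assumes the standard PDB fields .metadata.* and .status.disruptionsAllowed.
pdbs_blocking_drain() {
  oc get pdb -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed' |
    awk '$3 == 0 { print $1 "/" $2 }'
}
```

Running pdbs_blocking_drain before unpausing a pool gives a quick list of workloads to look at first; empty output suggests no PDB is currently at its limit.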

Here is a simple check of the MCPs, followed by a list of nodes showing which MCP each one belongs to:

[cnf@utility ~]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-c9a52144456dbff9c9af9c5a37d1b614   True      False      False      3              3                   3                     0                      36d
mcp-1    rendered-mcp-1-07fe50b9ad51fae43ed212e84e1dcc8e    False     False      False      1              0                   0                     0                      47h
mcp-2    rendered-mcp-2-07fe50b9ad51fae43ed212e84e1dcc8e    False     False      False      1              0                   0                     0                      47h
worker   rendered-worker-f1ab7b9a768e1b0ac9290a18817f60f0   True      False      False      0              0                   0                     0                      36d

[cnf@utility ~]$ oc get no
NAME           STATUS   ROLES                  AGE   VERSION
ctrl-plane-0   Ready    control-plane,master   36d   v1.27.10+28ed2d7
ctrl-plane-1   Ready    control-plane,master   36d   v1.27.10+28ed2d7
ctrl-plane-2   Ready    control-plane,master   36d   v1.27.10+28ed2d7
worker-0       Ready    mcp-1,worker           36d   v1.25.14+a52e8df
worker-1       Ready    mcp-2,worker           36d   v1.25.14+a52e8df

[cnf@utility ~]$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
MCP     Paused
---     ------
master  false
mcp-1   true
mcp-2   true

Unpause an MCP with:

[jcl@utility ~]$ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":false}}'
machineconfigpool.machineconfiguration.openshift.io/mcp-1 patched

[jcl@utility ~]$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
MCP     Paused
---     ------
master  false
mcp-1   false
mcp-2   true

As each MCP completes, you can unpause the next one. In the node listing below, worker-0 (mcp-1) has finished upgrading and worker-1 (mcp-2) is now rebooting:

NAME           STATUS                        ROLES                  AGE   VERSION
ctrl-plane-0   Ready                         control-plane,master   36d   v1.27.10+28ed2d7
ctrl-plane-1   Ready                         control-plane,master   36d   v1.27.10+28ed2d7
ctrl-plane-2   Ready                         control-plane,master   36d   v1.27.10+28ed2d7
worker-0       Ready                         mcp-1,worker           36d   v1.27.10+28ed2d7
worker-1       NotReady,SchedulingDisabled   mcp-2,worker           36d   v1.25.14+a52e8df
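The unpause-then-wait sequence can be scripted so that each pool finishes before the next one starts. This is a rough sketch under stated assumptions: the function name unpause_mcps_serially and the timeout value are hypothetical, and it assumes each paused pool already reports UPDATED as False (as in the output above), so waiting on the pool's Updated condition only returns once its nodes have rebooted.

```shell
# Hypothetical helper: unpause the given MCPs one at a time, waiting for
# each pool to report the Updated condition before moving to the next.
# Usage: unpause_mcps_serially <timeout> <mcp> [<mcp> ...]
unpause_mcps_serially() {
  local timeout="$1"
  shift
  local pool
  for pool in "$@"; do
    # Same patch as the manual step above.
    oc patch "mcp/${pool}" --type merge --patch '{"spec":{"paused":false}}' || return 1
    # Block until the pool is fully updated, or stop if the timeout expires.
    oc wait "mcp/${pool}" --for=condition=Updated=True --timeout="${timeout}" || return 1
  done
}
```

For example, unpause_mcps_serially 90m mcp-1 mcp-2 would upgrade mcp-1 and then mcp-2. Stopping at the first failure is deliberate: if a pool does not finish in time, the remaining pools stay paused so you can investigate.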

Step 14: Verify Health of Cluster

Here is a set of commands that you should run after upgrading the cluster to verify everything is back up and running properly:

  • oc get clusterversion
    This should show the new cluster version, and the PROGRESSING column should show “False”.

  • oc get node
    All nodes in the cluster should have a status of “Ready” and should be at the same version.

  • oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
    This should show “false” in the Paused column for all MCPs.

  • oc get co
    All cluster operators should show AVAILABLE = True, PROGRESSING = False, and DEGRADED = False.

  • oc get po -A | egrep -iv 'complete|running'
    This should return only the header line, but you may see a few pods still moving around right after the upgrade. You may need to watch this for a while to make sure everything is clear.
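Some of the checks above can be reduced to commands that print only the problem items, which is easier to scan on a large cluster. The helpers below are a sketch rather than an official tool: the function names are hypothetical, and the awk field positions assume the default column layout of oc get co and oc get no with every column populated.

```shell
# Hypothetical helper: print each cluster operator that is not
# AVAILABLE=True, PROGRESSING=False, DEGRADED=False.
# Assumes columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE ...
unhealthy_operators() {
  oc get co --no-headers |
    awk '$3 != "True" || $4 != "False" || $5 != "False" { print $1 }'
}

# Hypothetical helper: print each node whose STATUS is not exactly "Ready"
# (this also catches values such as "Ready,SchedulingDisabled").
nodes_not_ready() {
  oc get no --no-headers | awk '$2 != "Ready" { print $1 }'
}

# Hypothetical helper: print the distinct kubelet versions in the cluster.
# Assumes columns: NAME STATUS ROLES AGE VERSION
node_versions() {
  oc get no --no-headers | awk '{ print $5 }' | sort -u
}
```

On a healthy, fully upgraded cluster, unhealthy_operators and nodes_not_ready should print nothing, and node_versions should print a single line.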

[jcl@utility ~]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.11   True        False         3d21h   Cluster version is 4.14.11
[jcl@utility ~]$ oc get no
NAME           STATUS   ROLES                  AGE   VERSION
ctrl-plane-0   Ready    control-plane,master   39d   v1.27.10+28ed2d7
ctrl-plane-1   Ready    control-plane,master   39d   v1.27.10+28ed2d7
ctrl-plane-2   Ready    control-plane,master   39d   v1.27.10+28ed2d7
worker-0       Ready    mcp-1,worker           39d   v1.27.10+28ed2d7
worker-1       Ready    mcp-2,worker           39d   v1.27.10+28ed2d7
[jcl@utility ~]$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | [(.metadata.name), (.spec.paused)]) | @tsv' | grep -v worker
MCP     Paused
---     ------
master  false
mcp-1   false
mcp-2   false
[jcl@utility ~]$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.11   True        False         False      7d13h
baremetal                                  4.14.11   True        False         False      39d
cloud-controller-manager                   4.14.11   True        False         False      39d
cloud-credential                           4.14.11   True        False         False      39d
cluster-autoscaler                         4.14.11   True        False         False      39d
config-operator                            4.14.11   True        False         False      39d
console                                    4.14.11   True        False         False      38d
control-plane-machine-set                  4.14.11   True        False         False      39d
csi-snapshot-controller                    4.14.11   True        False         False      39d
dns                                        4.14.11   True        False         False      39d
etcd                                       4.14.11   True        False         False      39d
image-registry                             4.14.11   True        False         False      39d
ingress                                    4.14.11   True        False         False      38d
insights                                   4.14.11   True        False         False      39d
kube-apiserver                             4.14.11   True        False         False      39d
kube-controller-manager                    4.14.11   True        False         False      39d
kube-scheduler                             4.14.11   True        False         False      39d
kube-storage-version-migrator              4.14.11   True        False         False      3d18h
machine-api                                4.14.11   True        False         False      39d
machine-approver                           4.14.11   True        False         False      39d
machine-config                             4.14.11   True        False         False      39d
marketplace                                4.14.11   True        False         False      39d
monitoring                                 4.14.11   True        False         False      38d
network                                    4.14.11   True        False         False      39d
node-tuning                                4.14.11   True        False         False      3d22h
openshift-apiserver                        4.14.11   True        False         False      7d13h
openshift-controller-manager               4.14.11   True        False         False      39d
openshift-samples                          4.14.11   True        False         False      3d22h
operator-lifecycle-manager                 4.14.11   True        False         False      39d
operator-lifecycle-manager-catalog         4.14.11   True        False         False      39d
operator-lifecycle-manager-packageserver   4.14.11   True        False         False      39d
service-ca                                 4.14.11   True        False         False      39d
storage                                    4.14.11   True        False         False      39d
[jcl@utility ~]$ oc get po -A | egrep -iv 'complete|running'
NAMESPACE                                          NAME                                                        READY   STATUS      RESTARTS        AGE
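Because pods may keep moving for a while after the upgrade, the pod check can be wrapped in a simple polling loop instead of being re-run by hand. This is a minimal sketch: the function name wait_for_pods_settled and its two parameters (number of attempts, delay in seconds) are hypothetical, and it applies the same filter as the command above, with grep -E as the equivalent of egrep.

```shell
# Hypothetical helper: poll until every pod is Running or Completed.
# Usage: wait_for_pods_settled <attempts> <delay-seconds>
# Returns 0 once the filtered list is empty, 1 if attempts run out.
wait_for_pods_settled() {
  local attempts="$1" delay="$2" i=0 pending=""
  while [ "$i" -lt "$attempts" ]; do
    # Same filter as the manual check, minus the header line.
    pending="$(oc get po -A --no-headers | grep -Eiv 'complete|running' || true)"
    if [ -z "$pending" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # Show the pods that never settled so they can be investigated.
  printf '%s\n' "$pending" >&2
  return 1
}
```

For example, wait_for_pods_settled 30 20 would check every 20 seconds for up to about 10 minutes. Like the manual command, this only looks at the STATUS text, so a pod that is Running but not fully ready is not flagged.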