OCP Upgrade Process Flow - Continued

Step 3: Pause your worker node MCPs

In this example there are two custom worker MCPs, mcp-1 and mcp-2, so spec.paused is set to true on each of them:

$ oc patch mcp/mcp-1 --type merge --patch '{"spec":{"paused":true}}'
$ oc patch mcp/mcp-2 --type merge --patch '{"spec":{"paused":true}}'
Two things to note in each command: the MCP name after the “/”, and the patch that sets spec.paused to true.
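With more than a couple of custom pools, writing one patch per MCP gets tedious. The function below is a minimal sketch that pauses (or resumes) every MCP except the default master and worker pools. The function name set_custom_mcps_paused is hypothetical, and it assumes oc is already logged in as a cluster administrator.

```shell
#!/bin/sh
# Hypothetical helper: set spec.paused on every MCP except the default
# "master" and "worker" pools. Assumes `oc` is logged in as cluster-admin.
# Usage: set_custom_mcps_paused true    # pause before the upgrade
#        set_custom_mcps_paused false   # resume once workers may update
set_custom_mcps_paused() {
  paused="$1"
  case "$paused" in
    true|false) ;;
    *) echo "usage: set_custom_mcps_paused true|false" >&2; return 2 ;;
  esac
  # `oc get mcp -o name` prints lines such as
  # machineconfigpool.machineconfiguration.openshift.io/mcp-1
  oc get mcp -o name | grep -Ev '/(master|worker)$' | while read -r mcp; do
    oc patch "$mcp" --type merge --patch "{\"spec\":{\"paused\":${paused}}}"
  done
}
```

Call set_custom_mcps_paused true before the upgrade and set_custom_mcps_paused false when the worker nodes are allowed to update. Because pools are selected by name only, it is still worth confirming the result with the table check that follows.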

To confirm the change, here is an easy way to run the output of oc get mcp -o json through jq and print the pause state of each MCP as a table:

[cnf@utility ~]$ oc get mcp -o json | jq -r '["MCP","Paused"], ["---","------"], (.items[] | select(.metadata.name != "worker") | [.metadata.name, .spec.paused]) | @tsv'
MCP     Paused
---     ------
master  false
mcp-1   true
mcp-2   true
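The output also lists the master MCP (the default worker MCP is filtered out). Neither default pool is paused: the master MCP must stay unpaused so that the control plane nodes are updated during the upgrade, while the paused custom pools hold their worker nodes back until you unpause them.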
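If you would rather have a pass/fail answer than a table to read, the sketch below uses the custom-columns output built into oc instead of jq and returns non-zero if any custom pool is unpaused. The function name check_custom_mcps_paused is hypothetical. An MCP whose spec.paused was never set is printed as <none> by custom-columns, so it is reported as well.

```shell
# Hypothetical gate: exit non-zero if any MCP other than the default
# master/worker pools is not paused. Assumes `oc` is logged in.
# Usage: check_custom_mcps_paused && echo "all custom MCPs paused"
check_custom_mcps_paused() {
  oc get mcp --no-headers \
    -o custom-columns=NAME:.metadata.name,PAUSED:.spec.paused |
  awk 'BEGIN { bad = 0 }
       # Skip the default pools; anything else must report "true".
       $1 != "master" && $1 != "worker" && $2 != "true" {
         print "MCP not paused: " $1
         bad = 1
       }
       END { exit bad }'
}
```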

Step 4: Backup etcd

Open a debug shell on a control plane node and change root to the host filesystem:

$ oc debug node/ctrl-plane-0

# chroot /host

Run the backup script:

# /usr/local/bin/cluster-backup.sh /home/core/assets/backup

Certificate /etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt is missing. Checking in different directory
Certificate /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-serving-ca/ca-bundle.crt found!
found latest kube-apiserver: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-17

…

{"level":"info","ts":"2023-11-13T19:57:56.87184Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/home/core/assets/backup/snapshot_2023-11-13_195755.db"}
Snapshot saved at /home/core/assets/backup/snapshot_2023-11-13_195755.db
Deprecated: Use `etcdutl snapshot status` instead.

{"hash":271797193,"revision":88606492,"totalKey":22247,"totalSize":201666560}
snapshot db and kube resources are successfully saved to /home/core/assets/backup
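The same backup can also be taken without an interactive session by passing the command to oc debug after "--". The wrapper below is a sketch; the function name backup_etcd and its optional second argument are hypothetical, and the default path is the one used above.

```shell
# Hypothetical wrapper: run cluster-backup.sh on one control plane node
# without opening an interactive shell. Assumes `oc` is logged in as cluster-admin.
# Usage: backup_etcd ctrl-plane-0 [backup-dir]
backup_etcd() {
  node="$1"
  backup_dir="${2:-/home/core/assets/backup}"
  if [ -z "$node" ]; then
    echo "usage: backup_etcd <control-plane-node> [backup-dir]" >&2
    return 2
  fi
  # Everything after "--" runs in the debug pod; chroot /host switches to the
  # node's own filesystem, just like the interactive steps above.
  oc debug "node/${node}" -- chroot /host /usr/local/bin/cluster-backup.sh "$backup_dir"
}
```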

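Pull the files off the control plane node. The backup script ran as root, so first make the core user the owner of the backup directory; otherwise scp, running as core, may not be able to read the files: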

$ ssh core@ctrl-plane-0 "sudo chown -R core assets"

$ scp core@ctrl-plane-0:/home/core/assets/backup/* .
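Before moving on, it is worth confirming that the copies actually arrived. cluster-backup.sh writes two files: snapshot_<timestamp>.db (the etcd snapshot seen in the output above) and static_kuberesources_<timestamp>.tar.gz (the static pod resources). The sketch below, with the hypothetical name verify_etcd_backup, checks that both exist and are non-empty in a local directory.

```shell
# Hypothetical check: confirm a local directory holds both files that
# cluster-backup.sh writes and that neither is empty.
# Usage: verify_etcd_backup .
verify_etcd_backup() {
  dir="${1:-.}"
  rc=0
  for pattern in 'snapshot_*.db' 'static_kuberesources_*.tar.gz'; do
    found=""
    # $pattern is left unquoted on purpose so the shell expands the glob.
    for f in "$dir"/$pattern; do
      # -s is true only for a file that exists and is not empty
      if [ -s "$f" ]; then found="$f"; fi
    done
    if [ -n "$found" ]; then
      echo "ok: $found"
    else
      echo "missing or empty: $dir/$pattern" >&2
      rc=1
    fi
  done
  return "$rc"
}
```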

Step 5: Double check your cluster health

Double checking is always good practice, so confirm the cluster is still healthy right before starting the upgrade.

Some suggested checks at this time are:

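  • Cluster operators (all Available, none Progressing or Degraded)

  • Node status (all nodes Ready)

  • Look for failed pods (anything not Running or Completed)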

[cnf@utility ~]$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.45   True        False         False      2d19h
baremetal                                  4.12.45   True        False         False      35d
cloud-controller-manager                   4.12.45   True        False         False      35d
cloud-credential                           4.12.45   True        False         False      35d
cluster-autoscaler                         4.12.45   True        False         False      35d
config-operator                            4.12.45   True        False         False      35d
console                                    4.12.45   True        False         False      34d
...
storage                                    4.12.45   True        False         False      35d
etcd                                       4.13.32   True        False         False      35d
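Scanning a long oc get co listing by eye is error-prone. The sketch below, with the hypothetical name check_cluster_operators, prints any operator that is not Available or that is Progressing or Degraded, and optionally any operator whose VERSION differs from the version you pass in (for example, the current cluster version). It parses the default column layout, so it assumes every operator reports a version and the columns line up.

```shell
# Hypothetical check: print any cluster operator that is not
# Available=True, Progressing=False, Degraded=False, or whose version
# differs from the expected one (if given). Assumes `oc` is logged in.
# Usage: check_cluster_operators 4.12.45
check_cluster_operators() {
  expected="$1"
  # Default columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
  oc get co --no-headers |
  awk -v want="$expected" 'BEGIN { bad = 0 }
    $3 != "True" || $4 != "False" || $5 != "False" || (want != "" && $2 != want) {
      print "check: " $0
      bad = 1
    }
    END { exit bad }'
}
```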

[cnf@utility ~]$ oc get node
NAME           STATUS   ROLES                  AGE   VERSION
ctrl-plane-0   Ready    control-plane,master   35d   v1.25.14+a52e8df
ctrl-plane-1   Ready    control-plane,master   35d   v1.25.14+a52e8df
ctrl-plane-2   Ready    control-plane,master   35d   v1.25.14+a52e8df
worker-0       Ready    mcp-1,worker           35d   v1.25.14+a52e8df
worker-1       Ready    mcp-2,worker           35d   v1.25.14+a52e8df
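The same idea works for nodes. The sketch below, with the hypothetical name check_nodes, flags any node whose STATUS is anything other than plain Ready (NotReady, or Ready,SchedulingDisabled for a cordoned node) and any node whose kubelet VERSION differs from the first node listed. It assumes the default column layout of oc get nodes.

```shell
# Hypothetical check: print nodes that are not plainly "Ready" or whose
# kubelet VERSION differs from the first node listed. Assumes `oc` is logged in.
# Usage: check_nodes && echo "all nodes Ready"
check_nodes() {
  # Default columns: NAME STATUS ROLES AGE VERSION
  oc get nodes --no-headers |
  awk 'BEGIN { bad = 0 }
    NR == 1 { ref = $5 }
    $2 != "Ready" || $5 != ref {
      print "check: " $0
      bad = 1
    }
    END { exit bad }'
}
```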

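[cnf@utility ~]$ oc get po -A --no-headers | grep -Eiv 'running|completed'
(Note: this should return nothing. Without --no-headers, the column header line would also be printed, because it matches neither word.)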
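The grep above only looks at the STATUS text, so a pod that is Running but has an unready container (READY showing 0/1, for example) slips through. The sketch below, with the hypothetical name check_pods, flags those as well. It assumes the default column layout of oc get pods -A.

```shell
# Hypothetical check: list pods that are neither Running nor Completed, plus
# Running pods where not every container is ready (READY shows e.g. 0/1).
# Assumes `oc` is logged in with permission to list pods in all namespaces.
# Usage: check_pods && echo "no problem pods"
check_pods() {
  # Columns with -A: NAMESPACE NAME READY STATUS RESTARTS AGE
  oc get pods -A --no-headers |
  awk 'BEGIN { bad = 0 }
    {
      split($3, ready, "/")
      if (($4 != "Running" && $4 != "Completed") ||
          ($4 == "Running" && ready[1] != ready[2])) {
        print "check: " $0
        bad = 1
      }
    }
    END { exit bad }'
}
```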