Scaling the Hosted Cluster
At this point we have a Hosted Cluster up and running. One common operation is scaling nodes in and out. In this section we cover the two methods of scaling a Hosted Cluster: manual and automatic.

Scaling operations can be done from the Web Console or from the CLI. This section covers only the CLI method, as it is more convenient than the Web Console.
Manually Scaling the Hosted Cluster
The scale operation is done per NodePool, which means you can scale different NodePools individually without impacting the others. When the number of replicas in the NodePool object changes, the capi-provider component tries to locate an available agent and deploy it as an additional worker on the corresponding hosted cluster. In this first scenario we are going to add a third node to our cluster.
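If you want to double-check the current size before scaling, the replica count can be read straight from the NodePool object. This is a quick sketch, not part of the lab flow, using a jsonpath query against the spec.replicas field that the scale command below modifies:

```
# Print the current replica count of the NodePool (sketch, not a lab step).
oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted \
  get nodepool nodepool-hosted-1 -o jsonpath='{.spec.replicas}{"\n"}'
```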
- Scale the existing NodePool from 2 replicas to 3 replicas. This operation is done from the management cluster; users consuming the Hosted Cluster cannot add nodes by themselves.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted scale \
    nodepool nodepool-hosted-1 --replicas 3
  ```

  ```
  nodepool.hypershift.openshift.io/nodepool-hosted-1 scaled
  ```
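  The scale subcommand is simply a shortcut for editing spec.replicas. If you prefer a declarative style, a merge patch along these lines should be equivalent (a sketch, not an official alternative from the lab):

  ```
  # Equivalent merge patch that sets spec.replicas directly (sketch).
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted \
    patch nodepool nodepool-hosted-1 --type merge -p '{"spec":{"replicas":3}}'
  ```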
- We should now see the free agent being assigned to the Hosted Cluster.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hardware-inventory get agents
  ```

  You can see CLUSTER is set to hosted for the first agent. It can take a few minutes for the agent to be assigned to the Hosted Cluster after scaling the NodePool.

  ```
  NAME                                   CLUSTER   APPROVED   ROLE          STAGE
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   hosted    true       auto-assign
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   hosted    true       worker        Done
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203   hosted    true       worker        Done
  ```
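  Instead of re-running the command, you can watch the agents until the assignment shows up (press Ctrl+C to stop watching):

  ```
  # Watch agent objects for changes as the new worker is assigned.
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hardware-inventory get agents -w
  ```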
- After a few moments, the installation should begin. It can take up to 5 minutes for the installation to start.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hardware-inventory get agents
  ```

  ```
  NAME                                   CLUSTER   APPROVED   ROLE     STAGE
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   hosted    true       worker   Writing image to disk
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   hosted    true       worker   Done
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203   hosted    true       worker   Done
  ```
- Once finished, the agent will move to the Done stage. It can take up to 10 minutes for the installation to finish.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hardware-inventory get agents
  ```

  ```
  NAME                                   CLUSTER   APPROVED   ROLE     STAGE
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   hosted    true       worker   Done
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   hosted    true       worker   Done
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203   hosted    true       worker   Done
  ```
- If we check the Hosted Cluster nodes, we will see that a third one was added.

  ```
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    get nodes
  ```

  ```
  NAME             STATUS   ROLES    AGE    VERSION
  hosted-worker0   Ready    worker   2m8s   v1.26.3+b404935
  hosted-worker1   Ready    worker   36m    v1.26.3+b404935
  hosted-worker2   Ready    worker   36m    v1.26.3+b404935
  ```
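  The NodePool itself should now reflect the new capacity as well; once the node has joined, the DESIRED NODES and CURRENT NODES columns should both read 3.

  ```
  # Check the NodePool counters on the management cluster.
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted get nodepool nodepool-hosted-1
  ```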
- Now that we have seen how to add a node, let's see how to scale down the Hosted Cluster and remove one. We can run the scale command again, this time requesting 2 replicas.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted scale \
    nodepool nodepool-hosted-1 --replicas 2
  ```

  ```
  nodepool.hypershift.openshift.io/nodepool-hosted-1 scaled
  ```
- At this point one of the workers will be cordoned and its workloads evicted.

  ```
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    get nodes
  ```

  ```
  NAME             STATUS                     ROLES    AGE     VERSION
  hosted-worker0   Ready                      worker   2m29s   v1.26.3+b404935
  hosted-worker1   Ready                      worker   36m     v1.26.3+b404935
  hosted-worker2   Ready,SchedulingDisabled   worker   37m     v1.26.3+b404935
  ```
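  If you are curious which workloads are being drained, you can list the pods still placed on the cordoned node. This is a sketch that assumes hosted-worker2 is the node shown as SchedulingDisabled above:

  ```
  # List pods across all namespaces that are still bound to the cordoned node.
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    get pods -A --field-selector spec.nodeName=hosted-worker2
  ```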
- After a few minutes, the node will be gone.

  ```
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    get nodes
  ```

  It can take a while for the node to leave the cluster; it depends heavily on the workloads running on it. During our tests it took around 5 minutes. If the node being deleted ends up stuck in the NotReady,SchedulingDisabled state, follow the instructions in the Fixing a Stuck Deleted Node section below.

  ```
  NAME             STATUS   ROLES    AGE     VERSION
  hosted-worker0   Ready    worker   3m26s   v1.26.3+b404935
  hosted-worker1   Ready    worker   37m     v1.26.3+b404935
  ```
- And the agent is back in the pool so it can be reused later.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hardware-inventory get agent
  ```

  ```
  NAME                                   CLUSTER   APPROVED   ROLE          STAGE
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   hosted    true       worker        Done
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   hosted    true       worker        Done
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203             true       auto-assign
  ```
Enabling Auto Scaling for the Hosted Cluster
In the previous section we saw how to manually add capacity to and remove it from a hosted cluster. This operation can be automated, so that when a hosted cluster requires more capacity a new node is added automatically, provided there are spare agents left to provision. Let's see how it works.
- We need to enable auto-scaling for our NodePool, setting a minimum of 2 nodes and a maximum of 3. This means the hosted cluster will add one extra worker when the existing cluster capacity is exhausted. The hosted cluster will be scaled back down if the additional capacity has not been used for the past 10 minutes.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted patch \
    nodepool nodepool-hosted-1 --type=json \
    -p '[{"op": "remove", "path": "/spec/replicas"},{"op":"add", "path": "/spec/autoScaling", "value": { "max": 3, "min": 2 }}]'
  ```

  ```
  nodepool.hypershift.openshift.io/nodepool-hosted-1 patched
  ```
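  To verify that the patch landed, you can print the new autoScaling block; it should return something like {"max":3,"min":2}:

  ```
  # Print the autoScaling stanza that replaced spec.replicas (sketch).
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted \
    get nodepool nodepool-hosted-1 -o jsonpath='{.spec.autoScaling}{"\n"}'
  ```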
- At this point we need to generate that extra load. We have two workers with 12 vCPUs each, which means that if workloads running on the cluster request more than 24 vCPUs, extra capacity must be added to the cluster. Let's create such a workload. We will request 3 replicas, each asking for 10 vCPUs, so an extra node is needed to accommodate the three replicas.

  ```
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    -n default create deployment test-app --image=quay.io/mavazque/reversewords:latest

  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    -n default patch deployment test-app \
    -p '{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"reversewords"}],"containers":[{"name":"reversewords","resources":{"requests":{"cpu":10}}}]}}}}'
  ```

  ```
  deployment.apps/test-app created
  deployment.apps/test-app patched
  ```
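  As a side note, the JSON patch above can also be expressed with oc set resources, which updates the CPU request on the deployment's containers. A hedged equivalent, not part of the lab steps:

  ```
  # Same effect as the patch: request 10 vCPUs per replica (sketch).
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    -n default set resources deployment test-app --requests=cpu=10
  ```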
- If we check the pods, we will see that we have only one pod and it is in the Running state; that's because we have enough capacity in the hosted cluster to run the workload.

  ```
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    -n default get pods
  ```

  ```
  NAME                       READY   STATUS    RESTARTS   AGE
  test-app-d97c4f77b-8kddp   1/1     Running   0          94s
  ```
- Now, let's see what happens when we try to get three replicas of the app running.

  ```
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    -n default scale deployment test-app --replicas 3
  ```

  ```
  deployment.apps/test-app scaled
  ```
- If we check the pods, we will see that one of them is in the Pending state; the current cluster with two workers cannot schedule the third pod due to insufficient vCPU capacity.

  ```
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    -n default get pods
  ```

  ```
  NAME                       READY   STATUS    RESTARTS   AGE
  test-app-d97c4f77b-8kddp   1/1     Running   0          3m32s
  test-app-d97c4f77b-jhrd7   0/1     Pending   0          5s
  test-app-d97c4f77b-wbr8c   1/1     Running   0          5s
  ```
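  To confirm why the scheduler cannot place the pod, you can inspect the FailedScheduling events; field-selector support for events can vary by version, so treat this as a sketch:

  ```
  # Show scheduling failures for the namespace (sketch).
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    -n default get events --field-selector reason=FailedScheduling
  ```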
- At this point the NodePool will be scaled automatically. If we check the NodePool, this is what we see. Check the MESSAGE column: Scaling up MachineSet… It can take a few minutes for the autoscaling to start.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted get nodepool nodepool-hosted-1
  ```

  ```
  NAME                CLUSTER   DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
  nodepool-hosted-1   hosted                    2               True          False        4.13.0                                       Scaling up MachineSet to 3 replicas (actual 2)
  ```
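  Under the hood the autoscaler grows a CAPI MachineSet living in the HostedCluster namespace on the management cluster (hosted-hosted in this lab, as used in the fix section below). You can watch the new Machine appear with:

  ```
  # Watch CAPI Machines while the autoscaler adds the third worker.
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted-hosted \
    get machines.cluster.x-k8s.io -w
  ```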
- The spare agent is joining the cluster.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hardware-inventory get agent
  ```

  ```
  NAME                                   CLUSTER   APPROVED   ROLE     STAGE
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   hosted    true       worker   Done
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   hosted    true       worker   Done
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203   hosted    true       worker   Writing image to disk
  ```
- Once the new node joins the cluster, the workload will be running. It can take up to 10 minutes for the new node to join the cluster.

  ```
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    get nodes
  ```

  ```
  NAME             STATUS   ROLES    AGE   VERSION
  hosted-worker0   Ready    worker   17m   v1.26.3+b404935
  hosted-worker1   Ready    worker   39m   v1.26.3+b404935
  hosted-worker2   Ready    worker   72s   v1.26.3+b404935
  ```

  ```
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    -n default get pods
  ```

  ```
  NAME                       READY   STATUS    RESTARTS   AGE
  test-app-d97c4f77b-8kddp   1/1     Running   0          13m
  test-app-d97c4f77b-jhrd7   1/1     Running   0          10m
  test-app-d97c4f77b-wbr8c   1/1     Running   0          10m
  ```
- If we delete the workload, after 10 minutes the NodePool will be automatically scaled down again.

  ```
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    -n default delete deployment test-app
  ```

  ```
  deployment.apps "test-app" deleted
  ```
- Once the NodePool gets scaled down, the hosted cluster will be back to two nodes. If the node being deleted ends up stuck in the NotReady,SchedulingDisabled state, follow the instructions in the Fixing a Stuck Deleted Node section below.

  ```
  oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
    get nodes
  ```

  It can take up to 5 minutes for the node to leave the cluster.

  ```
  NAME             STATUS   ROLES    AGE   VERSION
  hosted-worker0   Ready    worker   28m   v1.26.3+b404935
  hosted-worker1   Ready    worker   50m   v1.26.3+b404935
  ```
- Finally, disable the auto-scaling.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted \
    patch nodepool nodepool-hosted-1 --type=json \
    -p '[{"op": "remove", "path": "/spec/autoScaling"},{"op":"add", "path": "/spec/replicas", "value": 2}]'
  ```
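  You can confirm the NodePool is back to a fixed size; the AUTOSCALING column should now read False and DESIRED NODES should be 2 again.

  ```
  # Verify autoscaling is off and the pool is pinned to 2 replicas.
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted get nodepool nodepool-hosted-1
  ```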
Fixing a Stuck Deleted Node

DO NOT RUN THIS UNLESS YOU HIT THE NotReady,SchedulingDisabled NODE ISSUE!

Under certain conditions, a node leaving the cluster can end up in an inconsistent state that prevents its deletion. The steps below fix the issue.
- Get the Machine that holds the stuck node.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted-hosted get machines.cluster.x-k8s.io
  ```

  ```
  NAME                      CLUSTER   NODENAME         PROVIDERID                                     PHASE      AGE   VERSION
  nodepool-hosted-1-hsmnq   hosted    hosted-worker1   agent://aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   Running    1h    4.13.0
  nodepool-hosted-1-ktkmc   hosted    hosted-worker0   agent://aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   Running    1h    4.13.0
  nodepool-hosted-1-lfx5h   hosted    hosted-worker2   agent://aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203   Deleting   1h    4.13.0
  ```
- In the output above, we can see the stuck Machine is nodepool-hosted-1-lfx5h, which is the one in the Deleting phase.

- Get the AgentMachine linked to the stuck Machine.

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted-hosted get machines.cluster.x-k8s.io \
    nodepool-hosted-1-lfx5h -o jsonpath='{.spec.infrastructureRef.name}'
  ```

  ```
  nodepool-hosted-1-q8kfx
  ```
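  Since the next step needs this name, it can be handy to capture it in a shell variable (a sketch; AGENT_MACHINE is just a local variable name):

  ```
  # Store the AgentMachine name for reuse in the finalizer patch below (sketch).
  AGENT_MACHINE=$(oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted-hosted \
    get machines.cluster.x-k8s.io nodepool-hosted-1-lfx5h \
    -o jsonpath='{.spec.infrastructureRef.name}')
  echo "${AGENT_MACHINE}"
  ```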
- Remove the finalizer from the AgentMachine (use the name returned by the previous command).

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted-hosted patch agentmachine \
    nodepool-hosted-1-q8kfx -p '{"metadata":{"finalizers":null}}' --type merge
  ```

  ```
  agentmachine.capi-provider.agent-install.openshift.io/nodepool-hosted-1-q8kfx patched
  ```
- The Machine should be gone now (this also causes the stuck node to be deleted from the hosted cluster).

  ```
  oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted-hosted get machines.cluster.x-k8s.io
  ```

  ```
  NAME                      CLUSTER   NODENAME         PROVIDERID                                     PHASE     AGE   VERSION
  nodepool-hosted-1-hsmnq   hosted    hosted-worker1   agent://aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   Running   1h    4.13.0
  nodepool-hosted-1-ktkmc   hosted    hosted-worker0   agent://aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   Running   1h    4.13.0
  ```