Scaling the Hosted Cluster

At this point we have a Hosted Cluster up and running. A common operation is scaling nodes in and out. In this section we cover the two methods of scaling a Hosted Cluster: manual and automatic.

The scaling operations can be done from the Web Console or from the CLI. This section will only cover the CLI method as it’s more convenient than the Web Console.

Manually Scaling the Hosted Cluster

Scaling is done per NodePool, which means you can scale each NodePool individually without impacting the others. When the number of replicas in the NodePool object changes, the capi-provider component tries to locate an available agent and deploy it as an additional worker on the corresponding hosted cluster. In this first scenario we are going to add a third node to our cluster.
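
Before scaling, it can be useful to confirm the current state of the NodePool; the DESIRED NODES and CURRENT NODES columns should both show 2 at this point. A minimal check, using the same management kubeconfig and namespace as the steps below:

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted get nodepool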

  1. Scale the existing NodePool from 2 replicas to 3 replicas.

    This operation is done from the management cluster. Users consuming the Hosted Cluster cannot add nodes by themselves.
    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted scale \
        nodepool nodepool-hosted-1 --replicas 3
    nodepool.hypershift.openshift.io/nodepool-hosted-1 scaled
  2. We should see the free agent being assigned to the Hosted Cluster now.

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hardware-inventory get agents
    The CLUSTER column is now set to hosted for the first agent. It can take a few minutes for the agent to be assigned to the Hosted Cluster after scaling the NodePool.
    NAME                                   CLUSTER   APPROVED   ROLE          STAGE
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   hosted    true       auto-assign
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   hosted    true       worker        Done
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203   hosted    true       worker        Done
  3. And after a few moments, the installation should begin.

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hardware-inventory get agents
    It can take up to 5 minutes for the installation to start.
    NAME                                   CLUSTER   APPROVED   ROLE     STAGE
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   hosted    true       worker   Writing image to disk
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   hosted    true       worker   Done
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203   hosted    true       worker   Done
  4. Once finished, the agent will move to the Done stage.

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hardware-inventory get agents
    It can take up to 10 minutes for the installation to finish.
    NAME                                   CLUSTER   APPROVED   ROLE     STAGE
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   hosted    true       worker   Done
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   hosted    true       worker   Done
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203   hosted    true       worker   Done
  5. If we check the Hosted Cluster nodes, we will see that a third one has been added.

    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        get nodes
    NAME             STATUS   ROLES    AGE    VERSION
    hosted-worker0   Ready    worker   2m8s   v1.26.3+b404935
    hosted-worker1   Ready    worker   36m    v1.26.3+b404935
    hosted-worker2   Ready    worker   36m    v1.26.3+b404935
  6. Now that we have seen how to add a node, let’s see how to scale down the Hosted Cluster and thus remove a node. We can run the scale command again, this time requesting 2 replicas (a declarative alternative to oc scale is sketched right after this list).

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted scale \
        nodepool nodepool-hosted-1 --replicas 2
    nodepool.hypershift.openshift.io/nodepool-hosted-1 scaled
  7. At this point one of the workers will be cordoned and workloads will be evicted.

    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        get nodes
    NAME             STATUS                     ROLES    AGE     VERSION
    hosted-worker0   Ready                      worker   2m29s   v1.26.3+b404935
    hosted-worker1   Ready                      worker   36m     v1.26.3+b404935
    hosted-worker2   Ready,SchedulingDisabled   worker   37m     v1.26.3+b404935
  8. After a few minutes, the node will be gone.

    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        get nodes
    It can take a while for the node to leave the cluster; the time depends largely on the workloads running on it. During our tests it took around 5 minutes.
    If the node being deleted ends up stuck in the NotReady,SchedulingDisabled state, follow the instructions in the Fixing Stuck Deleted Node section below.
    NAME             STATUS   ROLES    AGE     VERSION
    hosted-worker0   Ready    worker   3m26s   v1.26.3+b404935
    hosted-worker1   Ready    worker   37m     v1.26.3+b404935
  9. And the agent will be back in the pool so it can be reused later.

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hardware-inventory get agent
    NAME                                   CLUSTER   APPROVED   ROLE          STAGE
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   hosted    true       worker        Done
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   hosted    true       worker        Done
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203             true       auto-assign
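
As an aside, the same replica change can be made declaratively by patching spec.replicas on the NodePool instead of using oc scale. A minimal sketch, equivalent to the scale-up in step 1:

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted patch \
        nodepool nodepool-hosted-1 --type merge -p '{"spec":{"replicas":3}}'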

Enabling Auto Scaling for the Hosted Cluster

In the previous section we saw how to add and remove capacity manually. This operation can be automated: when the hosted cluster requires more capacity, a new node is added automatically, provided that there are spare agents available to be provisioned. Let’s see how it works.
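
Under the hood, auto-scaling is controlled by the spec.autoScaling field on the NodePool, which takes the place of the fixed spec.replicas (the patch in step 1 removes one and adds the other, which suggests the two are mutually exclusive). After the patch, the NodePool spec should look roughly like this sketch (the apiVersion shown is an assumption and may differ between HyperShift releases):

    apiVersion: hypershift.openshift.io/v1beta1
    kind: NodePool
    metadata:
      name: nodepool-hosted-1
      namespace: hosted
    spec:
      autoScaling:
        max: 3
        min: 2
      # spec.replicas is no longer set while autoScaling is configured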

  1. We need to enable auto-scaling for our NodePool, setting a minimum of 2 nodes and a maximum of 3. This means the hosted cluster can add one extra worker when the existing capacity is exhausted.

    The hosted cluster will be scaled down if the additional capacity has not been used for the past 10 minutes.
    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted patch \
        nodepool nodepool-hosted-1 --type=json \
        -p '[{"op": "remove", "path": "/spec/replicas"},{"op":"add", "path": "/spec/autoScaling", "value": { "max": 3, "min": 2 }}]'
    nodepool.hypershift.openshift.io/nodepool-hosted-1 patched
  2. At this point we need to generate that extra load. We have two workers with 12 vCPUs each, so workloads requesting more than 24 vCPUs in total cannot fit on the existing nodes and extra capacity should be added to the cluster. Let’s create such a workload.

    The deployment will eventually run 3 replicas, each requesting 10 vCPUs (30 vCPUs in total), so an extra node is needed to accommodate all three replicas.
    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        -n default create deployment test-app --image=quay.io/mavazque/reversewords:latest
    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        -n default patch deployment test-app \
        -p '{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"reversewords"}],"containers":[{"name":"reversewords","resources":{"requests":{"cpu":10}}}]}}}}'
    deployment.apps/test-app created
    deployment.apps/test-app patched
  3. If we check the pods, we will see that we only have one pod and it is in the Running state; that is because we have enough capacity in the hosted cluster to run the workload.

    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        -n default get pods
    NAME                       READY   STATUS    RESTARTS   AGE
    test-app-d97c4f77b-8kddp   1/1     Running   0          94s
  4. Now, let’s see what happens if we try to get three replicas of the app running.

    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        -n default scale deployment test-app --replicas 3
    deployment.apps/test-app scaled
  5. If we check the pods, we will see that one of them is in the Pending state; that is because the current two-worker cluster cannot schedule the third pod due to insufficient vCPU capacity.

    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        -n default get pods
    NAME                       READY   STATUS    RESTARTS   AGE
    test-app-d97c4f77b-8kddp   1/1     Running   0          3m32s
    test-app-d97c4f77b-jhrd7   0/1     Pending   0          5s
    test-app-d97c4f77b-wbr8c   1/1     Running   0          5s
  6. At this point the NodePool will be scaled automatically. If we check the NodePool, this is what we see.

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted get nodepool nodepool-hosted-1
    Check the MESSAGE column: Scaling up MachineSet to 3 replicas. It can take a few minutes for the autoscaling to start.
    NAME                CLUSTER   DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
    nodepool-hosted-1   hosted                    2               True          False        4.13.0                                       Scaling up MachineSet to 3 replicas (actual 2)
  7. The spare agent is joining the cluster.

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hardware-inventory get agent
    NAME                                   CLUSTER   APPROVED   ROLE     STAGE
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   hosted    true       worker   Done
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   hosted    true       worker   Done
    aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203   hosted    true       worker   Writing image to disk
  8. Once the new node joins the cluster, all replicas of the workload will be running.

    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        get nodes
    It can take up to 10 minutes for the new node to join the cluster.
    NAME             STATUS   ROLES    AGE   VERSION
    hosted-worker0   Ready    worker   17m   v1.26.3+b404935
    hosted-worker1   Ready    worker   39m   v1.26.3+b404935
    hosted-worker2   Ready    worker   72s   v1.26.3+b404935
    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        -n default get pods
    NAME                       READY   STATUS    RESTARTS   AGE
    test-app-d97c4f77b-8kddp   1/1     Running   0          13m
    test-app-d97c4f77b-jhrd7   1/1     Running   0          10m
    test-app-d97c4f77b-wbr8c   1/1     Running   0          10m
  9. If we delete the workload, after 10 minutes the NodePool will be automatically scaled down again.

    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        -n default delete deployment test-app
    deployment.apps "test-app" deleted
  10. Once the NodePool gets scaled down, the hosted cluster will be back to two nodes.

    If the node being deleted ends up stuck in the NotReady,SchedulingDisabled state, follow the instructions in the Fixing Stuck Deleted Node section below.
    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        get nodes
    It can take up to 5 minutes for the node to leave the cluster.
    NAME             STATUS                     ROLES    AGE   VERSION
    hosted-worker0   Ready                      worker   28m   v1.26.3+b404935
    hosted-worker1   Ready                      worker   50m   v1.26.3+b404935
  11. Finally, disable the auto-scaling and restore a fixed replica count of 2 (a quick verification sketch follows this list).

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted \
        patch nodepool nodepool-hosted-1 --type=json \
        -p '[{"op": "remove", "path": "/spec/autoScaling"},{"op":"add", "path": "/spec/replicas", "value": 2}]'
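
To confirm that auto-scaling is disabled and the fixed replica count is back, you can inspect the relevant fields directly. A quick check; the first value should be 2 and the second should be empty:

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted get nodepool nodepool-hosted-1 \
        -o jsonpath='{.spec.replicas}{" "}{.spec.autoScaling}{"\n"}'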

Fixing Stuck Deleted Node

DO NOT RUN THIS UNLESS YOU HIT THE NotReady,SchedulingDisabled NODE ISSUE!

Under certain conditions, the node leaving the cluster may end up in an inconsistent state that prevents its deletion. The steps below fix the issue.
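
If it is not obvious which Machine is stuck, a quick way to spot it is to list the Machines in the Deleting phase; a sketch, assuming the standard Cluster API .status.phase field:

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted-hosted get machines.cluster.x-k8s.io \
        -o jsonpath='{range .items[?(@.status.phase=="Deleting")]}{.metadata.name}{"\n"}{end}'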

  1. Get the Machine that holds the stuck node.

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted-hosted get machines.cluster.x-k8s.io
    NAME                      CLUSTER   NODENAME         PROVIDERID                                     PHASE      AGE   VERSION
    nodepool-hosted-1-hsmnq   hosted    hosted-worker1   agent://aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   Running    1h    4.13.0
    nodepool-hosted-1-ktkmc   hosted    hosted-worker0   agent://aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   Running    1h    4.13.0
    nodepool-hosted-1-lfx5h   hosted    hosted-worker2   agent://aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0203   Deleting   1h    4.13.0
  2. In the output above, we can see that the stuck Machine is nodepool-hosted-1-lfx5h, the one in the Deleting phase.

  3. Get the AgentMachine linked to the stuck Machine.

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted-hosted get machines.cluster.x-k8s.io \
        nodepool-hosted-1-lfx5h -o jsonpath='{.spec.infrastructureRef.name}'
    nodepool-hosted-1-q8kfx
  4. Remove the finalizer from the AgentMachine (use the name returned by the previous command).

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted-hosted patch agentmachine \
        nodepool-hosted-1-q8kfx -p '{"metadata":{"finalizers":null}}' --type merge
    agentmachine.capi-provider.agent-install.openshift.io/nodepool-hosted-1-q8kfx patched
  5. The Machine should be gone now; this also causes the stuck node to be deleted from the hosted cluster (a verification sketch follows this list).

    oc --kubeconfig ~/hypershift-lab/mgmt-kubeconfig -n hosted-hosted get machines.cluster.x-k8s.io
    NAME                      CLUSTER   NODENAME         PROVIDERID                                     PHASE      AGE   VERSION
    nodepool-hosted-1-hsmnq   hosted    hosted-worker1   agent://aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0202   Running    1h    4.13.0
    nodepool-hosted-1-ktkmc   hosted    hosted-worker0   agent://aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaa0201   Running    1h    4.13.0
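
Optionally, you can verify from the hosted cluster side that the stuck node is gone; hosted-worker2 should no longer appear in the node list:

    oc --insecure-skip-tls-verify=true --kubeconfig ~/hypershift-lab/hosted-kubeconfig \
        get nodes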