Troubleshooting Tips
This section provides some useful tips for troubleshooting issues that can arise while running this lab.
Verification of the lab status
Containerized Infrastructure Services
Several infrastructure services run as containers, such as the Git server or the container registry. Below is the list of containers that should be running.
All these containers are managed via the following systemd services:
- podman-gitea.service
- podman-minio.service
- podman-registry.service
- podman-showroom-apache.service
- podman-showroom-firefox.service
- podman-showroom-traefik.service
- podman-showroom-wetty.service
- podman-webcache.service
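As a quick sanity check, you can loop over the expected units and print each one's state. This helper is a sketch, not part of the lab tooling; it prints unknown when the state cannot be determined (for example, when run outside the lab host):

```shell
# Print the state of each expected podman-managed systemd unit.
services="gitea minio registry showroom-apache showroom-firefox showroom-traefik showroom-wetty webcache"
for svc in $services; do
  state=$(systemctl is-active "podman-${svc}.service" 2>/dev/null || true)
  echo "podman-${svc}.service: ${state:-unknown}"
done
```

Any unit reporting something other than active deserves a closer look with systemctl status.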
sudo podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
22b7da48762d quay.io/alosadag/httpd:p8080 httpd-foreground 11 hours ago Up 11 hours webcache
3ca3e6d08b86 quay.io/mavazque/registry:2.7.1 /etc/docker/regis... 11 hours ago Up 11 hours registry
afa55e6fec53 quay.io/minio/minio:RELEASE.2025-02-07T23-21-09Z server /data --co... 11 hours ago Up 11 hours 0.0.0.0:9001->9001/tcp, 0.0.0.0:9002->9000/tcp minio-server
0a044c32bb1d quay.io/mavazque/gitea:1.17.3 /bin/s6-svscan /e... 11 hours ago Up 11 hours 0.0.0.0:2222->22/tcp, 0.0.0.0:3000->3000/tcp gitea
0fd211cc1bb8 quay.io/fedora/httpd-24-micro:2.4 /usr/bin/run-http... 11 hours ago Up 11 hours 192.168.125.1:8181->8080/tcp showroom-apache
337206f80582 quay.io/rhsysdeseng/showroom:wetty --ssh-user=lab-us... 11 hours ago Up 11 hours 192.168.125.1:3001->3000/tcp showroom-wetty
687cb4e59f6a quay.io/rhsysdeseng/showroom:traefik-v3.3.4 --configFile=/etc... 11 hours ago Up 11 hours 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp, 0.0.0.0:3003->3003/tcp, 0.0.0.0:8888->8888/tcp showroom-traefik
a0cec2b7b429 quay.io/rhsysdeseng/showroom:webfirefox 11 hours ago Up 11 hours showroom-firefox
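If one of the containers above is missing, restarting its systemd unit is usually enough. The unit name below is just an example; use the one matching the failing container:

```shell
# Restart a failing unit; on failure, hint at where to look next.
UNIT=podman-gitea.service
sudo systemctl restart "$UNIT" || echo "restart failed; inspect it with: journalctl -u $UNIT"
```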
SNO Virtual Machines
In this lab we are going to provision and configure two SNO clusters, named sno-abi and sno-ibi. Let's double-check that their virtual machines exist and are stopped.
kcli list vm
+----------------+--------+----------------+----------------------------------------------------+------+---------+
| Name | Status | Ip | Source | Plan | Profile |
+----------------+--------+----------------+----------------------------------------------------+------+---------+
| hub-ctlplane-0 | up | 192.168.125.20 | rhcos-418.94.202501221327-0-openstack.x86_64.qcow2 | hub | kvirt |
| hub-ctlplane-1 | up | 192.168.125.21 | rhcos-418.94.202501221327-0-openstack.x86_64.qcow2 | hub | kvirt |
| hub-ctlplane-2 | up | 192.168.125.22 | rhcos-418.94.202501221327-0-openstack.x86_64.qcow2 | hub | kvirt |
| sno-abi | down | | | hub | kvirt |
| sno-ibi | down | | | hub | kvirt |
| sno-seed | up | 192.168.125.30 | | hub | kvirt |
+----------------+--------+----------------+----------------------------------------------------+------+---------+
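The hub-ctlplane-* and sno-seed machines must be up. If one of them shows down, it can be started again; the VM name below is an example:

```shell
# Start a stopped lab VM with kcli; prints a hint if the start fails.
kcli start vm hub-ctlplane-0 || echo "could not start the VM; re-run 'kcli list vm' to check its status"
```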
Hub cluster
Before working with oc commands you can enable command auto-completion by running:
source <(oc completion bash)
# Make it persistent (writing under /etc requires root)
oc completion bash | sudo tee /etc/bash_completion.d/oc_completion > /dev/null
Check the status of the hub cluster.
oc --kubeconfig ~/hub-kubeconfig get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version v4.19.0 True False 22h Cluster version is v4.19.0
oc --kubeconfig ~/hub-kubeconfig get nodes
NAME STATUS ROLES AGE VERSION
hub-ctlplane-0.5g-deployment.lab Ready control-plane,master,worker 10h v1.32.5
hub-ctlplane-1.5g-deployment.lab Ready control-plane,master,worker 10h v1.32.5
hub-ctlplane-2.5g-deployment.lab Ready control-plane,master,worker 10h v1.32.5
oc --kubeconfig ~/hub-kubeconfig get operators
NAME AGE
advanced-cluster-management.open-cluster-management 10h
lvms-operator.openshift-storage 10h
multicluster-engine.multicluster-engine 10h
openshift-gitops-operator.openshift-operators 10h
topology-aware-lifecycle-manager.openshift-operators 10h
oc --kubeconfig ~/hub-kubeconfig get catalogsources -A
NAMESPACE NAME DISPLAY TYPE PUBLISHER AGE
openshift-marketplace cs-redhat-operator-index-v4-18-174293 grpc 10h
DNS resolution
Verify that the OpenShift API and the apps domain (wildcard) can be resolved.
The dig command is not part of the standard Linux utilities, so you may need to install it; on RHEL-based systems it is provided by the bind-utils package.
dig +short api.hub.5g-deployment.lab
192.168.125.10
dig +short oauth-openshift.apps.hub.5g-deployment.lab
192.168.125.11
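A small helper (a sketch, not part of the lab tooling) can compare each record against the address it is expected to resolve to:

```shell
# Compare a DNS record against its expected lab address.
check() {
  got=$(dig +short "$1" 2>/dev/null)
  if [ "$got" = "$2" ]; then
    echo "OK   $1 -> $got"
  else
    echo "FAIL $1 (got '${got:-nothing}', want $2)"
  fi
}
check api.hub.5g-deployment.lab 192.168.125.10
check oauth-openshift.apps.hub.5g-deployment.lab 192.168.125.11
```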
Policies not showing in the Governance console
If the policies are not shown in the Governance section of the Multicloud console, first check whether the policies Argo application was synced successfully.
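The sync status can also be checked from the CLI by listing the Argo CD Application resources on the hub. The namespace layout depends on how the GitOps operator was configured, so treat this as a sketch:

```shell
# List Argo CD applications and their sync/health state on the hub cluster.
oc --kubeconfig ~/hub-kubeconfig get applications.argoproj.io -A \
  || echo "could not list Argo applications; verify access to the hub cluster"
```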
Verify that the policies in the hub cluster are similar to the ones shown below. Remember that inform is the expected remediation action.
oc --kubeconfig ~/hub-kubeconfig get policies -A
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE
sno-abi ztp-policies.common-config-policy inform Compliant 116m
sno-abi ztp-policies.common-subscriptions-policy inform Compliant 116m
sno-abi ztp-policies.du-sno-group-policy inform Compliant 116m
sno-abi ztp-policies.du-sno-sites-sites-policy inform Compliant 116m
sno-ibi ztp-policies.common-config-policy inform Compliant 92m
sno-ibi ztp-policies.common-subscriptions-policy inform Compliant 92m
sno-ibi ztp-policies.du-sno-group-policy inform Compliant 92m
sno-ibi ztp-policies.du-sno-sites-sites-policy inform Compliant 92m
ztp-policies common-config-policy inform Compliant 117m
ztp-policies common-subscriptions-policy inform Compliant 117m
ztp-policies common-test-config-policy inform 117m
ztp-policies common-test-subscriptions-policy inform 117m
ztp-policies du-sno-group-policy inform Compliant 117m
ztp-policies du-sno-sites-sites-policy inform Compliant 117m
ztp-policies du-sno-test-group-policy inform 117m
Policies not applied
In such cases it can be because of multiple errors. First, let’s check that the policies are shown in the Governance console.
If the policies show a warning message in the Cluster violations section, it is because the SNO servers are still being provisioned. You can double-check the provisioning status in the Infrastructure → Clusters section. Verify that the ztp-running label has not been added yet.
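The ZTP progress labels can also be inspected from the CLI: ztp-running is added while the pipeline is configuring a cluster and is replaced by ztp-done once it finishes. The cluster name below is an example:

```shell
# Show the labels of a managed cluster to track ZTP progress.
oc --kubeconfig ~/hub-kubeconfig get managedcluster sno-abi --show-labels \
  || echo "managed cluster not found yet; provisioning may still be in progress"
```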
If the Governance console shows policies already assigned to the SNO clusters, check the status of the TALM operator. Remember that it is responsible for moving the policies from inform to enforce so that they are eventually applied. Check the status of the cluster-group-upgrades-controller-manager Pod and its logs:
oc --kubeconfig ~/hub-kubeconfig get pods -n openshift-operators
NAME READY STATUS RESTARTS AGE
cluster-group-upgrades-controller-manager-v2-6fcb8695bf-bvngg 2/2 Running 0 10h
openshift-gitops-operator-controller-manager-65b8984b8c-2qp8j 2/2 Running 0 10h
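To fetch the logs, targeting the deployment avoids copying the generated pod name. The deployment name below is inferred from the pod listing above, so adjust it if your environment differs:

```shell
# Tail the TALM controller logs from all of its containers.
oc --kubeconfig ~/hub-kubeconfig logs -n openshift-operators \
  deploy/cluster-group-upgrades-controller-manager-v2 --all-containers --tail=100 \
  || echo "could not fetch logs; list deployments with 'oc get deploy -n openshift-operators'"
```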
Next, verify that a ClusterGroupUpgrade CR was created automatically by the TALM operator. If it was not created, either the label is not set on the cluster yet or the operator is having issues. In the latter case, check the logs as explained previously.
oc --kubeconfig ~/hub-kubeconfig get cgu -A
NAMESPACE NAME AGE STATE DETAILS
ztp-install sno-abi 73m Completed All clusters are compliant with all the managed policies
ztp-install sno-ibi 66m Completed All clusters are compliant with all the managed policies
Describing the CGU shows a lot of information about the current status of the configuration:
oc --kubeconfig ~/hub-kubeconfig get cgu -n ztp-install sno-abi -o yaml
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  ... REDACTED ...
status:
  clusters:
  - name: sno-abi
    state: complete
  computedMaxConcurrency: 1
  conditions:
  - lastTransitionTime: "2025-04-01T19:08:47Z"
    message: All selected clusters are valid
    reason: ClusterSelectionCompleted
    status: "True"
    type: ClustersSelected
  - lastTransitionTime: "2025-04-01T19:08:47Z"
    message: Completed validation
    reason: ValidationCompleted
    status: "True"
    type: Validated
  - lastTransitionTime: "2025-04-01T19:10:35Z"
    message: All clusters are compliant with all the managed policies
    reason: Completed
    status: "False"
    type: Progressing
  - lastTransitionTime: "2025-04-01T19:10:35Z"
    message: All clusters are compliant with all the managed policies
    reason: Completed
    status: "True"
    type: Succeeded
  managedPoliciesContent:
    common-subscriptions-policy: '[{"kind":"Subscription","name":"lvms-operator","apiVersion":"operators.coreos.com/v1alpha1","namespace":"openshift-storage"},{"kind":"Subscription","name":"sriov-network-operator-subscription","apiVersion":"operators.coreos.com/v1alpha1","namespace":"openshift-sriov-network-operator"}]'
  managedPoliciesForUpgrade:
  - name: common-config-policy
    namespace: ztp-policies
  - name: common-subscriptions-policy
    namespace: ztp-policies
  - name: du-sno-group-policy
    namespace: ztp-policies
  - name: du-sno-sites-sites-policy
    namespace: ztp-policies
  managedPoliciesNs:
    common-config-policy: ztp-policies
    common-subscriptions-policy: ztp-policies
    du-sno-group-policy: ztp-policies
    du-sno-sites-sites-policy: ztp-policies
  remediationPlan:
  - - sno-abi
  status:
    completedAt: "2025-04-01T19:10:35Z"
    startedAt: "2025-04-01T19:08:48Z"
Verify that there are now twice as many policies in the hub cluster. That's because an enforce copy of each of them was created.
oc --kubeconfig ~/hub-kubeconfig get policies -A
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE
sno-abi ztp-policies.common-config-policy inform Compliant 121m
sno-abi ztp-policies.common-subscriptions-policy inform Compliant 121m
sno-abi ztp-policies.du-sno-group-policy inform Compliant 121m
sno-abi ztp-policies.du-sno-sites-sites-policy inform Compliant 121m
sno-ibi ztp-policies.common-config-policy inform Compliant 97m
sno-ibi ztp-policies.common-subscriptions-policy inform Compliant 97m
sno-ibi ztp-policies.du-sno-group-policy inform Compliant 97m
sno-ibi ztp-policies.du-sno-sites-sites-policy inform Compliant 97m
ztp-policies common-config-policy inform Compliant 121m
ztp-policies common-subscriptions-policy inform Compliant 121m
ztp-policies common-test-config-policy inform 121m
ztp-policies common-test-subscriptions-policy inform 121m
ztp-policies du-sno-group-policy inform Compliant 121m
ztp-policies du-sno-sites-sites-policy inform Compliant 121m
ztp-policies du-sno-test-group-policy inform 121m
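Instead of counting rows by eye, a quick per-namespace count (a sketch using standard awk) can confirm the totals:

```shell
# Count policies per namespace; the awk program tallies the first column.
oc --kubeconfig ~/hub-kubeconfig get policies -A --no-headers 2>/dev/null \
  | awk '{count[$1]++} END {for (ns in count) print ns, count[ns]}'
```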
Each enforce policy is applied one by one. There can be cases where the Cluster violations or the Compliance state is not yet set for the enforced cluster. Moving on to the next policy takes time, depending on the changes applied to the target cluster.