Managing at Scale

In the previous section we described RHACM policies. In this section we explain how we use some of their capabilities, along with other APIs, to manage cluster configurations at scale.

Inform Policies

In RHACM, as we saw in the previous section, a policy can be configured in one of two modes: inform or enforce.

In inform mode, no changes are made to the clusters targeted by the policy; RHACM only reports whether each cluster is compliant. In enforce mode, RHACM makes the changes needed to bring the targeted clusters into the configuration described in the policy.

In Edge deployments, where potentially disruptive events must be carefully managed with respect to timing and rollout, we must control when changes to our clusters happen. With all Policies in inform mode, we have visibility into the compliance of all clusters, but enforcement (changes to clusters) occurs only when the Policies are switched to enforce mode. On top of these policies, another component named Topology Aware Lifecycle Manager (TALM) controls when a policy gets enforced, based on a configuration created by the cluster admins. TALM will be introduced in a future section.
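As a minimal sketch of what an inform-mode Policy looks like, consider the manifest below. The policy name, the namespace, and the object being checked are hypothetical and chosen only for illustration; the field that matters here is remediationAction.

```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: example-inform-policy          # hypothetical name
  namespace: ztp-policies              # hypothetical namespace on the hub
spec:
  # inform: report compliance only
  # enforce: apply the changes to the targeted clusters
  remediationAction: inform
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: example-inform-config
        spec:
          remediationAction: inform
          severity: low
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: v1
                kind: Namespace
                metadata:
                  name: example-namespace   # hypothetical object to check for
```

With this Policy in place, a cluster that lacks the namespace is reported as non-compliant, but nothing is created on it. Only when remediationAction is switched to enforce would RHACM create the missing object.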

OLM Subscriptions

Additional functionality for OpenShift clusters can be provided through optional (day-two) operators installed on the cluster. These operators are installed and managed through an OLM Subscription CR. Operators are stored in a registry which is periodically updated with new versions of the operator. The Subscription CR may be set to install new versions automatically as they become available, or to a manual mode which requires approval before new versions are installed.

For the same reasons described earlier, the reference configuration for RAN 5G deployments sets Subscription CRs to manual mode, so that operator updates occur only when cluster administrators explicitly allow them. Similar to the way in which Policies can be scheduled for enforcement, TALM gives cluster administrators the tools to automatically approve operator updates on selected clusters at specified times.
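The manual mode described above is controlled by the installPlanApproval field of the Subscription CR. The following is a sketch only: the operator, channel, and catalog source names are assumptions for illustration and will differ per environment (for example, in disconnected deployments that use a mirrored catalog).

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription   # illustrative name
  namespace: openshift-sriov-network-operator
spec:
  name: sriov-network-operator                # package name in the catalog
  channel: stable                             # assumed channel
  source: redhat-operators                    # assumed catalog source
  sourceNamespace: openshift-marketplace
  # Manual: each new version waits for explicit approval before it is installed
  # Automatic: new versions are installed as soon as they appear in the catalog
  installPlanApproval: Manual
```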

Cluster Policies

RHACM Policies and bindings provide the tools to simplify management of a large fleet of clusters by allowing one Policy to define configuration across multiple target (spoke) clusters. In the GitOps ZTP flow, these policies are maintained in Git and synchronized to the hub cluster where RHACM makes use of them.

In most Edge deployments, a large amount of configuration is common to all clusters. The set of installed day-two operators is one such example. These "common" configurations can be captured in a single policy that is then bound to all clusters in the fleet. This ensures that the clusters are consistently configured, simplifies management of the configuration, and improves the overall scalability of the solution.
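Binding one policy to every cluster in the fleet can be sketched as follows. This example assumes the PlacementRule API (RHACM also offers a Placement API that serves the same purpose), and the resource names, the namespace, and the "common" cluster label are hypothetical.

```yaml
# Selects every managed cluster that carries the label common=true
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: common-placement             # hypothetical name
  namespace: ztp-policies            # hypothetical namespace on the hub
spec:
  clusterSelector:
    matchExpressions:
      - key: common                  # hypothetical label set on all spoke clusters
        operator: In
        values:
          - "true"
---
# Ties the selected clusters to the Policy holding the common configuration
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: common-binding
  namespace: ztp-policies
placementRef:
  name: common-placement
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:
  - name: common-policy              # hypothetical Policy in the same namespace
    kind: Policy
    apiGroup: policy.open-cluster-management.io
```

A new cluster that joins the fleet with the label set is picked up by the same binding, so no per-cluster policy work is needed for the common configuration.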

In some cases, subsets of clusters may have significant commonality based on things like region or hardware type. There may be a relatively small number of these "groups" of clusters within the fleet. By defining one policy per group, users can realize the same benefits of consistency and ease of management. Examples of policies that fit this category include SR-IOV deployment on clusters with SR-IOV cards and accelerator operator deployment on clusters with accelerator cards.
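A group can be expressed as a label selector on the managed clusters. The sketch below again assumes the PlacementRule API, and the label key and value are hypothetical; a PlacementBinding would then reference this rule and the group policy.

```yaml
# Selects only the clusters labelled as having SR-IOV capable hardware
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: group-sriov-placement        # hypothetical name
  namespace: ztp-policies            # hypothetical namespace on the hub
spec:
  clusterSelector:
    matchExpressions:
      - key: hardware-type           # hypothetical label applied per cluster
        operator: In
        values:
          - sriov
```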

Where configuration is truly cluster-specific (different on every cluster in the fleet), the user must provide unique content on a per-cluster basis. In a RAN deployment, one example of site-specific configuration is the VLAN settings for SR-IOV virtual functions. There are multiple options for handling site-specific configuration using Policy:

  • Site-specific Policies. Each site may have its own Policy, which maintains the CRs unique to that site.

  • Hub-side templating. RHACM supports hub-side templating, in which one Policy may be defined with one or more templates included. The site-specific values for these templates are pulled from a ConfigMap on the hub cluster before the Policy is evaluated for compliance and/or remediated.
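The hub-side templating option can be sketched as follows, using the SR-IOV VLAN example mentioned earlier. All names, keys, and values here are hypothetical. The sketch assumes a ConfigMap in the same namespace as the Policy, with one key per managed cluster named after the cluster.

```yaml
# Per-site values kept on the hub; one key per managed cluster
apiVersion: v1
kind: ConfigMap
metadata:
  name: site-data                    # hypothetical name
  namespace: ztp-policies            # same namespace as the Policy
data:
  site-a-vlan: "140"                 # hypothetical cluster names and VLAN IDs
  site-b-vlan: "150"
---
# One Policy shared by all sites; the VLAN is resolved per cluster on the hub
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: group-sriov-policy
  namespace: ztp-policies
spec:
  remediationAction: inform
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: group-sriov-config
        spec:
          remediationAction: inform
          severity: low
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: sriovnetwork.openshift.io/v1
                kind: SriovNetwork
                metadata:
                  name: example-network
                  namespace: openshift-sriov-network-operator
                spec:
                  resourceName: example_resource
                  networkNamespace: openshift-sriov-network-operator
                  # Looks up "<cluster-name>-vlan" in the ConfigMap and converts it to an integer
                  vlan: '{{hub fromConfigMap "ztp-policies" "site-data" (printf "%s-vlan" .ManagedClusterName) | toInt hub}}'
```

Compared with one Policy per site, this keeps a single Policy for the whole group and reduces the per-site content to one ConfigMap entry.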