Something that I need to do with ever-increasing frequency is change the way the cluster autoscaler behaves on a running OpenShift cluster. While this operation can be sensitive (I don’t want to blow up my cluster!), a few simple steps allow me to adjust the logging output and even the command line arguments that are passed to the autoscaler.
Before I can start messing with the deployments for various components I will need to scale down the Cluster Version Operator. For more information about why I’m doing this, see my previous post Tips for Experimenting with the OpenShift Machine API.
Scaling down the cluster version operator
oc scale --replicas=0 deploy/cluster-version-operator -n openshift-cluster-version
Additionally, I need to scale down the deployment for the Cluster Autoscaler Operator to ensure that it doesn’t try to correct the changes I am introducing.
Scaling down the cluster autoscaler operator
oc scale --replicas=0 deploy/cluster-autoscaler-operator -n openshift-machine-api
I also need to make sure that there are no autoscalers currently running.
Delete ClusterAutoscaler resource
oc delete ClusterAutoscaler default
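Since I tend to run these three preparation steps together, they can be collected into one small shell function. This is only a sketch: the function name pause_autoscaler_management is something I made up for illustration, and I have added --ignore-not-found to the delete on the assumption that it is fine for no ClusterAutoscaler to exist yet.

```shell
# Sketch: pause the operators that would otherwise revert manual changes.
# The function name is hypothetical; the oc commands mirror the steps above.
pause_autoscaler_management() {
    # Stop the Cluster Version Operator so it does not restore the CAO deployment.
    oc scale --replicas=0 deploy/cluster-version-operator -n openshift-cluster-version || return 1
    # Stop the Cluster Autoscaler Operator so it does not undo my edits.
    oc scale --replicas=0 deploy/cluster-autoscaler-operator -n openshift-machine-api || return 1
    # Remove any running autoscaler; --ignore-not-found makes this a no-op if none exists.
    oc delete ClusterAutoscaler default --ignore-not-found
}
```

Calling pause_autoscaler_management from a shell that is logged in to the cluster runs the steps in order and stops at the first scale command that fails.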
Changing the Autoscaler
There are a couple of tasks that I perform most commonly when hacking on the autoscaler: adjusting the log verbosity and injecting extra command line parameters. Both can be controlled by adding environment variables to the deployment for the Cluster Autoscaler Operator (CAO).
With the CAO scaled down, I can edit its deployment to add a few values.
Edit the cluster autoscaler operator deployment
oc edit deploy/cluster-autoscaler-operator -n openshift-machine-api
I look in the Deployment.spec.template.spec field to find the details for the cluster-autoscaler-operator. This container has several environment variables specified. It should look something like this:
cluster-autoscaler-operator deployment, environment variables
env:
- name: RELEASE_VERSION
  value: "0.0.1-snapshot"
- name: WATCH_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: CLUSTER_AUTOSCALER_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: LEADER_ELECTION_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: CLUSTER_AUTOSCALER_IMAGE
  value: docker.io/openshift/origin-cluster-autoscaler:v4.0
- name: WEBHOOKS_CERT_DIR
  value: /etc/cluster-autoscaler-operator/tls
- name: WEBHOOKS_PORT
  value: "8443"
- name: METRICS_PORT
  value: "9191"
I want to add two entries to this list:
Add verbosity and extra args to cluster-autoscaler deployments
- name: CLUSTER_AUTOSCALER_VERBOSITY
  value: "4"
- name: CLUSTER_AUTOSCALER_EXTRA_ARGS
  value: "--new-pod-scale-up-delay=10m"
In this example I am setting the verbosity to “4” (the maximum), and adding the command line argument “--new-pod-scale-up-delay=10m” to each deployment of the cluster autoscaler.
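The same two variables can also be set without opening an editor. Here is a minimal sketch using oc set env, assuming the CAO deployment name and namespace used throughout this post; the helper name set_autoscaler_debug_env and its two positional arguments are my own invention for illustration.

```shell
# Sketch: set the verbosity and extra arguments on the CAO deployment.
# set_autoscaler_debug_env is a hypothetical helper name.
set_autoscaler_debug_env() {
    verbosity="$1"    # for example: 4
    extra_args="$2"   # for example: --new-pod-scale-up-delay=10m
    # oc set env takes KEY=VALUE pairs and updates the deployment's containers.
    oc set env deploy/cluster-autoscaler-operator -n openshift-machine-api \
        "CLUSTER_AUTOSCALER_VERBOSITY=${verbosity}" \
        "CLUSTER_AUTOSCALER_EXTRA_ARGS=${extra_args}"
}
```

For the values in this post the call would be set_autoscaler_debug_env 4 "--new-pod-scale-up-delay=10m". I still like oc edit when I want to see the whole deployment, but a one-liner is easier to repeat.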
It’s worth noting that these changes affect every cluster autoscaler that is deployed by the CAO, which means any autoscaler that I create with the ClusterAutoscaler resource.
Using the new options
Before I can see my new changes I will need to scale back up the CAO and then create a cluster autoscaler.
Scaling up the cluster autoscaler operator
oc scale --replicas=1 deploy/cluster-autoscaler-operator -n openshift-machine-api
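If I am scripting this, it helps to wait until the operator is actually back before creating an autoscaler. A small sketch using oc rollout status; the function name and the 120 second timeout are arbitrary choices of mine, not anything the CAO requires.

```shell
# Sketch: block until the scaled-up CAO deployment reports ready.
# wait_for_cao is a hypothetical name; the timeout value is an assumption.
wait_for_cao() {
    oc rollout status deploy/cluster-autoscaler-operator -n openshift-machine-api --timeout=120s
}
```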
I now need to create a cluster autoscaler. To test my changes I can use a very basic ClusterAutoscaler:
---
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  balanceSimilarNodeGroups: true
  ignoreDaemonsetsUtilization: false
  skipNodesWithLocalStorage: true
  podPriorityThreshold: -10
  resourceLimits:
    maxNodesTotal: 24
    cores:
      min: 8
      max: 128
    memory:
      min: 4
      max: 256
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
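To get this resource into the cluster, I can save the manifest to a file and hand it to oc create. The sketch below assumes a file named clusterautoscaler.yaml, which is just a name I picked; the helper name is hypothetical as well.

```shell
# Sketch: create the ClusterAutoscaler from a saved manifest.
# The default file name clusterautoscaler.yaml is an assumption.
create_cluster_autoscaler() {
    manifest="${1:-clusterautoscaler.yaml}"
    # Fail early with a readable message if the file is missing.
    if [ ! -f "$manifest" ]; then
        echo "manifest not found: $manifest" >&2
        return 1
    fi
    oc create -f "$manifest"
}
```

Running create_cluster_autoscaler with no argument uses the default file name, and passing a path uses that file instead.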
Verifying the changes
To truly verify what I have done I will need to inspect the log output from the cluster autoscaler. To do this I need to find the pod for the autoscaler and then follow its logs.
Find the cluster autoscaler
$ oc get pods -n openshift-machine-api
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-autoscaler-default-58bb9c9f6d-s7nkl    1/1     Running   0          9s
cluster-autoscaler-operator-759c6456cd-jnpcc   2/2     Running   0          2m50s
machine-api-controllers-7dc56dc5f6-8vprl       7/7     Running   0          3h29m
machine-api-operator-5dcb4496c4-wddp4          2/2     Running   0          3h30m
The cluster autoscaler will always be named “default”, so in the example above I want to get the logs for the pod whose name begins with cluster-autoscaler-default.
Getting the logs for the cluster autoscaler
oc logs -f cluster-autoscaler-default-58bb9c9f6d-s7nkl -n openshift-machine-api
At this point I can confirm that my command line argument was added and that the log verbosity is set how I want.
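As a shortcut, I can skip the pod lookup and point oc at the deployment instead. This sketch assumes the deployment is named cluster-autoscaler-default, which I am inferring from the pod name above, and that the autoscaler runs in the first container of the pod. The function names are hypothetical, and I print both the command and args fields since I am not relying on which one holds the flags.

```shell
# Sketch: inspect the autoscaler through its deployment rather than a pod name.
# The deployment name cluster-autoscaler-default is inferred from the pod name.
show_autoscaler_args() {
    # Print the container command and args generated for the autoscaler.
    oc get deploy/cluster-autoscaler-default -n openshift-machine-api \
        -o jsonpath='{.spec.template.spec.containers[0].command}{"\n"}{.spec.template.spec.containers[0].args}{"\n"}'
}

follow_autoscaler_logs() {
    # oc logs accepts a deployment reference, so the pod hash is not needed.
    oc logs -f deploy/cluster-autoscaler-default -n openshift-machine-api
}
```

If the extra argument shows up in the output of show_autoscaler_args, then the CAO picked up my environment variables.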
Hopefully these instructions have helped you get a start with the debugging options available in the cluster autoscaler. Good luck with your experiments and as always, happy hacking =)