troubleshooting

Playbook to replace bootstrap.kubeconfig and node certificates on OpenShift 3.10 and 3.11

If you are a serial upgrader like me, you may have found that at one point during your 3.10.xx patching (say 3.10.119) you hit this error during the data plane upgrade:

    TASK [openshift_node : Approve the node] ************************************************************
    task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/restart.yml:49
    Using module file /usr/share/ansible/openshift-ansible/roles/lib_openshift/library/oc_csr_approve.py
    ...
    FAILED - RETRYING: Approve the node (30 retries left).Result was: {
        "all_subjects_found": [],
        "attempts": 1,
        "changed": false,
        "client_approve_results": [],
        "client_csrs": {},
        "failed": true,
        "invocation": {
            "module_args": {
                "node_list": [
                    "ose-test-node-01.


OpenShift 3.6 Upgrade Metrics Fails Missing heapster-certs Secret

After your upgrade to OpenShift v3.6, did the deployment of cluster metrics wind up with empty graphs? Check whether the heapster pod failed to start due to a missing secret called heapster-certs in the openshift-infra namespace.

Problem

Heapster pod is failing to start

    $ oc get pods
    NAME                         READY     STATUS              RESTARTS   AGE
    hawkular-cassandra-1-l1f3s   1/1       Running             0          9m
    hawkular-metrics-rdl07       1/1       Running             0          9m
    heapster-cfpcj               0/1       ContainerCreating   0          3m

Check what volumes it is attempting to mount


OpenShift Cluster Metrics and Cassandra Troubleshooting

OpenShift gathers cluster metrics such as CPU, memory, and network bandwidth per pod, which can assist in troubleshooting and capacity planning. The metrics are also used to support horizontal pod autoscaling, which makes the metrics service not just helpful, but critical to operation.

Missing Liveness Probes

There are three major components in the metrics collection process. Heapster gathers stats from Docker and feeds them to Hawkular Metrics to tuck away for safekeeping in Cassandra.
