Deploy Hawkular Metrics in CDK 2.1 OpenShift 3.2

June 16, 2016

Update! I failed with CDK 2.0, but CDK 2.1 works with some fiddling.

In my last post I installed Red Hat Container Developer Kit to deploy OpenShift Enterprise using Vagrant. But now I want to add Hawkular Metrics to that deployment.

Deploy Metrics

Refer to the docs for deploying metrics in OSE.

OpenShift Metrics

Log in to the Vagrant CDK VM before continuing:

$ cd ~/cdk/components/rhel/rhel-ose/
$ vagrant ssh

$ oc login
Authentication required for (openshift)
Username: admin
Password: admin
Login successful.

$ oc project openshift-infra

$ oc get sa
NAME                        SECRETS   AGE
build-controller            2         10m
builder                     2         10m
daemonset-controller        2         10m
default                     2         10m
deployer                    2         10m
deployment-controller       2         10m
gc-controller               2         10m
hpa-controller              2         10m
job-controller              2         10m
namespace-controller        2         10m
pv-binder-controller        2         10m
pv-provisioner-controller   2         10m
pv-recycler-controller      2         10m
replication-controller      2         10m

$ oc create -f - <<API
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-deployer
secrets:
- name: metrics-deployer
API

$ oc secrets new metrics-deployer nothing=/dev/null

$ oadm policy add-role-to-user         edit           system:serviceaccount:openshift-infra:metrics-deployer
$ oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster

Grab metrics-deployer.yaml from your OSE server at /usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml, or download it from here

$ curl -O

$ oc process -f metrics-deployer.yaml \
             -v \
             -v USE_PERSISTENT_STORAGE=false \
             | oc create -f -

# be patient while the images are pulled and pods are started. this can take a long time.
$ oc get events --watch
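
If watching the event stream by hand gets tedious, a small polling loop can do the waiting. This is only a sketch: wait_for and metrics_pods_ready are hypothetical helper names, and the pod check assumes STATUS is the third column of `oc get pods --no-headers`.

```shell
# Hypothetical helper (not part of oc): retry a command until it succeeds
# or the attempt budget is used up.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Hypothetical check: succeeds only when at least one pod is listed and
# every listed pod is Running or Completed.
# Assumption: STATUS is the third column of the --no-headers output.
metrics_pods_ready() {
  oc get pods -n openshift-infra --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { bad = 1 }
         END { exit (bad || NR == 0) }'
}

# Example (inside the VM): poll every 30 seconds, for up to 30 minutes.
#   wait_for 60 30 metrics_pods_ready && echo "metrics pods are up"
```

Running is not the same as Ready, so the Cassandra readiness problem described next can still apply after this loop returns.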

You are probably going to see Cassandra errors, but continue to the next step to tell the master where to find the metrics.

Cassandra Errors

The events may eventually output something like this:

2016-07-04 21:10:03 -0400 EDT   2016-07-04 21:10:03 -0400 EDT   1         hawkular-cassandra-1-5kmz4   Pod       spec.containers{hawkular-cassandra-1}   Warning   Unhealthy   {kubelet rhel-cdk}   Readiness probe failed: cat: /etc/*.conf: No such file or directory
nodetool: Failed to connect to '' - ConnectException: 'Connection refused'.
Cassandra not in the up and normal state. Current state is
/opt/apache-cassandra/bin/ line 28: [: =: unary operator expected

Output a little more info on that pod:

[vagrant@rhel-cdk ~]$ oc describe pod hawkular-cassandra-1-5kmz4
Name:           hawkular-cassandra-1-5kmz4
Namespace:      openshift-infra
Node:           rhel-cdk/
Start Time:     Mon, 04 Jul 2016 20:56:41 -0400
Labels:         metrics-infra=hawkular-cassandra,name=hawkular-cassandra-1,type=hawkular-cassandra
Status:         Running
Controllers:    ReplicationController/hawkular-cassandra-1
    Container ID:       docker://a5716703d9b98a540255e3db8cb40f15b39127e47f5e1f5279a8f63e07e47903
    Image ID:           docker://2a27e048703696117de74856c629f6837399e621658ece4fc725e4fe8c54bbcd
    Ports:              9042/TCP, 9160/TCP, 7000/TCP, 7001/TCP
    QoS Tier:
      memory:           BestEffort
      cpu:              BestEffort
    State:              Running
      Started:          Mon, 04 Jul 2016 21:26:17 -0400
    Ready:              True
    Restart Count:      0
    Readiness:          exec [/opt/apache-cassandra/bin/] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables:
      POD_NAMESPACE:    openshift-infra (v1:metadata.namespace)
  Type          Status
  Ready         True
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Type:       Secret (a volume populated by a Secret)
    SecretName: hawkular-cassandra-secrets
    Type:       Secret (a volume populated by a Secret)
    SecretName: cassandra-token-a2muv
  FirstSeen     LastSeen        Count   From                    SubobjectPath                           Type            Reason          Message
  ---------     --------        -----   ----                    -------------                           --------        ------          -------
  32m           32m             1       {default-scheduler }                                            Normal          Scheduled       Successfully assigned hawkular-cassandra-1-5kmz4 to rhel-cdk
  32m           32m             1       {kubelet rhel-cdk}      spec.containers{hawkular-cassandra-1}   Normal          Pulling         pulling image "
  19m           19m             1       {kubelet rhel-cdk}      spec.containers{hawkular-cassandra-1}   Normal          Pulled          Successfully pulled image "
  19m           19m             1       {kubelet rhel-cdk}      spec.containers{hawkular-cassandra-1}   Normal          Created         Created container with docker id 06fd27c4047c
  19m           19m             1       {kubelet rhel-cdk}      spec.containers{hawkular-cassandra-1}   Normal          Started         Started container with docker id 06fd27c4047c
  19m           19m             1       {kubelet rhel-cdk}      spec.containers{hawkular-cassandra-1}   Warning         Unhealthy       Readiness probe failed: cat: /etc/*.conf: No such file or directory
nodetool: Failed to connect to '' - ConnectException: 'Connection refused'.
Cassandra not in the up and normal state. Current state is
/opt/apache-cassandra/bin/ line 28: [: =: unary operator expected

  2m    2m      1       {kubelet rhel-cdk}      spec.containers{hawkular-cassandra-1}   Normal  Pulled          Container image "" already present on machine
  2m    2m      1       {kubelet rhel-cdk}      spec.containers{hawkular-cassandra-1}   Normal  Created         Created container with docker id a5716703d9b9
  2m    2m      1       {kubelet rhel-cdk}      spec.containers{hawkular-cassandra-1}   Normal  Started         Started container with docker id a5716703d9b9
  2m    2m      2       {kubelet rhel-cdk}      spec.containers{hawkular-cassandra-1}   Warning Unhealthy       Readiness probe failed: cat: /etc/*.conf: No such file or directory
nodetool: Failed to connect to '' - ConnectException: 'Connection refused'.
Cassandra not in the up and normal state. Current state is
/opt/apache-cassandra/bin/ line 28: [: =: unary operator expected

The readiness check of the Cassandra pod is broken. Using this commit as a guide, make the same change in the running pod: on line 28 of the readiness script, change $STATUS to ${STATUS}.
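
The commit itself is not reproduced here. As a general illustration of the "unary operator expected" message in the log, the sketch below shows how an empty variable inside `[ ... ]` produces that kind of failure and how a quoted expansion keeps the test well-formed. The empty STATUS and the "UN" comparison are assumptions made for the example, not a copy of the readiness script.

```shell
# Illustration only: an empty STATUS stands in for a probe that got no
# state back. The "UN" value is an assumption for this example.
STATUS=""

# Unquoted: the empty expansion vanishes, so `[` receives only `= UN`
# and fails with a usage error rather than a clean "false".
unquoted_rc=0
[ $STATUS = "UN" ] 2>/dev/null || unquoted_rc=$?

# Quoted: `[` receives an empty string as its left operand and simply
# reports "not equal".
quoted_rc=0
[ "${STATUS}" = "UN" ] || quoted_rc=$?

echo "unquoted exit status: $unquoted_rc"
echo "quoted exit status:   $quoted_rc"
```

The unquoted form exits with a status above 1 (an error), while the quoted form exits with 1 (a plain false). In a POSIX shell, braces on their own do not change how an empty value expands, so in this sketch it is the double quotes that do the work.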

Now, even though cat still complains about a nonexistent file, the script no longer errors out:

oc rsh hawkular-cassandra-1-5kmz4
sh-4.2$ /opt/apache-cassandra/bin/
cat: /etc/*.conf: No such file or directory
Cassandra is in the up and normal state. It is now ready

Update OpenShift Master Config

OpenShift itself runs in a container called openshift. The config directory is mounted from the CDK VM at /var/lib/openshift/openshift.local.config/master.

sudo vi /var/lib/openshift/openshift.local.config/master/master-config.yaml

# Add this:
  metricsPublicURL: ""
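
If you would rather script the edit than open vi, something like the following could work. It is a sketch under assumptions: add_metrics_url is a hypothetical helper name, the key is assumed to belong directly under a top-level assetConfig: line (as the two-space indent above suggests), and the URL is whatever value you would otherwise type by hand.

```shell
# Hypothetical helper: add metricsPublicURL under assetConfig in a
# master-config.yaml. Does nothing if the key is already present.
# Usage: add_metrics_url <config-file> <metrics-url>
add_metrics_url() {
  config=$1
  url=$2

  # Idempotence guard: leave the file alone if the key already exists.
  if grep -q '^ *metricsPublicURL:' "$config"; then
    return 0
  fi

  tmp=$(mktemp) || return 1
  rc=0
  # Copy every line; right after "assetConfig:" emit the new key.
  # Writing back with cat keeps the original file's owner and mode.
  awk -v url="$url" '
    { print }
    /^assetConfig:/ { printf "  metricsPublicURL: \"%s\"\n", url }
  ' "$config" > "$tmp" && cat "$tmp" > "$config" || rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

The config file is owned by root on the VM, so the function would need to run as root there (for example from a script started with sudo). Keep a copy of master-config.yaml before trying it.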

Log in to the openshift container and send the process a HUP. There is probably a better way to do this. Note that in CDK 2.1 this killed OpenShift; there, run sudo systemctl restart openshift from the VM instead.

# from host
vagrant ssh
# from VM
docker exec -ti openshift bash
ps -ef | grep openshift
kill -HUP <pid of openshift container>

Finish Up

Visit the metrics URL to confirm it is running, and accept the SSL certificate.

Again, be patient. There are several docker pulls going on which take quite some time.

Do Over

If you need to clear the decks and start over, do this, then go back to the top.

oc delete all --selector="metrics-infra"
oc delete templates --selector="metrics-infra"
oc delete secrets --selector="metrics-infra"
oc delete pvc --selector="metrics-infra"
oc delete sa --selector="metrics-infra"

oc secrets new metrics-deployer nothing=/dev/null
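
The same cleanup can be wrapped in a loop with a dry-run switch, which makes it easier to see what will be deleted before committing to it. This is a sketch: run, reset_metrics and the DRY_RUN variable are names made up for this example, and the oc commands are the ones listed above.

```shell
# Dry-run is the default in this sketch; set DRY_RUN=0 to really delete.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: print the command in dry-run mode, else run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Delete everything labelled metrics-infra, then recreate the dummy secret.
reset_metrics() {
  for kind in all templates secrets pvc sa; do
    run oc delete "$kind" --selector="metrics-infra"
  done
  run oc secrets new metrics-deployer nothing=/dev/null
}

reset_metrics
```

With the default setting this only prints the six oc commands. Run it with DRY_RUN=0 inside the VM, in the openshift-infra project, to perform the cleanup.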
