Prometheus Metric Exporter for Kubernetes

This is a Kubernetes Prometheus exporter that collects HABANA device metrics in a container cluster running compute workloads. With the appropriate hardware and this exporter deployed in your Kubernetes cluster, you can collect information about the state of each HABANA device. The HABANA Prometheus metric exporter for Kubernetes runs as a DaemonSet.

Prerequisites

Running the HABANA Prometheus metric exporter requires the following:

  • SynapseAI SW drivers loaded on the system

  • Kubernetes version >= 1.10

Create a Namespace

Create the habanalabs namespace if it does not already exist, because the metric exporter is deployed into this namespace:

$ kubectl create ns habanalabs
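
The "if necessary" condition can be checked before creating the namespace. The following is a minimal sketch, assuming kubectl is already configured for the target cluster; ensure_namespace is a hypothetical helper name used only for illustration.

```shell
# Sketch only: ensure_namespace is a hypothetical helper, not part of the exporter.
# Creates the namespace only when "kubectl get namespace" reports it is missing.
ensure_namespace() {
  local ns="$1"
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace $ns already exists"
  else
    kubectl create namespace "$ns"
  fi
}

# Example usage:
#   ensure_namespace habanalabs
```

Running the helper a second time leaves the existing namespace untouched instead of failing with an "already exists" error.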

Deployment

The following steps enable HABANA Prometheus metric exporter support in Kubernetes.

The metric exporter needs to run on every node that is equipped with HABANA devices. The simplest way to do so is to deploy the following DaemonSet using the kubectl create command.

Note

kubectl must have access to a Kubernetes cluster to run these commands.

$ kubectl create -f https://vault.habana.ai/gaudi-metric-exporter/metric-exporter.yaml

It is highly recommended that you deploy the Prometheus metric exporter along with kube-prometheus. If your cluster uses kube-prometheus, also deploy a Kubernetes Service and a kube-prometheus ServiceMonitor to integrate the metric exporter with kube-prometheus.

To install the Service and ServiceMonitor, run the following commands:

$ kubectl create -f https://vault.habana.ai/gaudi-metric-exporter/metric-service.yaml
$ kubectl create -f https://vault.habana.ai/gaudi-metric-exporter/metric-service-monitor.yaml

Collecting Metrics

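You can now collect metrics from a node with HABANA devices by querying the endpoint of the metric exporter pod on port 41611 from within the cluster. To find the endpoints associated with the metric exporter, run the following command. If the exporter runs in a different namespace (for example, the habanalabs namespace created above), pass that namespace to -n instead: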

$ kubectl get ep -n habana-system
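
For scripting, the endpoint addresses can be extracted without reading the table output by hand. The sketch below assumes kubectl's JSONPath output support and a namespace that contains only the exporter's Endpoints object; endpoint_ips is a hypothetical helper name. If other Services live in the same namespace, their addresses are listed too.

```shell
# Sketch only: endpoint_ips is a hypothetical helper, not part of the exporter.
# Prints one IP per line for every address of every Endpoints object
# found in the given namespace.
endpoint_ips() {
  local ns="$1"
  kubectl get endpoints -n "$ns" \
    -o jsonpath='{.items[*].subsets[*].addresses[*].ip}' | tr ' ' '\n'
}

# Example usage:
#   endpoint_ips habana-system
```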

Once you have the endpoints for the metric exporter, a command like the following retrieves Prometheus metrics for all HABANA devices on that node:

$ curl http://<endpoint_ip>:41611/metrics
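
The response uses the Prometheus text exposition format, in which lines beginning with # carry HELP and TYPE metadata. As a minimal sketch for quick inspection, the helper below drops those comment lines and blank lines so that only the sample lines remain; metric_samples is a hypothetical name, and the metric names the exporter actually emits are not assumed here.

```shell
# Sketch only: metric_samples is a hypothetical helper, not part of the exporter.
# Reads Prometheus exposition text on stdin and prints only the sample lines,
# skipping "# HELP" / "# TYPE" comments and blank lines.
metric_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$'
}

# Example usage against one exporter endpoint:
#   curl -s "http://<endpoint_ip>:41611/metrics" | metric_samples
```

The output can be piped further, for example into grep with a device label, to narrow the result to a single device.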