Prometheus Metric Exporter for Kubernetes
This Kubernetes Prometheus exporter collects HABANA device metrics in a container cluster running compute workloads. With the appropriate hardware and the exporter deployed in your Kubernetes cluster, you can collect information about the state of each HABANA device. The HABANA Prometheus metric exporter for Kubernetes runs as a DaemonSet.
The prerequisites for running the HABANA Prometheus metric exporter are:
SynapseAI SW drivers loaded on the system
Kubernetes version >= 1.10
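The version prerequisite can be checked with a small helper before deploying. The following is a minimal sketch, not part of the exporter: the function name version_at_least is hypothetical, and the usage lines assume that jq is installed and that kubectl can reach the cluster.

```shell
# Succeeds if Kubernetes version $1 is at least version $2.
# Compares major and minor as integers, so 1.9 correctly sorts below 1.10.
version_at_least() {
    have=${1#v}; want=${2#v}              # drop an optional leading "v"
    have_major=${have%%.*}; have_rest=${have#*.}
    want_major=${want%%.*}; want_rest=${want#*.}
    have_minor=${have_rest%%[!0-9]*}      # keep leading digits only ("27.3" -> "27")
    want_minor=${want_rest%%[!0-9]*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Usage (needs cluster access and jq, both assumptions):
#   server=$(kubectl version -o json | jq -r '.serverVersion.gitVersion')
#   version_at_least "$server" 1.10 && echo "Kubernetes version OK"
```

Comparing the fields numerically matters here because a plain string comparison would rank 1.9 above 1.10.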
Create a Namespace
Create the habanalabs namespace if it does not already exist; the metric exporter is deployed into this namespace:
$ kubectl create ns habanalabs
Enabling HABANA Prometheus Metric Exporter Support in Kubernetes
The metric exporter must run on every node equipped with HABANA devices. The simplest way to do this is to deploy the following DaemonSet with the kubectl create command.
kubectl must have access to a Kubernetes cluster to run these commands.
$ kubectl create -f https://vault.habana.ai/gaudi-metric-exporter/metric-exporter.yaml
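After creating the DaemonSet, it can be useful to confirm that a pod is ready on every scheduled node. The sketch below is illustrative only: daemonset_ready is a hypothetical helper, and <daemonset_name> and <namespace> are placeholders, since the names defined in metric-exporter.yaml are not stated here. The two status fields it reads, desiredNumberScheduled and numberReady, are standard DaemonSet status fields.

```shell
# Succeeds when a "DESIRED READY" pair shows every scheduled pod is ready.
# Input example: "3 3". Zero desired pods counts as not ready, since it
# usually means no node matched the DaemonSet.
daemonset_ready() {
    set -- $1                      # split the pair into $1 (desired) and $2 (ready)
    [ "$#" -eq 2 ] && [ "$1" -gt 0 ] && [ "$1" -eq "$2" ]
}

# Usage (placeholders must be replaced with the real names):
#   status=$(kubectl get daemonset <daemonset_name> -n <namespace> \
#       -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')
#   daemonset_ready "$status" && echo "exporter ready on all scheduled nodes"
```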
It is highly recommended to deploy the Prometheus metric exporter alongside kube-prometheus. If your cluster uses kube-prometheus, also deploy a Kubernetes Service and a kube-prometheus ServiceMonitor to integrate the Prometheus metric exporter with it.
To install the Service and ServiceMonitor, run the following commands:
$ kubectl create -f https://vault.habana.ai/gaudi-metric-exporter/metric-service.yaml
$ kubectl create -f https://vault.habana.ai/gaudi-metric-exporter/metric-service-monitor.yaml
You can now collect metrics from a node with HABANA devices by querying the metric exporter pod's endpoint on port 41611 from within the cluster. To find the endpoints associated with the metric exporter, run the following command, substituting the namespace the exporter was deployed into if it differs from habana-system:
$ kubectl get ep -n habana-system
Once you have the endpoints for the metric exporter, a command like the following retrieves the Prometheus metrics for all HABANA devices on that node:
$ curl http://<endpoint_ip>:41611/metrics
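The endpoint returns metrics in the Prometheus text exposition format, where lines beginning with # carry HELP and TYPE metadata and every other non-blank line is a sample. As a rough sketch for exploring what the exporter reports, the hypothetical helper below (list_metric_names is not part of the exporter) prints the unique metric names found in that output. It makes no assumption about the actual metric names.

```shell
# Print the unique metric names found in Prometheus text-format input on stdin.
list_metric_names() {
    # Skip blank lines and "#" comment lines (HELP/TYPE metadata). A sample
    # line starts with the metric name, followed by "{" (labels) or whitespace.
    awk '!/^#/ && NF { split($0, parts, "[{ \t]"); print parts[1] }' | sort -u
}

# Usage (replace <endpoint_ip> with an address from the endpoints listing):
#   curl -s http://<endpoint_ip>:41611/metrics | list_metric_names
```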