Prometheus Metric Exporter for Kubernetes
This Kubernetes Prometheus exporter collects Intel® Gaudi® AI accelerator metrics from compute workloads running in a container cluster. With the appropriate hardware and the exporter deployed in your Kubernetes cluster, you can collect information about the state of each Gaudi device. The Intel Gaudi Prometheus metric exporter for Kubernetes runs as a DaemonSet.
Prerequisites¶
Running the Intel Gaudi Prometheus metric exporter requires the following:
Intel Gaudi software drivers loaded on the system
Kubernetes version listed in the Support Matrix
Create a Namespace¶
Create the habanalabs namespace if it does not already exist; the metric exporter is deployed into this namespace:
kubectl create ns habanalabs
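Because `kubectl create ns` returns an error when the namespace already exists, the "if necessary" check can be scripted. The following is a minimal sketch, not part of the official instructions; the helper name `ensure_namespace` is hypothetical.

```shell
# Hypothetical helper: create a namespace only when it is missing.
ensure_namespace() {
  ns="$1"
  # "kubectl get namespace" exits non-zero when the namespace does not exist.
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace $ns already exists"
  else
    kubectl create namespace "$ns"
  fi
}
```

Call it as `ensure_namespace habanalabs` before deploying the exporter.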
Deployment¶
To enable Intel Gaudi Prometheus metric exporter support in Kubernetes, run the metric exporter on every node that is equipped with Gaudi cards. The simplest way to do so is to deploy the following DaemonSet with the kubectl create command.
Note
kubectl must have access to a Kubernetes cluster to run these commands.
$ kubectl create -f https://vault.habana.ai/artifactory/gaudi-metric-exporter/yaml/1.15.1/metric-exporter-daemonset.yaml
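After the DaemonSet is created, it can be useful to confirm that one exporter pod is scheduled on each Gaudi node. The sketch below only wraps two standard `kubectl get` calls; the helper name `show_exporter_rollout` is hypothetical, and the namespace is passed as an argument because the DaemonSet name and namespace are defined by the YAML file rather than stated here.

```shell
# Hypothetical helper: list the DaemonSets and pods in the given namespace.
show_exporter_rollout() {
  ns="$1"
  # DESIRED and READY counts should match the number of Gaudi nodes.
  kubectl get daemonset -n "$ns"
  # "-o wide" adds the node name and pod IP for each pod.
  kubectl get pods -n "$ns" -o wide
}
```

For example, `show_exporter_rollout habanalabs` assumes the exporter was deployed into the namespace created earlier.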
It is highly recommended that you deploy the Prometheus metric exporter along with kube-prometheus. If your cluster uses kube-prometheus, also deploy a Kubernetes Service and a kube-prometheus ServiceMonitor to integrate the Prometheus metric exporter with kube-prometheus.
To install the Service and ServiceMonitor run the following commands:
$ kubectl create -f https://vault.habana.ai/artifactory/gaudi-metric-exporter/yaml/1.15.1/metric-exporter-service.yaml
$ kubectl create -f https://vault.habana.ai/artifactory/gaudi-metric-exporter/yaml/1.15.1/metric-exporter-serviceMonitor.yaml
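ServiceMonitor is a custom resource provided by the Prometheus Operator, which kube-prometheus includes, so creating one fails on a cluster where that custom resource definition is not installed. A minimal pre-check might look like the following; the helper name `has_servicemonitor_crd` is hypothetical.

```shell
# Hypothetical helper: succeed only if the ServiceMonitor CRD is installed.
has_servicemonitor_crd() {
  # The exit status of kubectl becomes the exit status of the function.
  kubectl get crd servicemonitors.monitoring.coreos.com >/dev/null 2>&1
}
```

It can guard the install step, for example `has_servicemonitor_crd || echo "ServiceMonitor CRD not found; install kube-prometheus first"`.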
Collecting Metrics¶
You can now collect metrics from a node with Gaudi cards by querying the endpoint of the metric exporter pod on port 41611 within the cluster. To set a different port for the exporter, use the --port flag (int). To find the endpoints associated with the metric exporter, run the following command:
$ kubectl get ep -n habana-system
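To use the endpoint addresses in a script, the same query can print only the IPs through a JSONPath template. This is a sketch under assumptions: the helper name `exporter_endpoint_ips` is hypothetical, the namespace defaults to the `habana-system` value used in the command above, and the output includes every Endpoints object in that namespace, not only the exporter's.

```shell
# Hypothetical helper: print one endpoint IP per line for a namespace.
exporter_endpoint_ips() {
  # Default to the namespace used in the command above; override via $1.
  ns="${1:-habana-system}"
  kubectl get endpoints -n "$ns" \
    -o jsonpath='{range .items[*].subsets[*].addresses[*]}{.ip}{"\n"}{end}'
}
```

If the exporter was deployed into a different namespace, pass it explicitly, for example `exporter_endpoint_ips habanalabs`.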
Once you have the endpoints for the metric exporter, a command such as the following retrieves the Prometheus metrics for all Gaudi cards on that node:
$ curl http://<endpoint_ip>:41611/metrics
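The response uses the Prometheus text exposition format, in which lines beginning with `#` carry HELP and TYPE metadata and every other line is a sample of the form `name{labels} value`. To get a quick overview of which metrics are exposed, the output can be reduced to the distinct metric names. The helper name `metric_names` below is hypothetical, and the sketch makes no assumption about the exporter's actual metric names.

```shell
# Hypothetical helper: read Prometheus text format on stdin and
# print the distinct metric names, one per line, sorted.
metric_names() {
  # Drop comment lines (# HELP, # TYPE) and blank lines,
  # then cut each sample at the first "{" or space to keep only the name.
  grep -Ev '^(#|$)' | sed 's/[{ ].*//' | sort -u
}
```

Typical usage is `curl -s http://<endpoint_ip>:41611/metrics | metric_names`.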