Prometheus Metric Exporter
Prometheus Metric Exporter¶
This Prometheus exporter collects Intel® Gaudi® AI accelerator metrics in a container cluster running compute workloads. With the appropriate hardware and the exporter deployed in your cluster, you can collect information about the state of each Gaudi device.
Prerequisites¶
Intel Gaudi software drivers loaded on the system. For more details, refer to Installation Guide.
For Kubernetes only, the Kubernetes version listed in the Support Matrix.
Deploying Prometheus Metric Exporter in Docker¶
Start the container:
docker run -it --privileged --network=host -v /dev:/dev vault.habana.ai/gaudi-metric-exporter/metric-exporter:1.19.0 --port <PORT_NUMBER>
The default port number is 41611.
Define the Prometheus configuration file. Prometheus fundamentally stores all data as a time series: streams of timestamped values of the same metric and the same sets of labeled dimensions. The metric data exported from the exporter can be accessed in Prometheus for easier management. For details, refer to Prometheus documentation. For example:
- job_name: bmc
  scrape_interval: 30s  # A 30s scrape interval is recommended
  metrics_path: /metrics  # The exporter exposes its own metrics at /metrics
  static_configs:
    - targets:
        - 192.168.22.189  # Name of the server running the metric exporter
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: localhost:41611  # The location of the exporter to Prometheus
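One way to try the job definition above is to wrap it in a minimal configuration file and validate it before pointing Prometheus at it. The following is a sketch only, not the documented procedure: the output path is a hypothetical placeholder, the top-level scrape_configs wrapper is an assumption about where the job sits in your full configuration, and validation is attempted only if the promtool utility that ships with Prometheus is found on the PATH.

```shell
#!/bin/sh
# Hypothetical output path; change it to the file your Prometheus instance reads.
CONFIG_FILE="${CONFIG_FILE:-/tmp/prometheus-gaudi.yml}"

# Write the example job under a top-level scrape_configs key.
# The target address and exporter location are the example values from this page.
cat > "$CONFIG_FILE" <<'EOF'
scrape_configs:
  - job_name: bmc
    scrape_interval: 30s
    metrics_path: /metrics
    static_configs:
      - targets:
          - 192.168.22.189
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:41611
EOF

# Validate the file only when promtool is available; otherwise just report it.
if command -v promtool >/dev/null 2>&1; then
  promtool check config "$CONFIG_FILE"
else
  echo "promtool not found; skipping validation of $CONFIG_FILE"
fi
```

Replace the target address and the replacement value with the ones that match your own server and exporter port before using the file.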
Deploying Prometheus Metric Exporter in Kubernetes¶
Create the habanalabs namespace if necessary, as the metric exporter is deployed into this namespace:
$ kubectl create ns habanalabs
Run the metric exporter on all the Gaudi nodes by deploying the following DaemonSet using the kubectl create command. Use the associated .yaml file to set up the environment:
$ kubectl create -f https://vault.habana.ai/artifactory/gaudi-metric-exporter/yaml/1.19.0/metric-exporter-daemonset.yaml
Note
kubectl requires access to a Kubernetes cluster to run its commands. To check that kubectl has access to the cluster, run $ kubectl get pod -A.
To enable the Prometheus metric exporter and kube-prometheus integration, install the Kubernetes Service and the kube-prometheus ServiceMonitor by running the following commands:
$ kubectl create -f https://vault.habana.ai/artifactory/gaudi-metric-exporter/yaml/1.19.0/metric-exporter-service.yaml
$ kubectl create -f https://vault.habana.ai/artifactory/gaudi-metric-exporter/yaml/1.19.0/metric-exporter-serviceMonitor.yaml
It is highly recommended to deploy the Prometheus metric exporter along with kube-prometheus.
Note
The Prometheus metric exporter exposes metrics to Intel Gaudi network interfaces using hostNetwork: true.
Collecting Metrics¶
You can now collect metrics from a node with Gaudi cards by querying the endpoint of the metric exporter pod in the cluster on port 41611, as follows:
Find the endpoints associated with the metric exporter:
$ kubectl get ep -n habana-system
To set a different port for the exporter application, use the --port flag (int).
Once you have the endpoints for the metric exporter, run a simple command such as the following to retrieve Prometheus metrics for all Gaudi cards on that node:
$ curl http://<endpoint_ip>:41611/metrics
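The raw output of the metrics endpoint can be long, so a small wrapper around curl can help when exploring it. The sketch below is illustrative only: the function names metrics_url, fetch_metrics, and list_metric_names are hypothetical helpers, not part of the exporter, and the filter assumes the endpoint returns the standard Prometheus text exposition format, where comment lines start with # and each sample line begins with the metric name.

```shell
#!/bin/sh
# Default exporter port, as stated on this page.
DEFAULT_PORT=41611

# Build the metrics URL.
# $1: endpoint IP or host name (required); $2: port (optional).
metrics_url() {
  printf 'http://%s:%s/metrics\n' "$1" "${2:-$DEFAULT_PORT}"
}

# Fetch the metrics page; returns non-zero on connection or HTTP errors.
fetch_metrics() {
  curl --silent --fail --max-time 10 "$(metrics_url "$@")"
}

# Read Prometheus text format on stdin and print each metric name once.
# Drops comment and blank lines, then cuts everything from the first
# "{" (label set) or whitespace (sample value) onward.
list_metric_names() {
  grep -v -e '^#' -e '^[[:space:]]*$' | sed -E 's/[{[:space:]].*$//' | sort -u
}

# Example usage (replace <endpoint_ip> with an address from the endpoint list):
#   fetch_metrics <endpoint_ip> | list_metric_names
```

Listing the names first makes it easier to decide which series to query or alert on in Prometheus. If the exporter was started with a different --port value, pass that port as the second argument to fetch_metrics.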