Intel Gaudi Device Plugin for Kubernetes

The Intel Gaudi device plugin is a Kubernetes device plugin that registers Gaudi devices with a Kubernetes cluster so they can be used for compute workloads. With the appropriate hardware and this plugin deployed in your cluster, you can run jobs on Gaudi devices.

The Intel Gaudi device plugin for Kubernetes runs as a DaemonSet that automatically:

  • Registers Gaudi devices in your Kubernetes cluster.

  • Keeps track of device health.

Note

  • Make sure to review the supported Kubernetes versions listed in the Support Matrix.

  • Make sure Intel Gaudi software drivers are loaded on the system. To load the drivers, run:

    sudo modprobe habanalabs && sudo modprobe habanalabs_cn && sudo modprobe habanalabs_ib && sudo modprobe habanalabs_en
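
As a rough check that the modprobe commands above took effect, the helper below lists which of the four modules are still missing. It is only a sketch: the function name missing_gaudi_modules is hypothetical, and it assumes the loaded modules can be read from /proc/modules, where the module name is the first column.

```shell
# Hypothetical helper (not part of the Intel Gaudi software): print each of the
# four driver modules that does NOT appear in a /proc/modules-style listing.
# Optional argument: path to the listing (defaults to /proc/modules).
missing_gaudi_modules() {
    modules_file="${1:-/proc/modules}"
    if [ ! -r "$modules_file" ]; then
        echo "cannot read $modules_file" >&2
        return 2
    fi
    for mod in habanalabs habanalabs_cn habanalabs_ib habanalabs_en; do
        # awk exits 0 only when some line has the module name in column 1.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod"
        fi
    done
}
```

Calling missing_gaudi_modules with no argument reads /proc/modules on the current node. Empty output means all four modules are loaded.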
    

Deploying Intel Gaudi Device Plugin for Kubernetes

  1. Deploy the device plugin DaemonSet to all Gaudi nodes with the kubectl create command, using the habana-k8s-device-plugin.yaml file:

    kubectl create -f https://vault.habana.ai/artifactory/docker-k8s-device-plugin/habana-k8s-device-plugin.yaml
    

    Note

    kubectl needs access to a Kubernetes cluster to run its commands. To verify that access, run kubectl get pod -A.

  2. Check the device plugin deployment status by running the following command:

    kubectl get pods -n habana-system
    

    Expected result:

    NAME                                       READY   STATUS    RESTARTS   AGE
    habanalabs-device-plugin-daemonset-qtpnh   1/1     Running   0          2d11h
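
For scripted deployments, the status check in step 2 can be automated instead of read by eye. The sketch below assumes the default column layout shown in the expected result (NAME, READY, STATUS, ...). The function name all_pods_ready is hypothetical and is not part of kubectl or the device plugin.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and succeed only
# if there is at least one pod row and every row shows STATUS "Running" with
# all containers ready (READY column n/n).
all_pods_ready() {
    awk 'NR > 1 {
             rows++
             split($2, ready, "/")
             if ($3 != "Running" || ready[1] != ready[2]) bad++
         }
         END { exit (rows == 0 || bad > 0) }'
}

# Example use against a live cluster (requires kubectl access):
#   kubectl get pods -n habana-system | all_pods_ready && echo "device plugin is up"
```

The function treats empty input as a failure, so a namespace with no device plugin pods is reported as not ready rather than silently passing.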