Deploying Intel Gaudi Base Operator

This section describes how to install the Intel Gaudi Base Operator on OpenShift and create a DeviceConfig instance. You can install the operator using either the Red Hat OpenShift console or the CLI, as described below.

Using Red Hat OpenShift Console

  1. Go to Operators.

  2. Click “OperatorHub”.

  3. In the All Items field, search for Habana AI.

  4. Click “Install”.

    [Figure: Intel Gaudi Base Operator installation]

Using CLI

  1. Create a habana-ai-operator-install.yaml file containing the following:

    ---
    apiVersion: v1
    kind: Namespace
    metadata:
       name: habana-ai-operator
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
       name: habana-ai-operator
       namespace: habana-ai-operator
    spec:
       targetNamespaces:
       - habana-ai-operator
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
       name: habana-ai-operator
       namespace: habana-ai-operator
    spec:
       channel: stable
       installPlanApproval: Automatic
       name: habana-ai-operator
       source: certified-operators
       sourceNamespace: openshift-marketplace
    
  2. Apply the YAML file:

    oc apply -f habana-ai-operator-install.yaml
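
After the Subscription is applied, it can help to confirm that the operator actually finished installing. The following is a minimal sketch, not part of the official procedure: it polls the ClusterServiceVersion objects in the habana-ai-operator namespace until one reports the Succeeded phase. The function name wait_for_operator and its default retry count and delay are hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch: wait until a ClusterServiceVersion (CSV) in the operator namespace
# reports the "Succeeded" phase. The namespace matches the manifest above.
NAMESPACE="${NAMESPACE:-habana-ai-operator}"

wait_for_operator() {
    attempts="${1:-30}"   # hypothetical default: number of polls
    delay="${2:-10}"      # hypothetical default: seconds between polls
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Phases of all CSVs in the namespace, separated by spaces.
        phases=$(oc get csv -n "$NAMESPACE" \
            -o jsonpath='{.items[*].status.phase}' 2>/dev/null)
        case " $phases " in
            *" Succeeded "*) echo "operator installed"; return 0 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    echo "operator not ready after $attempts attempts" >&2
    return 1
}
```

Calling wait_for_operator with no arguments polls with the defaults above; wait_for_operator 60 5 would poll 60 times at 5-second intervals. If other operators copy their CSVs into the same namespace, any one of them reporting Succeeded satisfies this check, so inspect the output of oc get csv -n habana-ai-operator when in doubt.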
    

Creating the DeviceConfig Instance

The DeviceConfig is the main Custom Resource Definition (CRD) of the Intel Gaudi Base Operator.

The table below describes the required fields for creating the DeviceConfig instance:

Component      Field    Description                                            Scheme  Required
-------------  -------  -----------------------------------------------------  ------  --------
devicePlugin   image    The Intel Gaudi device plugin image to be used.        String  True
devicePlugin   version  The Intel Gaudi device plugin version to be used.      String  True
driver         image    The Intel Gaudi driver image to be used.               String  True
driver         version  The Intel Gaudi driver version to be used.             String  True
habanaRuntime  image    The Intel Gaudi container runtime image to be used.    String  True
habanaRuntime  version  The Intel Gaudi container runtime version to be used.  String  True
nodeMetrics    image    The Intel Gaudi node metrics image to be used.         String  True
nodeMetrics    version  The Intel Gaudi node metrics version to be used.       String  True
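
Because all eight fields are required, a quick pre-flight check on a DeviceConfig manifest can catch an omission before the manifest is applied. The sketch below is an illustration only: the function name check_deviceconfig is hypothetical, and the check is a simple indentation-based text scan rather than a real YAML parser, so it assumes the manifest uses plain block-style YAML with space indentation.

```shell
#!/bin/sh
# Sketch: report any required spec.<component>.<field> entry that is missing
# or empty in a DeviceConfig manifest. Text-based scan, not a YAML parser.
check_deviceconfig() {
    file="$1"
    status=0
    for component in devicePlugin driver habanaRuntime nodeMetrics; do
        for field in image version; do
            if ! awk -v comp="$component" -v field="$field" '
                # Entering the component block: remember its indentation.
                $1 == (comp ":") { inblock = 1; match($0, /^ */); indent = RLENGTH; next }
                inblock {
                    match($0, /^ */)
                    # A non-blank line at the same or shallower indentation ends the block.
                    if (NF > 0 && RLENGTH <= indent) { inblock = 0; next }
                    # The field counts only if it has a value after the colon.
                    if ($1 == (field ":") && NF >= 2) found = 1
                }
                END { exit (found ? 0 : 1) }
            ' "$file"; then
                echo "missing: spec.$component.$field" >&2
                status=1
            fi
        done
    done
    return "$status"
}
```

For example, check_deviceconfig deviceconfig.yaml returns 0 when every component has both an image and a version, and otherwise prints one line per missing field to standard error and returns 1.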

You can create the DeviceConfig instance using either the Red Hat OpenShift console or the CLI, as described below.

Using Red Hat OpenShift Console

  1. Go to Operators.

  2. Click “Installed Operators”.

  3. In the Name field, define the instance as habana-ai.

  4. Under devicePlugin:

    1. In the image field, add the Intel Gaudi device plugin image to use:

      vault.habana.ai/docker-k8s-device-plugin/docker-k8s-device-plugin
      
    2. In the version field, define the Intel Gaudi device plugin version to use.

  5. Under driver:

    1. In the image field, add the Intel Gaudi driver image to use:

      image-registry.openshift-image-registry.svc:5000/habana-ai-operator/habana-ai-driver
      
    2. In the version field, define the Intel Gaudi driver version to use.

      Note

      If the image does not exist, it will be created and pushed to the specified image:version.

  6. Under habanaRuntime:

    1. In the image field, add the Intel Gaudi runtime image to use:

      vault.habana.ai/habana-ocp-operator/<Version>/habana-runtime
      
    2. In the version field, define the Intel Gaudi runtime version to use.

  7. Under nodeMetrics:

    1. In the image field, add the Intel Gaudi node metrics image to use:

      vault.habana.ai/gaudi-metric-exporter/metric-exporter
      
    2. In the version field, define the Intel Gaudi node metrics version to use.

      [Figure: Create DeviceConfig instance]

Using CLI

  1. Create a deviceconfig.yaml file containing the following:

       apiVersion: habana.ai/v1
       kind: DeviceConfig
       metadata:
          name: habana-ai
          namespace: habana-ai-operator
       spec:
          devicePlugin:
             image: vault.habana.ai/docker-k8s-device-plugin/docker-k8s-device-plugin
             version: 1.17.1
          driver:
             image: image-registry.openshift-image-registry.svc:5000/habana-ai-operator/habana-ai-driver
             version: 1.17.1-40
          habanaRuntime:
             image: vault.habana.ai/habana-ocp-operator/1.17.1/habana-runtime
             version: 1.17.1-40
          nodeMetrics:
             image: vault.habana.ai/gaudi-metric-exporter/metric-exporter
             version: 1.17.1-40
    

    The driver image is built inside the cluster and saved to OpenShift’s internal registry at image-registry.openshift-image-registry.svc:5000. To load the image from another registry, replace the URL.

  2. Apply the YAML file:

    oc apply -f deviceconfig.yaml
    
  3. Apply the following patches to set up the OpenShift image registry that stores the driver image:

    oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'
    oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"managementState":"Managed"}}' --type=merge
    until oc get svc image-registry -n openshift-image-registry; do sleep 10; done
    oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"managementState":"Unmanaged"}}' --type=merge
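
Once the patches are applied, the result can be inspected from the CLI. The sketch below is illustrative rather than part of the documented procedure: the function names are hypothetical, and it assumes the DeviceConfig CRD is served under the resource name deviceconfigs.habana.ai (derived from the apiVersion and kind in the manifest above).

```shell
#!/bin/sh
# Sketch: inspect the image registry configuration and the DeviceConfig rollout.
NAMESPACE="${NAMESPACE:-habana-ai-operator}"

# Print the image registry operator's management state.
registry_state() {
    oc get configs.imageregistry.operator.openshift.io cluster \
        -o jsonpath='{.spec.managementState}'
}

# List DeviceConfig instances and the pods running in the operator namespace.
# Assumption: the CRD resource name is "deviceconfigs.habana.ai".
show_deviceconfig_status() {
    oc get deviceconfigs.habana.ai -n "$NAMESPACE" &&
    oc get pods -n "$NAMESPACE" -o wide
}
```

Running registry_state after the last patch should print the management state that patch set, and show_deviceconfig_status should list the habana-ai instance along with the pods in the namespace. If pods stay in a pending or error state, oc describe pod and oc logs in the same namespace are the usual next step.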