Deploying Intel Gaudi Base Operator
This section describes how to install the Intel Gaudi Base Operator on OpenShift and create a DeviceConfig instance. You can install the operator using either the Red Hat OpenShift console or the CLI, as described below.
Using Red Hat OpenShift Console¶
Go to Operators.
Click “OperatorHub”.
In the All Items field, search for Habana AI.
Click “Install”.
Using CLI¶
Create a habana-ai-operator-install.yaml file containing the following:

    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: habana-ai-operator
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: habana-ai-operator
      namespace: habana-ai-operator
    spec:
      targetNamespaces:
      - habana-ai-operator
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: habana-ai-operator
      namespace: habana-ai-operator
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: habana-ai-operator
      source: certified-operators
      sourceNamespace: openshift-marketplace

Apply the YAML file:
oc apply -f habana-ai-operator-install.yaml
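After applying the manifest, it can be useful to confirm that the Subscription was created and that the operator pod started. The following is a minimal sketch rather than an official procedure. It assumes the namespace and Subscription name from the manifest above; the function name `check_operator_install` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists the objects created by the install manifest.
# Assumes the namespace used in habana-ai-operator-install.yaml.
NAMESPACE="${NAMESPACE:-habana-ai-operator}"

check_operator_install() {
  # Subscription created by the manifest (fully qualified to avoid
  # clashing with other "subscription" resource types on the cluster)
  oc get subscriptions.operators.coreos.com habana-ai-operator -n "$NAMESPACE"
  # ClusterServiceVersion installed for the operator
  oc get csv -n "$NAMESPACE"
  # Operator pods
  oc get pods -n "$NAMESPACE"
}
```

Run `check_operator_install` after `oc apply` completes. With `installPlanApproval: Automatic`, the ClusterServiceVersion may take a short while to appear.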
Creating the DeviceConfig Instance¶
The DeviceConfig is the main Custom Resource Definition (CRD) of the Intel Gaudi Base Operator.
The table below describes the required fields for creating the DeviceConfig instance:
Component | Field | Description | Scheme | Required |
---|---|---|---|---|
devicePlugin | image | The Intel Gaudi device plugin image to be used. | String | True |
devicePlugin | version | The Intel Gaudi device plugin version to be used. | String | True |
driver | image | The Intel Gaudi driver image to be used. | String | True |
driver | version | The Intel Gaudi driver version to be used. | String | True |
habanaRuntime | image | The Intel Gaudi container runtime image to be used. | String | True |
habanaRuntime | version | The Intel Gaudi container runtime version to be used. | String | True |
nodeMetrics | image | The Intel Gaudi node metrics image to be used. | String | True |
nodeMetrics | version | The Intel Gaudi node metrics version to be used. | String | True |
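To illustrate how the eight required fields fit together, the sketch below prints a DeviceConfig manifest from two version arguments. It is an illustration only: the function name `render_deviceconfig` is hypothetical, the image paths are taken from the CLI example later in this section, and it assumes that the version segment in the habanaRuntime image path matches the device plugin version, as it does in that example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a DeviceConfig manifest with every required field.
#   $1 = device plugin version (also used in the runtime image path; assumption)
#   $2 = version for the driver, container runtime, and node metrics images
render_deviceconfig() {
  local plugin_version="$1"
  local build_version="$2"
  # Versions are quoted so that YAML always reads them as strings.
  cat <<EOF
apiVersion: habana.ai/v1
kind: DeviceConfig
metadata:
  name: habana-ai
  namespace: habana-ai-operator
spec:
  devicePlugin:
    image: vault.habana.ai/docker-k8s-device-plugin/docker-k8s-device-plugin
    version: "${plugin_version}"
  driver:
    image: image-registry.openshift-image-registry.svc:5000/habana-ai-operator/habana-ai-driver
    version: "${build_version}"
  habanaRuntime:
    image: vault.habana.ai/habana-ocp-operator/${plugin_version}/habana-runtime
    version: "${build_version}"
  nodeMetrics:
    image: vault.habana.ai/gaudi-metric-exporter/metric-exporter
    version: "${build_version}"
EOF
}
```

For example, `render_deviceconfig 1.17.1 1.17.1-40 > deviceconfig.yaml` writes a file with the same field values as the CLI example in this section.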
You can create the DeviceConfig instance using either the Red Hat OpenShift console or the CLI, as described below.
Using Red Hat OpenShift Console¶
Go to Operators.
Click “Installed Operators”.
In the Name field, define the instance as habana-ai.
Under devicePlugin:
In the image field, add the Intel Gaudi device plugin image to use:
vault.habana.ai/docker-k8s-device-plugin/docker-k8s-device-plugin
In the version field, define the Intel Gaudi device plugin version to use.
Under driver:
In the image field, add the Intel Gaudi driver image to use:
image-registry.openshift-image-registry.svc:5000/habana-ai-operator/habana-ai-driver
In the version field, define the Intel Gaudi driver version to use.
Note
If the image does not exist, it will be created and pushed to the specified image:version.
Under habanaRuntime:
In the image field, add the Intel Gaudi runtime image to use:
vault.habana.ai/habana-ocp-operator/<Version>/habana-runtime
In the version field, define the Intel Gaudi runtime version to use.
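Under nodeMetrics:
In the image field, add the Intel Gaudi node metrics image to use:
vault.habana.ai/gaudi-metric-exporter/metric-exporter
In the version field, define the Intel Gaudi node metrics version to use.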
Using CLI¶
Create a deviceconfig.yaml file containing the following:

    apiVersion: habana.ai/v1
    kind: DeviceConfig
    metadata:
      name: habana-ai
      namespace: habana-ai-operator
    spec:
      devicePlugin:
        image: vault.habana.ai/docker-k8s-device-plugin/docker-k8s-device-plugin
        version: 1.17.1
      driver:
        image: image-registry.openshift-image-registry.svc:5000/habana-ai-operator/habana-ai-driver
        version: 1.17.1-40
      habanaRuntime:
        image: vault.habana.ai/habana-ocp-operator/1.17.1/habana-runtime
        version: 1.17.1-40
      nodeMetrics:
        image: vault.habana.ai/gaudi-metric-exporter/metric-exporter
        version: 1.17.1-40

The driver image is built inside the cluster and saved to OpenShift’s internal registry, image-registry.openshift-image-registry.svc:5000. To load the image from another registry, replace the URL in the driver image field.
Apply the YAML file:
oc apply -f deviceconfig.yaml
Apply the following patches to set up the image registry:

    oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'
    oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"managementState":"Managed"}}' --type=merge
    until oc get svc image-registry -n openshift-image-registry; do sleep 10; done
    oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"managementState":"Unmanaged"}}' --type=merge
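The `until` loop in the patch sequence waits indefinitely for the image-registry service. As a hedged alternative, the sketch below bounds the wait and then lists the DeviceConfig and the pods in the operator namespace. The function names are hypothetical, and the resource name `deviceconfig.habana.ai` is an assumption derived from the `kind` and `apiVersion` in deviceconfig.yaml.

```shell
#!/usr/bin/env bash
NAMESPACE="${NAMESPACE:-habana-ai-operator}"

# Hypothetical helper: waits for the internal registry service with a bound.
#   $1 = maximum number of attempts (default 30)
#   $2 = delay between attempts in seconds (default 10)
wait_for_registry() {
  local max_attempts="${1:-30}"
  local delay="${2:-10}"
  local attempt=1
  until oc get svc image-registry -n openshift-image-registry >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "image-registry service not found after ${max_attempts} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Hypothetical helper: shows the DeviceConfig instance and the operator's pods.
check_deviceconfig() {
  # Resource name is an assumption based on kind DeviceConfig in group habana.ai
  oc get deviceconfig.habana.ai habana-ai -n "$NAMESPACE"
  oc get pods -n "$NAMESPACE" -o wide
}
```

A possible usage is `wait_for_registry 30 10 && check_deviceconfig`, run after the DeviceConfig and the registry patches have been applied. The bounded wait returns a non-zero status instead of hanging if the registry never becomes available.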