Setting up OpenShift Environment

This section lists the dependencies required to install the HabanaAI Operator on OpenShift.

Note

Before installing these dependencies, review the supported OpenShift versions listed in the Support Matrix.

Installing Kernel Module Management (KMM) Operator

The Kernel Module Management Operator is paired with the HabanaAI Operator to automate the management of all Intel Gaudi software components required for provisioning AI accelerators within the OpenShift cluster, including drivers and monitoring metrics.

You can install the KMM Operator either by using the Red Hat OpenShift Console or by using the CLI. Both methods are described below.

Using Red Hat OpenShift Console

  1. Go to Operators.

  2. Click OperatorHub.

  3. In the All Items field, search for Kernel Module Management.

  4. Click Install.

../../_images/KMM_Installation.png

Using the CLI

  1. Create a kmm-install.yaml file containing the following:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-kmm
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: openshift-kmm-
  name: kernel-module-management
  namespace: openshift-kmm
spec:
  targetNamespaces:
  - openshift-kmm
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: kernel-module-management
  namespace: openshift-kmm
spec:
  channel: "stable"
  installPlanApproval: Automatic
  name: kernel-module-management
  source: redhat-operators
  sourceNamespace: openshift-marketplace

  2. Apply the YAML file:

oc apply -f kmm-install.yaml
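
After the manifest is applied, it can help to confirm that the operator finished installing before moving on. The following is a minimal sketch, assuming the operator reports its state through a ClusterServiceVersion in the openshift-kmm namespace. The helper name operator_succeeded is hypothetical and is not part of the KMM documentation.

```shell
#!/usr/bin/env bash
# Sketch: check that an operator installed through a Subscription has finished installing.
# operator_succeeded is an illustrative helper name, not a product command.
operator_succeeded() {
  local namespace="$1"
  # Print the phase of every ClusterServiceVersion in the namespace, one per line,
  # and succeed only if at least one line is exactly "Succeeded".
  oc get csv -n "$namespace" \
    -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}' \
    | grep -qx 'Succeeded'
}
```

For example, running operator_succeeded openshift-kmm && echo "KMM Operator is installed" prints the message only once the installation has completed. Running oc get pods -n openshift-kmm shows the operator pods themselves.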

Installing Intel Gaudi Firmware

To install Intel Gaudi firmware using the KMM Operator, add /var/lib/firmware to the kernel firmware search path by creating and applying a MachineConfig file, such as 99-worker-kernel-args-firmware-path.yaml, from the CLI. For further details, refer to the KMM GitHub Documentation.

Note

The name of the yaml file provided in this section, 99-worker-kernel-args-firmware-path.yaml, is given as an example only.

  1. Create the yaml file with the following content:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-kernel-args-firmware-path
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - 'firmware_class.path=/var/lib/firmware'

  2. Apply the YAML file:

oc apply -f 99-worker-kernel-args-firmware-path.yaml
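
Applying a MachineConfig makes the Machine Config Operator roll the change out to the worker nodes, which typically reboots them, so the new kernel argument is not active immediately. The sketch below shows one way to confirm that the argument reached a node. It assumes you have permission to start a debug pod on that node. The helper names has_firmware_path and check_node are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: confirm the firmware search path is on a worker node's kernel command line.

# Hypothetical helper: reads a kernel command line on stdin and succeeds
# if it contains the argument set by the MachineConfig above.
has_firmware_path() {
  grep -qF 'firmware_class.path=/var/lib/firmware'
}

# Hypothetical helper: reads /proc/cmdline on the given node through a debug pod.
check_node() {
  local node="$1"
  oc debug "node/$node" -- chroot /host cat /proc/cmdline 2>/dev/null \
    | has_firmware_path
}
```

A possible sequence is to wait for the rollout with oc wait machineconfigpool/worker --for=condition=Updated --timeout=30m and then run check_node <NODE> for each worker node. The 30-minute timeout is an arbitrary example value.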

Installing Node Feature Discovery (NFD) Operator - Optional

The Node Feature Discovery (NFD) Operator scans the hardware of OpenShift nodes and assigns them the appropriate labels, ensuring that drivers are installed only on the servers that require them.

You can install the NFD Operator either by using the Red Hat OpenShift Console or by using the CLI. Both methods are described below.

Note

Installing the NFD Operator is not mandatory, but it is recommended to have it in place before installing the HabanaAI Operator on OpenShift.

Using Red Hat OpenShift Console

  1. Go to Operators.

  2. Click OperatorHub.

  3. In the All Items field, search for Node Feature Discovery.

  4. Click Install.

../../_images/NFD_Installation.png

  5. Create an NFD instance and include habana.ai in the extraLabelNs list. This allows NFD to create HabanaAI-specific labels, which describe the card types in each Gaudi node.

../../_images/Create_NFD_Instance.png

Using the CLI

  1. Create an nfd-install.yaml file containing the following:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: openshift-nfd-
  name: openshift-nfd
  namespace: openshift-nfd
spec:
  targetNamespaces:
  - openshift-nfd
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: openshift-nfd
spec:
  channel: "stable"
  installPlanApproval: Automatic
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace

  2. Apply the YAML file:

oc apply -f nfd-install.yaml

  3. Create an NFD instance by creating an nfd-instance.yaml file containing the following:

apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec:
  customConfig:
    configData: |
      #    - name: "more.kernel.features"
      #      matchOn:
      #      - loadedKMod: ["example_kmod3"]
      #    - name: "more.features.by.nodename"
      #      value: customValue
      #      matchOn:
      #      - nodename: ["special-.*-node-.*"]
  extraLabelNs:
    - habana.ai
  instance: ''
  operand:
    image: >-
      registry.redhat.io/openshift4/ose-node-feature-discovery@sha256:edd2adfdf423d6a1eb7e8c1e388d9cf5fbc829e7e66c7bc955e9b2a6f50d1a47
    servicePort: 12000
  topologyupdater: false
  workerConfig:
    configData: |
      core:
        sleepInterval: 60s
      sources:
        pci:
          deviceClassWhitelist:
            - "0200"
            - "03"
            - "12"
          deviceLabelFields:
            - "vendor"

  4. Apply the YAML file:

oc apply -f nfd-instance.yaml

If you would rather not install the NFD Operator or add habana.ai to the extraLabelNs list, you can label the Gaudi nodes with habana.ai/hpu.gaudi.present=true by running the following command for each Gaudi node:

oc label node/<NODE> habana.ai/hpu.gaudi.present=true
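
When a cluster has several Gaudi nodes, the same command can be wrapped in a loop. The following is a minimal sketch. The helper name label_gaudi_nodes and the node names in the usage example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: apply the Gaudi presence label to several nodes in one call.
# label_gaudi_nodes is an illustrative helper; pass the node names as arguments.
label_gaudi_nodes() {
  local node
  for node in "$@"; do
    # --overwrite keeps the call safe to repeat on a node that already has the label.
    oc label "node/$node" habana.ai/hpu.gaudi.present=true --overwrite
  done
}
```

For example, label_gaudi_nodes gaudi-0 gaudi-1 labels two nodes, and oc get nodes -l habana.ai/hpu.gaudi.present=true then lists every node that carries the label.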