Load Intel Gaudi Driver Inside Running Docker Container

Before loading the driver, open two terminal windows: one on the OCP host and one inside the running container. Then follow the steps below:

  1. Inside the running Docker container, download and install the packages from the Habana Vault:

wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-1.14.0-493.el8.noarch.rpm
wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-firmware-1.14.0-493.el8.x86_64.rpm
wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-firmware-tools-1.14.0-493.el8.x86_64.rpm
wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-thunk-1.14.0-493.el8.x86_64.rpm
wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanatools-1.14.0-493.el8.x86_64.rpm

dnf install -y ./habanalabs-firmware-1.14.0-493.el8.x86_64.rpm
dnf install -y ./habanalabs-thunk-1.14.0-493.el8.x86_64.rpm
dnf install -y ./habanalabs-firmware-tools-1.14.0-493.el8.x86_64.rpm
dnf install -y ./habanatools-1.14.0-493.el8.x86_64.rpm
dnf install -y ./habanalabs-1.14.0-493.el8.noarch.rpm
  2. On the OCP host, copy the firmware (FW) files installed above from the running container to the appropriate location on the host:

    1. Obtain the container ID of the running hl-rhel-coreos container:

    sudo podman ps -ap
    
    2. Copy the FW files and hl-smi from the container to the OCP host (3e297ed24b36 is an example container ID; use the ID obtained in the previous step):

    sudo podman cp 3e297ed24b36:/lib/firmware/habanalabs/gaudi/gaudi-boot-fit.itb /opt/habana/habanalabs/gaudi/
    sudo podman cp 3e297ed24b36:/lib/firmware/habanalabs/gaudi/gaudi-fit.itb /opt/habana/habanalabs/gaudi/
    sudo podman cp 3e297ed24b36:/lib/firmware/habanalabs/gaudi/gaudi_tpc.bin /opt/habana/habanalabs/gaudi/
    sudo podman cp 3e297ed24b36:/usr/bin/hl-smi /opt/habana/habanalabs/gaudi/
    
  3. Inside the running Docker container, load the habanalabs driver manually:

    modprobe habanalabs_en && modprobe habanalabs
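
The download and install commands in step 1 repeat the same version string ten times. A minimal sketch of the same sequence driven by a single variable is shown below. The names `HL_VERSION` and `rpm_files` and the `--run` flag are illustrative and are not part of any Habana tooling; the base URL, package list, and install order are copied from the commands above.

```shell
#!/bin/sh
# Sketch only: run inside the container. HL_VERSION, rpm_files and --run
# are hypothetical names introduced for this example.
set -eu

BASE_URL="https://vault.habana.ai/artifactory/rhel/8/8.6"
HL_VERSION="${HL_VERSION:-1.14.0-493}"

# Print the RPM file names, one per line, in the install order used above.
rpm_files() {
    while read -r name arch; do
        printf '%s-%s.el8.%s.rpm\n' "$name" "$HL_VERSION" "$arch"
    done <<EOF
habanalabs-firmware x86_64
habanalabs-thunk x86_64
habanalabs-firmware-tools x86_64
habanatools x86_64
habanalabs noarch
EOF
}

# Without --run the script only defines rpm_files and changes nothing.
if [ "${1:-}" = "--run" ]; then
    # Download every package first, then install in the listed order.
    for rpm in $(rpm_files); do
        wget "${BASE_URL}/${rpm}"
    done
    for rpm in $(rpm_files); do
        dnf install -y "./${rpm}"
    done
fi
```

Saved under a name of your choice, the script can be invoked with `--run` to perform the downloads and installs; setting `HL_VERSION` in the environment would select a different release, assuming the vault keeps the same file naming.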
    

Note

The habanalabs driver is loaded from inside the Docker container and remains available on the OCP host only while the container is running. When the container is stopped or removed, the habanalabs driver is no longer available. To reload it, repeat the Docker-related steps above.
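
Because the driver disappears when the container stops, it can be useful to check for it before starting work. The following is a minimal sketch, assuming the kernel's module list in `/proc/modules` is readable; `driver_loaded` is a hypothetical helper name, and its optional argument exists only so the check can be pointed at a different file.

```shell
#!/bin/sh
# Sketch only: driver_loaded is a hypothetical helper. The optional argument
# lets the check read a file other than /proc/modules, e.g. for testing.
driver_loaded() {
    # Each line of /proc/modules starts with the module name and a space,
    # so this pattern matches habanalabs but not habanalabs_en.
    grep -q '^habanalabs ' "${1:-/proc/modules}" 2>/dev/null
}

if driver_loaded; then
    echo "habanalabs driver is loaded"
else
    echo "habanalabs driver is not loaded; repeat the container steps"
fi
```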

  4. On the OCP host, make sure the overlay mount set up in "Preparation For Running Docker Image on OCP-based Host" is working properly:

ls -lh /lib/firmware/habanalabs/gaudi/
-rwxr-xr-x. 1 root root 726K Dec 15 04:04 gaudi-boot-fit.itb
-rwxr-xr-x. 1 root root 9.9M Dec 15 04:04 gaudi-fit.itb
-rwxr-xr-x. 1 root root 1.5K Dec 15 04:02 gaudi_tpc.bin
-rwxr-xr-x. 1 root root 2.3M Dec 15 04:06 hl-smi
  5. Inside the running Docker container, check that the driver is loaded properly:

lsmod | grep habanalabs
habanalabs 966656 0
habanalabs_en 32768 1 habanalabs
  6. On the OCP host, check that the driver is loaded properly:

/opt/habana/habanalabs/gaudi/hl-smi -d PRODUCT CLOCK -L
================ HL-SMI LOG ================

Timestamp                               : Tue Jan 25 17:57:58 UTC 2022

Driver Version                          : 1.14.0-9e8ecf8
HL-SMI Version                          : hl-1.14.0-fw-48.0.1.0 (Dec 15 2021 - 05:14:10)

Attached AIPs                           : 1

[0] AIP (hl0) 0000:03:00.0
        Product Name                    : HL-200
        Model Number                    : F08GL0AI2000A
        Serial Number                   : AK30011368
        Module ID                       : N/A
        PCB Assembly Version            : V0A
        PCB Version                     : R0E
        HL Revision                     : 2
        AIP UUID                        : 00P3-HL2000B0-14-P63M75-01-01-09
        Firmware [FIT] Version          : Linux gaudi 5.10.18-hl-gaudi-1.14.0-fw-48.0.1-sec-7 #1 SMP PREEMPT Mon Nov 8 09:54:45 IST 2021 aarch64 GNU/Linux
        Firmware [SPI] Version          : BTL version 2f4e4ab7,Preboot version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:57:21)
        Firmware [UBOOT] Version        : U-Boot 2021.04-hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:54:16 +0200) build#: 1564

The driver also exposes version information through sysfs:

grep -E "$" /sys/class/habanalabs/hl?/*ver | cut -d / -f5-
hl0/armcp_kernel_ver:Linux gaudi 5.10.18-hl-gaudi-1.14.0-fw-48.0.1-sec-7 #1 SMP PREEMPT Mon Nov 8 09:54:45 IST 2021 aarch64 GNU/Linux
hl0/armcp_ver:armcpd version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:56:46)
hl0/cpld_ver:0x0000000f
hl0/cpucp_kernel_ver:Linux gaudi 5.10.18-hl-gaudi-1.14.0-fw-48.0.1-sec-7 #1 SMP PREEMPT Mon Nov 8 09:54:45 IST 2021 aarch64 GNU/Linux
hl0/cpucp_ver:armcpd version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:56:46)
hl0/driver_ver:1.14.0-124dd38
hl0/fuse_ver:00P3-HL2000B0-14-P63M75-01-01-09
hl0/infineon_ver:0x0002
hl0/preboot_btl_ver:BTL version 2f4e4ab7
hl0/preboot_btl_ver:Preboot version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:57:21)
hl0/thermal_ver:thermald version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:57:31)
hl0/uboot_ver:U-Boot 2021.04-hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:54:16 +0200) build#: 1564
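
On a host with several devices, the per-device driver version can be pulled out of the same sysfs files. The following is a minimal sketch, assuming each device directory under `/sys/class/habanalabs` contains a `driver_ver` file as in the output above; `list_driver_versions` is a hypothetical helper name, and its optional argument overrides the sysfs directory.

```shell
#!/bin/sh
# Sketch only: list_driver_versions is a hypothetical helper. The optional
# argument overrides the sysfs directory, e.g. for testing.
list_driver_versions() {
    root="${1:-/sys/class/habanalabs}"
    count=0
    for f in "$root"/hl[0-9]*/driver_ver; do
        # An unmatched glob stays literal, so skip anything unreadable.
        [ -r "$f" ] || continue
        dev=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$dev" "$(cat "$f")"
        count=$((count + 1))
    done
    # Succeed only if at least one device reported a version.
    [ "$count" -gt 0 ]
}

list_driver_versions || echo "no habanalabs devices found under sysfs"
```

The function prints one line per device in the form `<device> <driver version>` and returns a non-zero status when no device is found, which also indicates that the driver is not loaded.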