Load Intel Gaudi Driver Inside Running Docker Container
Before loading the driver, make sure you have two terminal windows open: one on the OCP host and one inside the running container. Then follow the steps below:
Inside the running Docker container, download and install the following packages from the Intel® Gaudi® vault:
wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-1.14.0-493.el8.noarch.rpm
wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-firmware-1.14.0-493.el8.x86_64.rpm
wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-firmware-tools-1.14.0-493.el8.x86_64.rpm
wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-thunk-1.14.0-493.el8.x86_64.rpm
wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanatools-1.14.0-493.el8.x86_64.rpm
dnf install -y ./habanalabs-firmware-1.14.0-493.el8.x86_64.rpm
dnf install -y ./habanalabs-thunk-1.14.0-493.el8.x86_64.rpm
dnf install -y ./habanalabs-firmware-tools-1.14.0-493.el8.x86_64.rpm
dnf install -y ./habanatools-1.14.0-493.el8.x86_64.rpm
dnf install -y ./habanalabs-1.14.0-493.el8.noarch.rpm
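The download and install commands above repeat the same version string in every line. The following is a minimal sketch of the same sequence driven by one variable, assuming a POSIX shell with wget and dnf available inside the container. The names GAUDI_VERSION, GAUDI_REPO, gaudi_rpm_names, and gaudi_install_all are hypothetical helpers introduced for illustration and are not part of the Intel Gaudi tooling.

```shell
# Hypothetical helper variables; the values are copied from the commands above.
GAUDI_VERSION="1.14.0-493"
GAUDI_REPO="https://vault.habana.ai/artifactory/rhel/8/8.6"

# Print the RPM file names in the same install order used above.
gaudi_rpm_names() {
    for pkg in habanalabs-firmware habanalabs-thunk habanalabs-firmware-tools habanatools; do
        echo "${pkg}-${GAUDI_VERSION}.el8.x86_64.rpm"
    done
    # The habanalabs driver package is noarch and is installed last.
    echo "habanalabs-${GAUDI_VERSION}.el8.noarch.rpm"
}

# Download every package, then install them one by one.
# Returns non-zero at the first download or install failure.
gaudi_install_all() {
    for rpm in $(gaudi_rpm_names); do
        wget "${GAUDI_REPO}/${rpm}" || return 1
    done
    for rpm in $(gaudi_rpm_names); do
        dnf install -y "./${rpm}" || return 1
    done
}
```

Running gaudi_rpm_names on its own only prints the file names, which is a cheap way to review the list before calling gaudi_install_all.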
Go to your OCP host and copy the firmware (FW) files installed above from the running container to the appropriate location on the OCP host:
Obtain the Docker ID of the running hl-rhel-coreos container:
sudo podman ps -ap
Copy the FW files to the OCP host (3e297ed24b36 is an example Docker container ID; use the ID obtained in the previous step):
sudo podman cp 3e297ed24b36:/lib/firmware/habanalabs/gaudi/gaudi-boot-fit.itb /opt/habana/habanalabs/gaudi/
sudo podman cp 3e297ed24b36:/lib/firmware/habanalabs/gaudi/gaudi-fit.itb /opt/habana/habanalabs/gaudi/
sudo podman cp 3e297ed24b36:/lib/firmware/habanalabs/gaudi/gaudi_tpc.bin /opt/habana/habanalabs/gaudi/
sudo podman cp 3e297ed24b36:/usr/bin/hl-smi /opt/habana/habanalabs/gaudi/
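Because the container ID changes every time the container is recreated, it can help to generate the four copy commands from a single argument. The sketch below only prints the commands so they can be reviewed first; gaudi_fw_copy_cmds is a hypothetical helper name, and the source and destination paths are the ones used above.

```shell
# Hypothetical helper: print the podman cp commands for one container ID.
# Nothing is copied here; the function only writes the commands to stdout.
gaudi_fw_copy_cmds() {
    cid="$1"
    dest="/opt/habana/habanalabs/gaudi/"
    for src in \
        /lib/firmware/habanalabs/gaudi/gaudi-boot-fit.itb \
        /lib/firmware/habanalabs/gaudi/gaudi-fit.itb \
        /lib/firmware/habanalabs/gaudi/gaudi_tpc.bin \
        /usr/bin/hl-smi
    do
        echo "sudo podman cp ${cid}:${src} ${dest}"
    done
}
```

For example, gaudi_fw_copy_cmds 3e297ed24b36 prints the four commands shown above; piping that output to sh runs them.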
Inside the running Docker container, load the habanalabs driver manually:
modprobe habanalabs_en && modprobe habanalabs
Note
The habanalabs driver is loaded inside the Docker container and is available on the OCP host only as long as the Docker container is running. When the Docker container is stopped or removed, the habanalabs driver is no longer available. To reload it, perform the Docker-related tasks again.
Go to your OCP host and make sure the overlay mount set up in Prepare Docker Image on OCP-based Host is working properly:
ls -lh /lib/firmware/habanalabs/gaudi/
Expected output:
-rwxr-xr-x. 1 root root 726K Dec 15 04:04 gaudi-boot-fit.itb
-rwxr-xr-x. 1 root root 9.9M Dec 15 04:04 gaudi-fit.itb
-rwxr-xr-x. 1 root root 1.5K Dec 15 04:02 gaudi_tpc.bin
-rwxr-xr-x. 1 root root 2.3M Dec 15 04:06 hl-smi
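Comparing the listing by eye works, but the same check can be scripted. The following is a minimal sketch, assuming the four file names from the expected output above; gaudi_missing_fw is a hypothetical helper name, and the directory defaults to the path listed in this step.

```shell
# Hypothetical helper: report expected files that are missing from a directory.
# Prints nothing and returns 0 when all four files are present; otherwise
# prints one "missing:" line per absent file and returns 1.
gaudi_missing_fw() {
    dir="${1:-/lib/firmware/habanalabs/gaudi}"
    missing=0
    for f in gaudi-boot-fit.itb gaudi-fit.itb gaudi_tpc.bin hl-smi; do
        if [ ! -f "${dir}/${f}" ]; then
            echo "missing: ${dir}/${f}"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling gaudi_missing_fw with no argument checks /lib/firmware/habanalabs/gaudi on the OCP host. It confirms only that the files exist, not that their contents match the container's copies.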
Inside the running Docker container, check that the driver is loaded properly:
lsmod | grep habanalabs
Expected output:
habanalabs 966656 0
habanalabs_en 32768 1 habanalabs
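Note that grep habanalabs also matches when only habanalabs_en is loaded, so the output has to be read carefully. The sketch below checks that both module names appear in the first column of lsmod-style text; gaudi_modules_loaded is a hypothetical helper name, and it reads from standard input so it can be tried on saved output as well.

```shell
# Hypothetical helper: read lsmod-style text on stdin and succeed only if
# both habanalabs and habanalabs_en appear as module names (first column).
gaudi_modules_loaded() {
    awk '
        $1 == "habanalabs"    { drv = 1 }
        $1 == "habanalabs_en" { en = 1 }
        END { exit !(drv && en) }
    '
}
```

A typical call inside the container would be lsmod | gaudi_modules_loaded && echo "both modules loaded".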
Go to your OCP host and verify that the Intel Gaudi driver is loaded properly by running one of the following options:
Option 1
Command:
/opt/habana/habanalabs/gaudi/hl-smi -d PRODUCT CLOCK -L
Output:
================ HL-SMI LOG ================
Timestamp : Tue Jan 25 17:57:58 UTC 2022
Driver Version : 1.14.0-9e8ecf8
HL-SMI Version : hl-1.14.0-fw-48.0.1.0 (Dec 15 2021 - 05:14:10)
Attached AIPs : 1
[0] AIP (hl0) 0000:03:00.0
Product Name : HL-200
Model Number : F08GL0AI2000A
Serial Number : AK30011368
Module ID : N/A
PCB Assembly Version : V0A
PCB Version : R0E
HL Revision : 2
AIP UUID : 00P3-HL2000B0-14-P63M75-01-01-09
Firmware [FIT] Version : Linux gaudi 5.10.18-hl-gaudi-1.14.0-fw-48.0.1-sec-7 #1 SMP PREEMPT Mon Nov 8 09:54:45 IST 2021 aarch64 GNU/Linux
Firmware [SPI] Version : BTL version 2f4e4ab7,Preboot version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:57:21)
Firmware [UBOOT] Version : U-Boot 2021.04-hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:54:16 +0200) build#: 1564
Option 2
Command:
grep -E "$" /sys/class/habanalabs/hl?/*ver | cut -d / -f5-
Output:
hl0/armcp_kernel_ver:Linux gaudi 5.10.18-hl-gaudi-1.14.0-fw-48.0.1-sec-7 #1 SMP PREEMPT Mon Nov 8 09:54:45 IST 2021 aarch64 GNU/Linux
hl0/armcp_ver:armcpd version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:56:46)
hl0/cpld_ver:0x0000000f
hl0/cpucp_kernel_ver:Linux gaudi 5.10.18-hl-gaudi-1.14.0-fw-48.0.1-sec-7 #1 SMP PREEMPT Mon Nov 8 09:54:45 IST 2021 aarch64 GNU/Linux
hl0/cpucp_ver:armcpd version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:56:46)
hl0/driver_ver:1.14.0-124dd38
hl0/fuse_ver:00P3-HL2000B0-14-P63M75-01-01-09
hl0/infineon_ver:0x0002
hl0/preboot_btl_ver:BTL version 2f4e4ab7
hl0/preboot_btl_ver:Preboot version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:57:21)
hl0/thermal_ver:thermald version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:57:31)
hl0/uboot_ver:U-Boot 2021.04-hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:54:16 +0200) build#: 1564
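When only the driver version matters, the Option 2 output can be filtered down to one line per device. The following is a minimal sketch, assuming input in the hlN/name:value form shown above; gaudi_driver_versions is a hypothetical helper name.

```shell
# Hypothetical helper: read "hlN/<name>_ver:<value>" lines on stdin and print
# "<device> <driver version>" for each driver_ver entry.
gaudi_driver_versions() {
    sed -n 's|^\(hl[0-9]*\)/driver_ver:\(.*\)$|\1 \2|p'
}
```

Appending | gaudi_driver_versions to the Option 2 command prints one line per attached device, which makes it easy to confirm that every device reports the driver release that was installed.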