Load Intel Gaudi Driver Inside Running Docker Container

Before loading the driver, open two terminal windows: one on the OCP host and one inside the running container. Then follow these steps:

  1. Inside the running Docker container, download and install the following packages from the Intel® Gaudi® vault:

    wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-1.14.0-493.el8.noarch.rpm
    wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-firmware-1.14.0-493.el8.x86_64.rpm
    wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-firmware-tools-1.14.0-493.el8.x86_64.rpm
    wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanalabs-thunk-1.14.0-493.el8.x86_64.rpm
    wget https://vault.habana.ai/artifactory/rhel/8/8.6/habanatools-1.14.0-493.el8.x86_64.rpm
    
    dnf install -y ./habanalabs-firmware-1.14.0-493.el8.x86_64.rpm
    dnf install -y ./habanalabs-thunk-1.14.0-493.el8.x86_64.rpm
    dnf install -y ./habanalabs-firmware-tools-1.14.0-493.el8.x86_64.rpm
    dnf install -y ./habanatools-1.14.0-493.el8.x86_64.rpm
    dnf install -y ./habanalabs-1.14.0-493.el8.noarch.rpm
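
    The five package URLs above differ only in package name and architecture, so they can be generated from one release string. The following is a minimal sketch under that assumption; `gaudi_rpm_urls`, `VAULT_URL`, and `RELEASE` are hypothetical names introduced here for illustration, and the values are copied from the commands above.

    ```shell
    # Sketch only: the base URL and release string are copied from the commands above.
    VAULT_URL="https://vault.habana.ai/artifactory/rhel/8/8.6"
    RELEASE="1.14.0-493"

    # Print one RPM URL per line, in the same order as the dnf commands above.
    # gaudi_rpm_urls is a hypothetical helper name, not part of any Intel Gaudi tool.
    gaudi_rpm_urls() {
      for entry in \
        "habanalabs-firmware x86_64" \
        "habanalabs-thunk x86_64" \
        "habanalabs-firmware-tools x86_64" \
        "habanatools x86_64" \
        "habanalabs noarch"; do
        name=${entry% *}   # text before the space: package name
        arch=${entry#* }   # text after the space: architecture
        printf '%s/%s-%s.el8.%s.rpm\n' "$VAULT_URL" "$name" "$RELEASE" "$arch"
      done
    }

    # Example use inside the container:
    #   for url in $(gaudi_rpm_urls); do
    #     wget "$url" && dnf install -y "./${url##*/}"
    #   done
    ```

    Unlike the commands above, the example loop downloads and installs each package in turn, following the install order shown in this step.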
    
  2. On your OCP host, copy the firmware (FW) files installed in the previous step, together with the hl-smi utility, from the running container to /opt/habana/habanalabs/gaudi/ on the host:

    1. Obtain the container ID of the running hl-rhel-coreos container:

      sudo podman ps -ap
      
    2. Copy the FW files to the OCP host (3e297ed24b36 is an example container ID; replace it with the ID of your container):

      sudo podman cp 3e297ed24b36:/lib/firmware/habanalabs/gaudi/gaudi-boot-fit.itb /opt/habana/habanalabs/gaudi/
      sudo podman cp 3e297ed24b36:/lib/firmware/habanalabs/gaudi/gaudi-fit.itb /opt/habana/habanalabs/gaudi/
      sudo podman cp 3e297ed24b36:/lib/firmware/habanalabs/gaudi/gaudi_tpc.bin /opt/habana/habanalabs/gaudi/
      sudo podman cp 3e297ed24b36:/usr/bin/hl-smi /opt/habana/habanalabs/gaudi/
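
    Because the container ID changes between runs, it can help to generate the copy commands from a single argument instead of editing four lines by hand. The sketch below only prints the commands from this step; `gaudi_copy_commands` is a hypothetical helper name, not a podman feature.

    ```shell
    # Sketch only: gaudi_copy_commands is a hypothetical helper, not a podman feature.
    # It prints the podman cp commands from this step for the container ID given as $1.
    gaudi_copy_commands() {
      cid=$1
      dest=/opt/habana/habanalabs/gaudi/
      for src in \
        /lib/firmware/habanalabs/gaudi/gaudi-boot-fit.itb \
        /lib/firmware/habanalabs/gaudi/gaudi-fit.itb \
        /lib/firmware/habanalabs/gaudi/gaudi_tpc.bin \
        /usr/bin/hl-smi; do
        printf 'sudo podman cp %s:%s %s\n' "$cid" "$src" "$dest"
      done
    }

    # Example use on the OCP host (review the printed commands, then pipe them to sh):
    #   gaudi_copy_commands 3e297ed24b36
    #   gaudi_copy_commands 3e297ed24b36 | sh
    ```

    Printing the commands first keeps the run a dry run until the output is deliberately piped to a shell.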
      
  3. Inside the running Docker container, load the habanalabs driver manually:

    modprobe habanalabs_en && modprobe habanalabs
    

    Note

    The habanalabs driver is loaded from inside the Docker container and is available on the OCP host only while the container is running. When the container is stopped or removed, the habanalabs driver is no longer available. To reload it, perform the Docker-related steps again.

  4. On your OCP host, verify that the overlay mount set up in Prepare Docker Image on OCP-based Host is working properly:

    ls -lh /lib/firmware/habanalabs/gaudi/
    

    Expected output:

    -rwxr-xr-x. 1 root root 726K Dec 15 04:04 gaudi-boot-fit.itb
    -rwxr-xr-x. 1 root root 9.9M Dec 15 04:04 gaudi-fit.itb
    -rwxr-xr-x. 1 root root 1.5K Dec 15 04:02 gaudi_tpc.bin
    -rwxr-xr-x. 1 root root 2.3M Dec 15 04:06 hl-smi
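
    The listing can also be checked non-interactively. The sketch below assumes that the four file names in the expected output are the complete set required; `firmware_files_present` is a hypothetical helper name introduced for illustration.

    ```shell
    # Sketch only: firmware_files_present is a hypothetical helper.
    # It succeeds if every file from the expected output exists in the directory $1,
    # and reports each missing file on stderr otherwise.
    firmware_files_present() {
      dir=$1
      status=0
      for f in gaudi-boot-fit.itb gaudi-fit.itb gaudi_tpc.bin hl-smi; do
        if [ ! -e "$dir/$f" ]; then
          echo "missing: $dir/$f" >&2
          status=1
        fi
      done
      return "$status"
    }

    # Example use on the OCP host:
    #   firmware_files_present /lib/firmware/habanalabs/gaudi && echo "overlay mount OK"
    ```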
    
  5. Inside the running Docker container, check that the driver is loaded properly:

    lsmod | grep habanalabs
    

    Expected output:

    habanalabs 966656 0
    habanalabs_en 32768 1 habanalabs
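
    The grep above matches any line containing "habanalabs", including the dependency column, so it does not by itself confirm that both modules are present. A stricter check can compare the first column of the lsmod output against each expected name. This is a minimal sketch; `modules_loaded` is a hypothetical helper name.

    ```shell
    # Sketch only: modules_loaded is a hypothetical helper.
    # It reads lsmod-style text on stdin and succeeds only if every module
    # named as an argument appears exactly in the first column.
    modules_loaded() {
      listing=$(cat)
      for mod in "$@"; do
        printf '%s\n' "$listing" |
          awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' || return 1
      done
    }

    # Example use inside the container:
    #   lsmod | modules_loaded habanalabs habanalabs_en && echo "driver modules loaded"
    ```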
    
  6. On your OCP host, verify that the Intel Gaudi driver is loaded properly using one of the following options:

  • Option 1

    Command:

    /opt/habana/habanalabs/gaudi/hl-smi -d PRODUCT CLOCK -L
    

    Output:

    ================ HL-SMI LOG ================
    
    Timestamp                               : Tue Jan 25 17:57:58 UTC 2022
    
    Driver Version                          : 1.14.0-9e8ecf8
    HL-SMI Version                          : hl-1.14.0-fw-48.0.1.0 (Dec 15 2021 - 05:14:10)
    
    Attached AIPs                           : 1
    
    [0] AIP (hl0) 0000:03:00.0
            Product Name                    : HL-200
            Model Number                    : F08GL0AI2000A
            Serial Number                   : AK30011368
            Module ID                       : N/A
            PCB Assembly Version            : V0A
            PCB Version                     : R0E
            HL Revision                     : 2
            AIP UUID                        : 00P3-HL2000B0-14-P63M75-01-01-09
            Firmware [FIT] Version          : Linux gaudi 5.10.18-hl-gaudi-1.14.0-fw-48.0.1-sec-7 #1 SMP PREEMPT Mon Nov 8 09:54:45 IST 2021 aarch64 GNU/Linux
            Firmware [SPI] Version          : BTL version 2f4e4ab7,Preboot version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:57:21)
            Firmware [UBOOT] Version        : U-Boot 2021.04-hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:54:16 +0200) build#: 1564
    
  • Option 2

    Command:

    grep -E "$" /sys/class/habanalabs/hl?/*ver | cut -d / -f5-
    

    Output:

    hl0/armcp_kernel_ver:Linux gaudi 5.10.18-hl-gaudi-1.14.0-fw-48.0.1-sec-7 #1 SMP PREEMPT Mon Nov 8 09:54:45 IST 2021 aarch64 GNU/Linux
    hl0/armcp_ver:armcpd version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:56:46)
    hl0/cpld_ver:0x0000000f
    hl0/cpucp_kernel_ver:Linux gaudi 5.10.18-hl-gaudi-1.14.0-fw-48.0.1-sec-7 #1 SMP PREEMPT Mon Nov 8 09:54:45 IST 2021 aarch64 GNU/Linux
    hl0/cpucp_ver:armcpd version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:56:46)
    hl0/driver_ver:1.14.0-124dd38
    hl0/fuse_ver:00P3-HL2000B0-14-P63M75-01-01-09
    hl0/infineon_ver:0x0002
    hl0/preboot_btl_ver:BTL version 2f4e4ab7
    hl0/preboot_btl_ver:Preboot version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:57:21)
    hl0/thermal_ver:thermald version hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:57:31)
    hl0/uboot_ver:U-Boot 2021.04-hl-gaudi-1.14.0-fw-48.0.1-sec-7 (Nov 08 2021 - 09:54:16 +0200) build#: 1564
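
    The Option 2 output can be checked in a script by filtering the driver_ver lines. The sketch below assumes that the text before the first hyphen in driver_ver is the release number, as the sample output suggests; `driver_release_matches` is a hypothetical helper name.

    ```shell
    # Sketch only: driver_release_matches is a hypothetical helper.
    # It reads "hlN/<name>_ver:<value>" lines on stdin and succeeds if at least one
    # driver_ver line is present and every one starts with the release given as $1.
    driver_release_matches() {
      awk -F: -v want="$1" '
        $1 ~ /\/driver_ver$/ {
          seen++
          split($2, parts, "-")                     # "1.14.0-124dd38" -> "1.14.0"
          if ((parts[1] "") != (want "")) bad++     # force string comparison
        }
        END { exit !(seen > 0 && bad == 0) }'
    }

    # Example use on the OCP host:
    #   grep -E "$" /sys/class/habanalabs/hl?/*ver | cut -d / -f5- | driver_release_matches 1.14.0
    ```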