System Verifications and Final Tests

Driver Verification

  1. Run lsmod to verify the driver is loaded and running:

    $ lsmod | grep habana
    habanalabs           1572864  0
    habanalabs_cn         454656  8
    habanalabs_ib          73728  8
    habanalabs_en          61440  8
    
  2. Run hl-smi. Verify that the driver version in the hl-smi output matches the installed Intel Gaudi software version and that the temperature (the “Temp” column) shows a non-zero value. A reading of “0C” indicates a problem during card initialization; in that case, reboot the system and/or verify that the driver installation steps were performed correctly. The following image shows an example hl-smi report:

    [Image: example hl-smi report output (hl_smi_report.png)]

  3. Run dmesg and ensure no errors are reported:

    $ dmesg | grep habana

  4. Re-check all software components by running the apt list command below. The following is example output:

         $ apt list --installed | grep habana
         habanalabs-container-runtime/focal,now 1.19.0-561 amd64 [installed]
         habanalabs-dkms/focal,focal,now 1.19.0-561 all [installed]
         habanalabs-firmware-tools/focal,now 1.19.0-561 amd64 [installed]
         habanalabs-firmware/focal,now 1.19.0-561 amd64 [installed]
         habanalabs-graph/focal,now 1.19.0-561 amd64 [installed]
         habanalabs-qual/focal,now 1.19.0-561 amd64 [installed]
         habanalabs-thunk/focal,focal,now 1.19.0-561 all [installed]
         habanatools/focal,now 1.19.0-561 amd64 [installed]
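
The checks in steps 1 to 4 can be partly scripted. The following is a minimal sketch, not part of the Intel Gaudi tooling: the function names are hypothetical, the module list is taken from the example lsmod output above and may differ between releases, the "error"/"fail" keywords used to filter dmesg are an assumption about how problems are logged, and treating a single shared package version as the healthy state is inferred from the example apt output. Each function reads the command output on standard input, so it can also be run against saved logs.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the Intel Gaudi tools.

# Reads `lsmod` output on stdin; succeeds if every module named in the
# arguments appears in the first column.
check_modules() {
    local loaded missing m
    loaded=$(awk '{print $1}')
    missing=0
    for m in "$@"; do
        if ! printf '%s\n' "$loaded" | grep -qxF "$m"; then
            echo "missing module: $m" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Reads `dmesg` output on stdin; prints habana lines containing "error" or
# "fail" (case-insensitive; the keywords are an assumption) and succeeds
# only if there are none.
scan_dmesg() {
    ! grep -i 'habana' | grep -iE 'error|fail'
}

# Reads `apt list --installed` output on stdin; prints the distinct versions
# of the habana* packages and succeeds only if there is exactly one.
check_pkg_versions() {
    local versions count
    versions=$(awk '/^habana/ {print $2}' | sort -u)
    count=$(printf '%s\n' "$versions" | grep -c . || true)
    printf '%s\n' "$versions"
    [ "$count" -eq 1 ]
}
```

Possible usage, assuming the functions above have been sourced into the current shell:

    $ lsmod | check_modules habanalabs habanalabs_cn habanalabs_ib habanalabs_en
    $ dmesg | scan_dmesg
    $ apt list --installed 2>/dev/null | check_pkg_versions

These helpers do not replace the hl-smi check in step 2; the driver version and temperature still need to be read from the hl-smi report.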
    

Hardware Sanity Check

Run the hl_qual test suite as a hardware sanity check. The suite includes several tests that should be run on the system. See the Qualification Tool Library Guide (hl_qual) for the exact procedures and prerequisite steps. To confirm that all hardware components function and interact with each other correctly, run the following test first:

$ cd /opt/habanalabs/qual/gaudi[1,2,3]/bin
$ ./hl_qual -gaudi[2,3] -c all -rmod parallel -f2 -l extreme -t 240

The following is example output from a Gaudi 3 system:

##################################   Device measurment report #######################################

                0000:0d:00.0 0000:0e:00.0 0000:0f:00.0 0000:08:00.0 0000:0c:00.0 0000:0b:00.0 0000:0a:00.0 0000:09:00.0
Min Temp:                    42           38           39           42           41           48           38           42
Max Temp:                    78           67           67           78           80           74           67           76
min Clock:                 1600         1600         1600         1600         1600         1600         1600         1600
max Clock:                 1600         1600         1600         1600         1600         1600         1600         1600
min Power:                  213          211          207          213          207          213          208          211
max Power:                  851          850          853          850          850          850          850          851
Sram Serr count:              0            0            0            0            0            0            0            0
Sram Derr count:              0            0            0            0            0            0            0            0
Dram Serr count:              0            0            0            0            0            0            0            0
Dram Derr count:              0            0            0            0            0            0            0            0
##################################   Test result summary      #######################################
hl_qual - device with busID: 0000:0d:00.0 result: PASSED
hl_qual - device with busID: 0000:0e:00.0 result: PASSED
hl_qual - device with busID: 0000:0f:00.0 result: PASSED
hl_qual - device with busID: 0000:08:00.0 result: PASSED
hl_qual - device with busID: 0000:0c:00.0 result: PASSED
hl_qual - device with busID: 0000:0b:00.0 result: PASSED
hl_qual - device with busID: 0000:0a:00.0 result: PASSED
hl_qual - device with busID: 0000:09:00.0 result: PASSED
##################################   hl qual report #######################################
PASSED
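
On systems with many devices, the per-device lines in the test result summary can be checked automatically. The sketch below is an illustration under stated assumptions: the function name summarize_hl_qual is hypothetical, the line layout is taken from the example report above, and any result other than PASSED is treated as a failure because the exact wording of a failing result is not shown here. It reads a captured report on standard input.

```shell
#!/bin/sh
# Sketch only: summarize_hl_qual is a hypothetical helper, not an hl_qual option.

# Reads an hl_qual report on stdin. Prints the bus ID of every device whose
# result is not PASSED, then a one-line summary. Succeeds only if at least
# one device was reported and all of them passed.
summarize_hl_qual() {
    awk '
        /^hl_qual - device with busID:/ {
            total++
            # Field 6 is the bus ID; the last field is the result.
            if ($NF != "PASSED") { failed++; print $6 }
        }
        END {
            printf "%d device(s), %d not PASSED\n", total, failed
            ok = (total > 0 && failed == 0)
            exit (ok ? 0 : 1)
        }
    '
}
```

Possible usage, assuming the report was saved to a file named qual_report.txt (a placeholder name):

    $ summarize_hl_qual < qual_report.txt

For the example report above, this would print "8 device(s), 0 not PASSED" and return exit status 0.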