System Verifications and Final Tests
Driver Verification¶
Run lsmod to verify that the driver is loaded and running:
$ lsmod | grep habana
habanalabs 1572864 0
habanalabs_cn 454656 8
habanalabs_ib 73728 8
habanalabs_en 61440 8
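The module check above can be scripted. The following is a minimal sketch, not part of the Intel Gaudi tooling: the function name check_habana_modules is hypothetical, and the list of expected modules is an assumption taken from the example output, so it may differ on other systems.

```shell
# Sketch: confirm that each expected module name appears in lsmod output.
# Reads lsmod output on stdin. The module list below is an assumption
# based on the example output and may need adjusting.
check_habana_modules() {
    lsmod_out=$(cat)
    missing=""
    for mod in habanalabs habanalabs_cn habanalabs_ib habanalabs_en; do
        # Compare against the first whitespace-delimited field only,
        # so "habanalabs" does not match "habanalabs_cn".
        if ! printf '%s\n' "$lsmod_out" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'; then
            missing="$missing $mod"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing modules:$missing"
        return 1
    fi
    echo "all expected modules loaded"
}
```

To use it, pipe the live module list into the function: lsmod | check_habana_modules. A non-zero exit status means at least one of the listed modules was not found.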
Run hl-smi. Verify that the driver version in the hl-smi output matches the installed Intel Gaudi software version and that the temperature (the “Temp” column in the output) shows a non-zero value. A temperature of “0C” indicates a problem during card initialization. In that case, reboot the system, verify that the driver installation steps were completed correctly, or both.
[Image: example hl-smi report output]
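The non-zero temperature rule can be expressed as a small filter. This is only a sketch under stated assumptions: the function name check_temps_nonzero is hypothetical, and it expects one temperature value per line (for example "42C"). How those values are extracted from the hl-smi report is deliberately left open, since the report layout is not reproduced here.

```shell
# Sketch: fail if any reported temperature is zero.
# Reads one temperature per line on stdin, e.g. "42C" or "42".
# Non-numeric values evaluate to zero in awk and are flagged as well.
check_temps_nonzero() {
    awk '
        NF == 0 { next }                 # skip blank lines
        {
            t = $1
            sub(/[Cc]$/, "", t)          # drop a trailing unit suffix
            if (t + 0 == 0) {
                bad++
                print "zero temperature on line " NR
            }
        }
        END { exit (bad > 0) }
    '
}
```

The function exits with a non-zero status when at least one value is zero, which makes it usable as a gate in a larger verification script.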
Run dmesg and ensure that no errors are reported:
$ dmesg | grep habana
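Reading the full driver log by eye can be slow, so a keyword filter may help as a first pass. The sketch below is an illustration only: the function name habana_dmesg_errors is hypothetical, and the keywords "error" and "fail" are an assumption, not an official list of driver error messages. An empty result does not replace reviewing the log.

```shell
# Sketch: print habana-related kernel log lines that look like errors.
# Reads dmesg output on stdin. The keyword list is an assumption.
habana_dmesg_errors() {
    # First keep only habana lines, then keep those with error-like words.
    # "|| true" keeps the exit status at zero when nothing matches.
    grep -i 'habana' | grep -iE 'error|fail' || true
}
```

Typical use is dmesg | habana_dmesg_errors. No output means that no habana line contained the assumed keywords.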
Re-check all software components by running the apt list command below. The following is an example output:
$ apt list --installed | grep habana
habanalabs-container-runtime/focal,now 1.18.0-524 amd64 [installed]
habanalabs-dkms/focal,focal,now 1.18.0-524 all [installed]
habanalabs-firmware-tools/focal,now 1.18.0-524 amd64 [installed]
habanalabs-firmware/focal,now 1.18.0-524 amd64 [installed]
habanalabs-graph/focal,now 1.18.0-524 amd64 [installed]
habanalabs-qual/focal,now 1.18.0-524 amd64 [installed]
habanalabs-thunk/focal,focal,now 1.18.0-524 all [installed]
habanatools/focal,now 1.18.0-524 amd64 [installed]
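In the example output every package reports the same version. A quick way to spot a mixed installation is to list the distinct versions. The following sketch assumes that the version is the second whitespace-separated field of each line, as in the example, and that all packages are expected to share one version. Both function names are hypothetical.

```shell
# Sketch: list the distinct versions among installed habana packages.
# Reads `apt list --installed` output on stdin. Assumes the version is
# the second whitespace-separated field, as in the example output.
habana_pkg_versions() {
    grep 'habana' | awk '{ print $2 }' | sort -u
}

# Succeeds only when exactly one distinct version is found.
# Treating a single shared version as the expected state is an assumption.
habana_versions_consistent() {
    versions=$(habana_pkg_versions)
    [ -n "$versions" ] &&
        [ "$(printf '%s\n' "$versions" | wc -l | tr -d ' ')" -eq 1 ]
}
```

For example, apt list --installed 2>/dev/null | habana_pkg_versions prints one line per distinct version. More than one line suggests that some packages were left behind by an earlier installation.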
Hardware Sanity Check¶
Run the hl_qual test suite as a hardware sanity check. The suite includes several tests that should be run on the system. See the Qualification Tool Library Guide (hl_qual) for the exact procedures and the prerequisite steps. To confirm that all hardware components function and interact with each other correctly, run the following test first:
$ cd /opt/habanalabs/qual/gaudi[1,2,3]/bin
$ ./hl_qual -gaudi[2,3] -c all -rmod parallel -f2 -l extreme -t 240
The following is an example output on Gaudi 3:
################################## Device measurment report #######################################
                  0000:0d:00.0  0000:0e:00.0  0000:0f:00.0  0000:08:00.0  0000:0c:00.0  0000:0b:00.0  0000:0a:00.0  0000:09:00.0
Min Temp:         42            38            39            42            41            48            38            42
Max Temp:         78            67            67            78            80            74            67            76
min Clock:        1600          1600          1600          1600          1600          1600          1600          1600
max Clock:        1600          1600          1600          1600          1600          1600          1600          1600
min Power:        213           211           207           213           207           213           208           211
max Power:        851           850           853           850           850           850           850           851
Sram Serr count:  0             0             0             0             0             0             0             0
Sram Derr count:  0             0             0             0             0             0             0             0
Dram Serr count:  0             0             0             0             0             0             0             0
Dram Derr count:  0             0             0             0             0             0             0             0
################################## Test result summary #######################################
hl_qual - device with busID: 0000:0d:00.0 result: PASSED
hl_qual - device with busID: 0000:0e:00.0 result: PASSED
hl_qual - device with busID: 0000:0f:00.0 result: PASSED
hl_qual - device with busID: 0000:08:00.0 result: PASSED
hl_qual - device with busID: 0000:0c:00.0 result: PASSED
hl_qual - device with busID: 0000:0b:00.0 result: PASSED
hl_qual - device with busID: 0000:0a:00.0 result: PASSED
hl_qual - device with busID: 0000:09:00.0 result: PASSED
################################## hl qual report #######################################
PASSED
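On systems with many devices, the report can be checked mechanically. The sketch below is not part of hl_qual: the function name check_hl_qual_report is hypothetical, and it assumes the line formats shown in the example output above (per-device "result:" lines and "Serr count:" / "Derr count:" rows). It reports any device whose result is not PASSED and any error counter row that contains a non-zero value.

```shell
# Sketch: summarize an hl_qual report read from stdin.
# Assumes the line formats shown in the example output.
check_hl_qual_report() {
    awk '
        # Per-device result lines: the last field is the verdict and the
        # bus ID sits two fields before it.
        /result:/ {
            total++
            if ($NF != "PASSED") {
                failed++
                print "not passed: " $(NF - 2)
            }
        }
        # Error counter rows: values start at the fourth field.
        /err count:/ {
            for (i = 4; i <= NF; i++) {
                if ($i + 0 != 0) {
                    ecc++
                    print "non-zero counter: " $1 " " $2
                    break
                }
            }
        }
        END {
            printf "%d devices checked\n", total
            # Exit non-zero if nothing was parsed or any problem was seen.
            exit (total == 0 || failed > 0 || ecc > 0)
        }
    '
}
```

Assuming the report has been saved to a file, here hypothetically named qual.log, the check runs as check_hl_qual_report < qual.log. The exit status is zero only when at least one device result was found, all results are PASSED, and all error counters are zero.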