hl_smi_async Tool
hl_smi_async is a utility for monitoring and managing Gaudi devices asynchronously.
Its basic function is to collect telemetry data and log it. hl_smi_async reads the different telemetry packets and handles all the related configuration and handshakes. For more information about the telemetry packets, see the Gaudi 3 In-Band Telemetry for Hypervisor document located in the Intel Gaudi vault and Intel RDC.
The hl_smi_async application is located in the hl-smi repository: hl-smi/app/hl-smi-async.
Note
The tool can run with or without the driver loaded.
The tool can only be run on a bare-metal machine or a hypervisor. Running it on a guest VM may cause undefined behavior.
Options and Usage¶
The following table lists the available hl_smi_async options and their usage.
Example:
sudo /usr/sbin/hl-smi-async -D b1:00.0 -O console -L info -I 5
Option | Description |
---|---|
 | Outputs the help message and exits. |
 | Outputs version information and exits. |
-D | Specifies the PCIe address of the device (e.g. b1:00.0). |
-O | Specifies the telemetry output type (e.g. console). |
-L | Specifies the log level (e.g. info). |
-I | Specifies the number of iterations. If not set (default), the tool runs in an endless loop. |
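As a rough illustration of how these options combine, the following Python sketch assembles the command line shown in the example above. The helper name build_command is hypothetical, only the flags that appear in that example (-D, -O, -L, -I) are used, and the defaults of console and info simply mirror the example rather than any documented tool default.

```python
import shlex

# Path taken from the usage example; adjust if the tool is installed elsewhere.
HL_SMI_ASYNC = "/usr/sbin/hl-smi-async"


def build_command(pci_addr, output="console", log_level="info", iterations=None):
    """Assemble an hl-smi-async argument list (hypothetical helper)."""
    cmd = ["sudo", HL_SMI_ASYNC, "-D", pci_addr, "-O", output, "-L", log_level]
    # Leaving out -I keeps the tool in its default endless loop.
    if iterations is not None:
        if iterations < 1:
            raise ValueError("iterations must be a positive integer")
        cmd += ["-I", str(iterations)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_command("b1:00.0", iterations=5)))
```

The resulting list can be passed to subprocess.run when the command should be executed rather than printed.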
Output example:
Driver is not up (preboot mode):
Reading all synced telemetry packets
Timestamp: 10943 ms, Packet: Temperature, Field: temperature.aip, Data: 38
Timestamp: 10861 ms, Packet: Power, Field: power.draw.54v, Data: 157
Timestamp: 10979 ms, Packet: Health, Field: health, Data: 1
Timestamp: 10861 ms, Packet: Perf, Field: uptime, Data: 10
Timestamp: 3858 ms, Packet: sys_stat, Field: ib_fw_update.stat, Data: 2
Timestamp: 3859 ms, Packet: Security, Field: security.hash_spi_code, Data: 4a ac 10 c9 8c 97 76 1f b0 b6 7d 04 7f 67 f9 e0 42 8c cc f3 b4 7a 22 05 74 07 23 f1 ee 14 31 43 cd 99 55 e3 05 9a 3e 3e 69 8c 0d 4d 1f 1b 0b f1
Timestamp: 3858 ms, Packet: Mem_utility, Field: utilization.memory, Data: 0
Driver is up:
Starting telemetry data collection
Reading all synced telemetry packets
Timestamp: 42164 ms, Packet: Temperature, Field: temperature.aip, Data: 44
Timestamp: 42420 ms, Packet: Power, Field: power.draw.54v, Data: 157
Timestamp: 42420 ms, Packet: Power, Field: power.draw.12v, Data: 14
Timestamp: 42026 ms, Packet: Health, Field: health, Data: 3
Timestamp: 42419 ms, Packet: Perf, Field: uptime, Data: 42
Timestamp: 919 ms, Packet: System Status, Field: ib_fw_update.stat, Data: 2
Timestamp: 919 ms, Packet: System Status, Field: ethernet_ports.state, Data: 16777215
Timestamp: 919 ms, Packet: Security, Field: security.hash_spi_code, Data: 5c 49 ed 19 05 1e d6 a6 03 81 13 d0 74 87 dc e4 90 6b bb 74 ee 06 35 69 f6 7a 69 06 eb 8a c8 a9 ab bd 0c 32 0e 59 1e 55 48 fe aa 8d 87 32 a6 e1
Timestamp: 919 ms, Packet: Security, Field: security.hash_boot_fit, Data: 0d 8d d3 02 3a ea d1 d1 88 21 03 0f 40 4d bf 98 a0 5c 52 b1 3e 6e 16 c4 16 78 0c c7 b1 95 8c 42 66 75 e0 39 9a df 4d fb 40 d8 f6 22 3b 11 d4 6c
Timestamp: 5061 ms, Packet: Mem_utility, Field: utilization.memory, Data: 0
Reading all synced telemetry packets
Some fields are invalid when the driver is not up (preboot mode). For more information, refer to the Gaudi 3 In-Band Telemetry for Hypervisor document located in the Intel Gaudi vault and Intel RDC.
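For post-processing the console output, each telemetry record can be split into its four parts. The following is a minimal Python sketch, assuming the tool prints one record per line in the Timestamp / Packet / Field / Data layout shown in the examples above. The names Record, parse_line, and parse_output are hypothetical and not part of the tool.

```python
import re
from typing import List, NamedTuple, Optional

# Matches lines such as:
#   Timestamp: 42164 ms, Packet: Temperature, Field: temperature.aip, Data: 44
LINE_RE = re.compile(
    r"Timestamp:\s*(?P<ts>\d+)\s*ms,\s*"
    r"Packet:\s*(?P<packet>[^,]+),\s*"
    r"Field:\s*(?P<field>[^,]+),\s*"
    r"Data:\s*(?P<data>.*)$"
)


class Record(NamedTuple):
    timestamp_ms: int
    packet: str
    field: str
    data: str  # kept as text, since some fields are numbers and others hex bytes


def parse_line(line: str) -> Optional[Record]:
    """Return a Record for a telemetry line, or None for any other line."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    return Record(
        int(match["ts"]),
        match["packet"].strip(),
        match["field"].strip(),
        match["data"].strip(),
    )


def parse_output(text: str) -> List[Record]:
    """Parse every telemetry line in a block of output, skipping status lines."""
    records = (parse_line(line) for line in text.splitlines())
    return [record for record in records if record is not None]
```

Status lines such as "Reading all synced telemetry packets" do not match the pattern and are skipped.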
Validating FW Images Authenticity¶
This section explains how to validate the authenticity of firmware images using the hl-smi-async tool by comparing its output with the corresponding SHA files included in the habanalabs-hypervisor-utils package:
img-hash-gaudi3-boot-fit.sha384
img-hash-gaudi3-images-pointers.bin.be.sha384
After you download and install the habanalabs-hypervisor-utils package, which includes the hl-smi-async tool, as described in the Installing Hypervisor Tools Package section, the SHA files are located in /lib/firmware/habanalabs/gaudi3.
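As a quick sanity check after installation, the SHA files can be read programmatically. The sketch below is illustrative only: it assumes each file holds a raw 48-byte SHA-384 digest, which is what the xxd dumps later in this section suggest, and the helper name load_digest is hypothetical.

```python
from pathlib import Path

FW_DIR = Path("/lib/firmware/habanalabs/gaudi3")
SHA_FILES = (
    "img-hash-gaudi3-boot-fit.sha384",
    "img-hash-gaudi3-images-pointers.bin.be.sha384",
)
SHA384_LEN = 48  # a SHA-384 digest is 384 bits, i.e. 48 bytes


def load_digest(path):
    """Read a raw digest file and return it as a lowercase hex string."""
    raw = Path(path).read_bytes()
    # Assumption: the file contains only the raw digest, with no header or text.
    if len(raw) != SHA384_LEN:
        raise ValueError(f"{path}: expected {SHA384_LEN} bytes, got {len(raw)}")
    return raw.hex()


if __name__ == "__main__":
    for name in SHA_FILES:
        print(name, load_digest(FW_DIR / name))
```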
To retrieve the hashes of the latest FW version, perform the following:
1. Upgrade the FW to the latest SPI flash version as described in the Firmware Upgrade section.
2. Load the LKD driver on the VM.
3. Run the hl-smi-async utility on the hypervisor:
sudo /usr/sbin/hl-smi-async -D <pci_addr> -O console -L info -I 1
4. Using a command-line utility such as xxd, dump the SHA files and compare their contents with the security.hash_spi_code and security.hash_boot_fit values in the hl-smi-async output.
The security.hash_spi_code hash is the signature of the flash image, while the security.hash_boot_fit hash is the signature of the FW application (mgmt app), which is loaded directly into RAM. hash_spi_code is accessible during both the preboot and FW application run stages, whereas hash_boot_fit is available only while the FW application is running, after the LKD driver has been loaded. Therefore, before the LKD driver is loaded, only hash_spi_code is displayed. See the examples below. The values displayed in the outputs vary depending on the release build number in use.
In the file output:
$ xxd /lib/firmware/habanalabs/gaudi3/img-hash-gaudi3-images-pointers.bin.be.sha384
00000000: 5c49 ed19 051e d6a6 0381 13d0 7487 dce4  \I..........t...
00000010: 906b bb74 ee06 3569 f67a 6906 eb8a c8a9  .k.t..5i.zi.....
00000020: abbd 0c32 0e59 1e55 48fe aa8d 8732 a6e1  ...2.Y.UH....2..
Look for the following in the hl-smi-async tool output:
Security, Field: security.hash_spi_code, Data: 5c 49 ed 19 05 1e d6 a6 03 81 13 d0 74 87 dc e4 90 6b bb 74 ee 06 35 69 f6 7a 69 06 eb 8a c8 a9 ab bd 0c 32 0e 59 1e 55 48 fe aa 8d 87 32 a6 e1
In the file output:
$ xxd /lib/firmware/habanalabs/gaudi3/img-hash-gaudi3-boot-fit.sha384
00000000: 0d8d d302 3aea d1d1 8821 030f 404d bf98  ....:....!..@M..
00000010: a05c 52b1 3e6e 16c4 1678 0cc7 b195 8c42  .\R.>n...x.....B
00000020: 6675 e039 9adf 4dfb 40d8 f622 3b11 d46c  fu.9..M.@..";..l
Look for the following in the hl-smi-async tool output:
Security, Field: security.hash_boot_fit, Data: 0d 8d d3 02 3a ea d1 d1 88 21 03 0f 40 4d bf 98 a0 5c 52 b1 3e 6e 16 c4 16 78 0c c7 b1 95 8c 42 66 75 e0 39 9a df 4d fb 40 d8 f6 22 3b 11 d4 6c
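The manual comparison above can also be scripted. The following Python sketch is one possible approach rather than an official procedure: it assumes the tool prints one record per line, that the SHA files contain raw digest bytes, and it pairs each field with its file as in the two examples above. The function names are hypothetical.

```python
import sys
from pathlib import Path

FW_DIR = Path("/lib/firmware/habanalabs/gaudi3")
# Field-to-file pairing as used in the two examples above.
FIELD_TO_FILE = {
    "security.hash_spi_code": "img-hash-gaudi3-images-pointers.bin.be.sha384",
    "security.hash_boot_fit": "img-hash-gaudi3-boot-fit.sha384",
}


def normalize_hex(text):
    """Remove whitespace so '5c 49 ed' and '5c49ed' compare equal."""
    return "".join(text.split()).lower()


def extract_hash(tool_output, field):
    """Return the Data value reported for a field, or None if it is absent."""
    marker = f"Field: {field}, Data:"
    for line in tool_output.splitlines():
        if marker in line:
            return normalize_hex(line.split(marker, 1)[1])
    return None


def verify(tool_output, field, digest_bytes):
    """Return True on match, False on mismatch, None if the field is missing."""
    reported = extract_hash(tool_output, field)
    if reported is None:
        # For example, hash_boot_fit before the LKD driver has been loaded.
        return None
    return reported == digest_bytes.hex()


if __name__ == "__main__":
    # Usage: sudo /usr/sbin/hl-smi-async -D <pci_addr> -O console -L info -I 1 | python3 this_script.py
    output = sys.stdin.read()
    for field, filename in FIELD_TO_FILE.items():
        result = verify(output, field, (FW_DIR / filename).read_bytes())
        status = {True: "match", False: "MISMATCH", None: "not reported"}[result]
        print(f"{field}: {status}")
```

A result of "not reported" for security.hash_boot_fit is expected when the tool is run before the LKD driver is loaded.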