hl_smi_async Tool
hl_smi_async is a utility for monitoring and managing Gaudi devices asynchronously.
Its basic function is to collect telemetry data and log it. hl_smi_async reads
the different telemetry packets and handles all the related configuration and handshakes. For more information
about the telemetry packets, see the Gaudi 3 In-Band Telemetry for Hypervisor document located in the Intel Gaudi vault.
The hl_smi_async application is located in the hl-smi repository: hl-smi/app/hl-smi-async.
Note
The tool can run with or without the driver loaded.
The tool can run only on a bare-metal machine or a hypervisor. Running it on a guest VM may cause undefined behavior.
Options and Usage
The following table lists the available hl_smi_async options and their usage.
Example:
sudo /usr/sbin/hl-smi-async -D b1:00.0 -O console -L info
| Option | Description |
|---|---|
| | Outputs the help message and exits. |
| | Outputs version information and exits. |
| -D | Specifies the PCIe address of the device (e.g. b1:00.0). |
| -O | Specifies the telemetry output type. Valid values include console. |
| -L | Specifies the log level. Valid values include info. |
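When the tool is driven from a script, it can help to assemble the command line in one place and reject an obviously malformed PCIe address before calling the tool. The following is a minimal sketch that uses only the -D, -O, and -L options from the example invocation above. The function name `build_command`, its defaults, and the address check (bus:device.function, as in `b1:00.0`) are assumptions made for illustration; they are not part of hl_smi_async.

```python
import re
import shlex

# Assumption: addresses use the bus:device.function form shown in the example (b1:00.0).
BDF_RE = re.compile(r"^[0-9a-fA-F]{2}:[01][0-9a-fA-F]\.[0-7]$")


def build_command(device, output="console", log_level="info",
                  binary="/usr/sbin/hl-smi-async"):
    """Return the argument list for one hl-smi-async invocation.

    Hypothetical helper: only the options that appear in the documented
    example (-D, -O, -L) are used, and the values are passed through as-is.
    """
    if not BDF_RE.match(device):
        raise ValueError(f"not a bus:device.function PCIe address: {device!r}")
    return ["sudo", binary, "-D", device, "-O", output, "-L", log_level]


if __name__ == "__main__":
    cmd = build_command("b1:00.0")
    # Print the command instead of running it, so the sketch is safe on any machine.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a bare-metal machine or hypervisor where the tool is installed.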
Output example:
Driver is not up (preboot mode):
Reading all synced telemetry packets
Timestamp: 10943 ms, Packet: Temperature, Field: temperature.aip, Data: 38
Timestamp: 10861 ms, Packet: Power, Field: power.draw.54v, Data: 157
Timestamp: 10979 ms, Packet: Health, Field: health, Data: 1
Timestamp: 10861 ms, Packet: Perf, Field: uptime, Data: 10
Timestamp: 3858 ms, Packet: sys_stat, Field: ib_fw_update.stat, Data: 2
Timestamp: 3859 ms, Packet: Security, Field: security.hash_spi_code, Data: 4a ac 10 c9 8c 97 76 1f b0 b6 7d 04 7f 67 f9 e0 42 8c cc f3 b4 7a 22 05 74 07 23 f1 ee 14 31 43 cd 99 55 e3 05 9a 3e 3e 69 8c 0d 4d 1f 1b 0b f1
Timestamp: 3858 ms, Packet: Mem_utility, Field: utilization.memory, Data: 0
Driver is up:
Reading all synced telemetry packets
Timestamp: 80262313 ms, Packet: Temperature, Field: temperature.aip, Data: 37
Timestamp: 80262754 ms, Packet: Power, Field: power.draw.54v, Data: 179
Timestamp: 80262754 ms, Packet: Power, Field: power.draw.12v, Data: 14
Timestamp: 80262561 ms, Packet: Health, Field: health, Data: 2
Timestamp: 80262754 ms, Packet: Perf, Field: uptime, Data: 80262
Timestamp: 926 ms, Packet: sys_stat, Field: ib_fw_update.stat, Data: 2
Timestamp: 926 ms, Packet: Security, Field: security.hash_spi_code, Data: 4a ac 10 c9 8c 97 76 1f b0 b6 7d 04 7f 67 f9 e0 42 8c cc f3 b4 7a 22 05 74 07 23 f1 ee 14 31 43 cd 99 55 e3 05 9a 3e 3e 69 8c 0d 4d 1f 1b 0b f1
Timestamp: 926 ms, Packet: Security, Field: security.hash_boot_fit, Data: e5 de 58 7a 29 42 4b c4 4e d3 76 6f 1c c5 50 4c f9 48 8c 6e d8 9f 6e 27 2c 7b b6 c4 00 0f 3e d5 7f 28 e1 33 5b 8b 4f 00 ce 92 a2 83 8c e8 c4 11
Timestamp: 926 ms, Packet: Mem_utility, Field: utilization.memory, Data: 0
Some fields are invalid when the driver is not up (preboot mode). For more information, refer to the Gaudi 3 In-Band Telemetry for Hypervisor document located in the Intel Gaudi vault.
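Each record in the console output above follows the same `Timestamp: ... ms, Packet: ..., Field: ..., Data: ...` pattern, so the output can be post-processed with a small parser. The sketch below is one possible approach, not part of the tool. The names `TelemetryRecord` and `parse_telemetry` are hypothetical, the pattern is inferred only from the sample output on this page, and the Data value is kept as a raw string because its meaning depends on the field (see the telemetry document referenced above).

```python
import re
from typing import List, NamedTuple


class TelemetryRecord(NamedTuple):
    """One record from the console output (hypothetical container)."""
    timestamp_ms: int
    packet: str
    field: str
    data: str  # raw text; may be a number or space-separated hex bytes


# Pattern inferred from the sample output. A record's data ends where the
# next "Timestamp:" begins or at the end of the text, so records may be
# separated by newlines or by spaces.
RECORD_RE = re.compile(
    r"Timestamp:\s*(\d+)\s*ms,\s*Packet:\s*(\S+),\s*Field:\s*(\S+),"
    r"\s*Data:\s*(.*?)(?=\s*Timestamp:|\s*$)",
    re.DOTALL,
)


def parse_telemetry(text: str) -> List[TelemetryRecord]:
    """Extract all telemetry records found in the captured console text."""
    return [
        TelemetryRecord(int(ts), packet, field, data)
        for ts, packet, field, data in RECORD_RE.findall(text)
    ]


if __name__ == "__main__":
    sample = (
        "Reading all synced telemetry packets\n"
        "Timestamp: 10943 ms, Packet: Temperature, Field: temperature.aip, Data: 38\n"
        "Timestamp: 10861 ms, Packet: Power, Field: power.draw.54v, Data: 157\n"
    )
    for record in parse_telemetry(sample):
        print(record.packet, record.field, record.data)
```

Keeping the data as text avoids guessing at units or encodings; a caller that knows a field is numeric can convert it with `int(record.data)`.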