hl_smi_async Tool
hl_smi_async is a utility for monitoring and managing Gaudi devices asynchronously.
Its core function is to collect telemetry data and log it. hl_smi_async reads the various telemetry packets and handles all related configuration and handshakes. For more information
about the telemetry packets, see the Gaudi 3 In-Band Telemetry for Hypervisor document located in the Intel Gaudi vault and Intel RDC.
The hl_smi_async application is located in the hl-smi repository: hl-smi/app/hl-smi-async.
Note
The tool can run with or without the driver loaded.
The tool can only be run on a bare-metal machine or a hypervisor. Running it on a guest VM may cause undefined behavior.
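Because the tool must not run inside a guest VM, a wrapper script may want a pre-flight check before launching it. The following Python sketch is a heuristic, not an official check: it assumes an x86 Linux host, treats the hypervisor flag in /proc/cpuinfo as a sign of a guest VM, and assumes the Gaudi kernel driver is the habanalabs module listed under /sys/module. The function names are hypothetical.

```python
import os


def is_probably_guest_vm(cpuinfo_text: str) -> bool:
    """Heuristic only: x86 Linux guests normally report the 'hypervisor' CPU flag.

    Some hypervisor host setups may also report it, so a True result is a
    reason to double-check the environment, not proof of a guest VM.
    """
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # A flags line looks like "flags\t\t: fpu vme de ... hypervisor ..."
            _, _, flags = line.partition(":")
            if "hypervisor" in flags.split():
                return True
    return False


def driver_module_loaded(module_dir: str = "/sys/module/habanalabs") -> bool:
    """Assumption: the Gaudi kernel driver is the 'habanalabs' module."""
    return os.path.isdir(module_dir)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            cpuinfo = f.read()
    except OSError:
        cpuinfo = ""  # e.g. not Linux; skip the VM heuristic
    if is_probably_guest_vm(cpuinfo):
        print("Warning: this looks like a guest VM, where hl-smi-async is not supported.")
    if driver_module_loaded():
        print("habanalabs module is loaded")
    else:
        print("habanalabs module is not loaded: expect preboot-mode telemetry")
```

A loaded module does not by itself guarantee that the driver is fully up, so the sketch only draws a conclusion for the not-loaded case.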
Options and Usage
The following table lists the available hl_smi_async options and their usage.
Example:
sudo /usr/sbin/hl-smi-async -D b1:00.0 -O console -L info -I 5
Option | Description |
---|---|
| Outputs the help message and exits. |
| Outputs version information and exits. |
-D | Specifies the PCIe address of the device (for example, b1:00.0). |
-O | Specifies the telemetry output type (for example, console). |
-L | Specifies the log level (for example, info). |
-I | Specifies the number of iterations. If not set (default), the tool runs in an endless loop. |
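When hl-smi-async is driven from a script, building the argument list programmatically avoids quoting mistakes and makes it easy to bound the run with an iteration count. The following Python sketch assumes the flag letters shown in the example command (-D, -O, -L, -I) and the /usr/sbin/hl-smi-async path; it also assumes console output is written to stdout. build_command and run_bounded are hypothetical helper names.

```python
import shlex
import subprocess
from typing import List, Optional

# Assumption: flag letters and path come from the documented example
#   sudo /usr/sbin/hl-smi-async -D b1:00.0 -O console -L info -I 5
HL_SMI_ASYNC = "/usr/sbin/hl-smi-async"


def build_command(pci_addr: str,
                  output: Optional[str] = None,
                  log_level: Optional[str] = None,
                  iterations: Optional[int] = None,
                  use_sudo: bool = True) -> List[str]:
    """Build an argv list; options left as None are omitted so the tool's defaults apply."""
    cmd = ["sudo"] if use_sudo else []
    cmd += [HL_SMI_ASYNC, "-D", pci_addr]
    if output is not None:
        cmd += ["-O", output]
    if log_level is not None:
        cmd += ["-L", log_level]
    # Without -I the tool runs in an endless loop, so set it for bounded runs.
    if iterations is not None:
        cmd += ["-I", str(iterations)]
    return cmd


def run_bounded(pci_addr: str, iterations: int = 5) -> str:
    """Run a bounded console collection and return its stdout (assumes console output goes to stdout)."""
    cmd = build_command(pci_addr, output="console", log_level="info", iterations=iterations)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Prints the same command line as the example above.
    print(shlex.join(build_command("b1:00.0", "console", "info", 5)))
```

Passing a list to subprocess.run (rather than a shell string) keeps each value a single argument, and always setting iterations in run_bounded prevents a script from hanging on the default endless loop.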
Output examples:
Driver is not up (preboot mode):
Reading all synced telemetry packets
Timestamp: 10943 ms, Packet: Temperature, Field: temperature.aip, Data: 38
Timestamp: 10861 ms, Packet: Power, Field: power.draw.54v, Data: 157
Timestamp: 10979 ms, Packet: Health, Field: health, Data: 1
Timestamp: 10861 ms, Packet: Perf, Field: uptime, Data: 10
Timestamp: 3858 ms, Packet: sys_stat, Field: ib_fw_update.stat, Data: 2
Timestamp: 3859 ms, Packet: Security, Field: security.hash_spi_code, Data: 4a ac 10 c9 8c 97 76 1f b0 b6 7d 04 7f 67 f9 e0 42 8c cc f3 b4 7a 22 05 74 07 23 f1 ee 14 31 43 cd 99 55 e3 05 9a 3e 3e 69 8c 0d 4d 1f 1b 0b f1
Timestamp: 3858 ms, Packet: Mem_utility, Field: utilization.memory, Data: 0
Driver is up:
Starting telemetry data collection
Reading all synced telemetry packets
Timestamp: 42164 ms, Packet: Temperature, Field: temperature.aip, Data: 44
Timestamp: 42420 ms, Packet: Power, Field: power.draw.54v, Data: 157
Timestamp: 42420 ms, Packet: Power, Field: power.draw.12v, Data: 14
Timestamp: 42026 ms, Packet: Health, Field: health, Data: 3
Timestamp: 42419 ms, Packet: Perf, Field: uptime, Data: 42
Timestamp: 919 ms, Packet: System Status, Field: ib_fw_update.stat, Data: 2
Timestamp: 919 ms, Packet: System Status, Field: ethernet_ports.state, Data: 16777215
Timestamp: 919 ms, Packet: Security, Field: security.hash_spi_code, Data: 5c 49 ed 19 05 1e d6 a6 03 81 13 d0 74 87 dc e4 90 6b bb 74 ee 06 35 69 f6 7a 69 06 eb 8a c8 a9 ab bd 0c 32 0e 59 1e 55 48 fe aa 8d 87 32 a6 e1
Timestamp: 919 ms, Packet: Security, Field: security.hash_boot_fit, Data: 0d 8d d3 02 3a ea d1 d1 88 21 03 0f 40 4d bf 98 a0 5c 52 b1 3e 6e 16 c4 16 78 0c c7 b1 95 8c 42 66 75 e0 39 9a df 4d fb 40 d8 f6 22 3b 11 d4 6c
Timestamp: 5061 ms, Packet: Mem_utility, Field: utilization.memory, Data: 0
Reading all synced telemetry packets
Some fields are invalid when the driver is not up (preboot mode). For more information, refer to the Gaudi 3 In-Band Telemetry for Hypervisor document located in the Intel Gaudi vault and Intel RDC.
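Each telemetry record in the console output follows the pattern "Timestamp: N ms, Packet: P, Field: F, Data: D", so the output can be post-processed into structured records. The following Python sketch is based only on the example lines above: it assumes one record per line, decodes space-separated hex bytes (such as security.hash_spi_code) into bytes and other values into integers, and keeps the newest record per (packet, field) pair. In the examples, timestamps differ from packet to packet, so the sketch compares them only within the same field. The names TelemetryRecord, parse_line, and latest_by_field are hypothetical, and values captured in preboot mode may still be invalid as noted above.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union

# One record per line, e.g.
#   Timestamp: 42420 ms, Packet: Power, Field: power.draw.12v, Data: 14
RECORD_RE = re.compile(
    r"Timestamp:\s*(?P<ts>\d+)\s*ms,\s*Packet:\s*(?P<packet>[^,]+?),\s*"
    r"Field:\s*(?P<field>[^,]+?),\s*Data:\s*(?P<data>.*?)\s*$"
)
# Two or more space-separated hex byte pairs, e.g. "5c 49 ed 19 ..."
HEX_BYTES_RE = re.compile(r"[0-9a-fA-F]{2}(?: [0-9a-fA-F]{2})+")


@dataclass
class TelemetryRecord:
    timestamp_ms: int
    packet: str
    field: str
    data: Union[int, bytes, str]


def _decode(raw: str) -> Union[int, bytes, str]:
    if HEX_BYTES_RE.fullmatch(raw):
        return bytes.fromhex(raw)  # multi-byte fields such as security.hash_spi_code
    try:
        return int(raw)  # scalar fields such as temperature.aip
    except ValueError:
        return raw  # keep anything unexpected as text


def parse_line(line: str) -> Optional[TelemetryRecord]:
    m = RECORD_RE.search(line)
    if m is None:
        return None  # status lines, e.g. "Reading all synced telemetry packets"
    return TelemetryRecord(int(m["ts"]), m["packet"], m["field"], _decode(m["data"]))


def latest_by_field(text: str) -> Dict[Tuple[str, str], TelemetryRecord]:
    """Keep the newest record (by timestamp) for each (packet, field) pair."""
    latest: Dict[Tuple[str, str], TelemetryRecord] = {}
    for line in text.splitlines():
        rec = parse_line(line)
        if rec is None:
            continue
        key = (rec.packet, rec.field)
        if key not in latest or rec.timestamp_ms >= latest[key].timestamp_ms:
            latest[key] = rec
    return latest
```

Decoding the hash fields to bytes makes it straightforward to compare, for example, security.hash_spi_code between two captures instead of comparing formatted strings.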