hl_smi_async Tool

hl_smi_async is a tool for monitoring and managing Gaudi devices asynchronously. Its core function is to collect telemetry data and log it. hl_smi_async reads the various telemetry packets and handles all related configuration and handshakes. For more information about the telemetry packets, see the Gaudi 3 In-Band Telemetry for Hypervisor document located in the Intel Gaudi vault. The hl_smi_async application is located in the hl-smi repository: hl-smi/app/hl-smi-async.

Note

  • The tool can run with or without the driver loaded.

  • The tool can run only on a bare-metal machine or a Hypervisor. Running it on a guest VM may cause undefined behavior.
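Because running on a guest VM may cause undefined behavior, a script that wraps the tool might check for a virtualized environment first. The following is a minimal Python sketch and is not part of hl_smi_async: the helper name looks_like_vm_guest is hypothetical, and it assumes an x86 Linux system, where guest kernels usually report the hypervisor CPU flag in /proc/cpuinfo. Treat it as a heuristic only; it can misclassify some configurations.

```python
from pathlib import Path


def looks_like_vm_guest(cpuinfo_text: str) -> bool:
    """Heuristic: True if a 'flags' line in /proc/cpuinfo lists 'hypervisor'.

    Assumes x86 Linux, where guest kernels usually expose this CPU flag.
    Hypothetical helper, not part of hl_smi_async; it can misclassify
    some configurations.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    # Skip the check quietly on systems without /proc/cpuinfo.
    if cpuinfo.exists() and looks_like_vm_guest(cpuinfo.read_text()):
        print("Warning: this looks like a guest VM; do not run hl-smi-async here.")
```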

Options and Usage

The following table lists the available hl_smi_async options and their usage.

Example:

sudo /usr/sbin/hl-smi-async -D b1:00.0 -O console -L info

Option                    Description
------------------------  --------------------------------------------------
-h, --help                Outputs the help message and exits.
-V, --version             Outputs version information and exits.
-D, --device <device>     Specifies the PCIe address of the device (e.g. b1:00.0).
-O, --output [output]     Specifies the telemetry output type. Valid values:
                            • console (default) - Prints all collected telemetry to the console.
                            • logfile - Writes all collected telemetry to the log file telem_log.txt.
-L, --loglevel [level]    Specifies the log level. Valid values:
                            • info (default) - Logs informational messages.
                            • debug - Logs debug information.
                            • error - Logs all errors.
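When hl_smi_async is launched from a script, the documented option values can be checked before the command runs. The sketch below is a hypothetical Python helper, not part of the tool: build_hl_smi_async_cmd is an illustrative name, the binary path and sudo prefix are taken from the example above, and the address check assumes the tool expects the short bus:device.function form shown for -D (for example, b1:00.0), without a PCI domain prefix.

```python
import re
import shlex
from typing import List

# Valid values listed in the options table for -O/--output and -L/--loglevel.
OUTPUT_TYPES = {"console", "logfile"}
LOG_LEVELS = {"info", "debug", "error"}

# bus:device.function, e.g. "b1:00.0": bus 00-ff, device 00-1f, function 0-7.
# Assumption: the tool expects this short form without a PCI domain prefix.
BDF_RE = re.compile(r"[0-9a-fA-F]{2}:[01][0-9a-fA-F]\.[0-7]")


def build_hl_smi_async_cmd(device: str, output: str = "console",
                           loglevel: str = "info",
                           binary: str = "/usr/sbin/hl-smi-async",
                           use_sudo: bool = True) -> List[str]:
    """Return an argv list for hl-smi-async (hypothetical helper)."""
    if not BDF_RE.fullmatch(device):
        raise ValueError(f"not a bus:device.function address: {device!r}")
    if output not in OUTPUT_TYPES:
        raise ValueError(f"output must be one of {sorted(OUTPUT_TYPES)}")
    if loglevel not in LOG_LEVELS:
        raise ValueError(f"loglevel must be one of {sorted(LOG_LEVELS)}")
    cmd = [binary, "-D", device, "-O", output, "-L", loglevel]
    return ["sudo", *cmd] if use_sudo else cmd


if __name__ == "__main__":
    # Prints: sudo /usr/sbin/hl-smi-async -D b1:00.0 -O console -L info
    print(shlex.join(build_hl_smi_async_cmd("b1:00.0")))
```

To launch the tool, the returned list can be passed to subprocess.run(argv, check=True); building an argument list instead of a shell string avoids quoting issues.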

Output examples:

  • Driver is not up (preboot mode):

    Reading all synced telemetry packets
    Timestamp: 10943 ms, Packet: Temperature, Field: temperature.aip, Data: 38
    Timestamp: 10861 ms, Packet: Power, Field: power.draw.54v, Data: 157
    Timestamp: 10979 ms, Packet: Health, Field: health, Data: 1
    Timestamp: 10861 ms, Packet: Perf, Field: uptime, Data: 10
    Timestamp: 3858 ms, Packet: sys_stat, Field: ib_fw_update.stat, Data: 2
    Timestamp: 3859 ms, Packet: Security, Field: security.hash_spi_code, Data:
    4a ac 10 c9 8c 97 76 1f b0 b6 7d 04 7f 67 f9 e0 42 8c cc f3 b4 7a 22 05 74 07 23 f1 ee 14 31 43 cd 99 55 e3 05 9a 3e 3e 69 8c 0d 4d 1f 1b 0b f1
    Timestamp: 3858 ms, Packet: Mem_utility, Field: utilization.memory, Data: 0
    
  • Driver is up:

    Reading all synced telemetry packets
    Timestamp: 80262313 ms, Packet: Temperature, Field: temperature.aip, Data: 37
    Timestamp: 80262754 ms, Packet: Power, Field: power.draw.54v, Data: 179
    Timestamp: 80262754 ms, Packet: Power, Field: power.draw.12v, Data: 14
    Timestamp: 80262561 ms, Packet: Health, Field: health, Data: 2
    Timestamp: 80262754 ms, Packet: Perf, Field: uptime, Data: 80262
    Timestamp: 926 ms, Packet: sys_stat, Field: ib_fw_update.stat, Data: 2
    Timestamp: 926 ms, Packet: Security, Field: security.hash_spi_code, Data:
    4a ac 10 c9 8c 97 76 1f b0 b6 7d 04 7f 67 f9 e0 42 8c cc f3 b4 7a 22 05 74 07 23 f1 ee 14 31 43 cd 99 55 e3 05 9a 3e 3e 69 8c 0d 4d 1f 1b 0b f1
    Timestamp: 926 ms, Packet: Security, Field: security.hash_boot_fit, Data:
    e5 de 58 7a 29 42 4b c4 4e d3 76 6f 1c c5 50 4c f9 48 8c 6e d8 9f 6e 27 2c 7b b6 c4 00 0f 3e d5 7f 28 e1 33 5b 8b 4f 00 ce 92 a2 83 8c e8 c4 11
    Timestamp: 926 ms, Packet: Mem_utility, Field: utilization.memory, Data: 0
    

Some fields are invalid when the driver is not up (preboot mode). For more information, refer to the Gaudi 3 In-Band Telemetry for Hypervisor document located in the Intel Gaudi vault.
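To post-process the output, the record lines can be parsed into structured values. The following Python sketch is illustrative only and assumes the line format shown in the examples above; it also assumes, without confirmation from this document, that telem_log.txt uses the same format as the console output. parse_telemetry and TelemetryRecord are hypothetical names. Multi-byte values, such as security.hash_spi_code, are printed on the line after "Data:", so the parser attaches that hex line to the preceding record.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# One record line, as in the output examples. Assumption: telem_log.txt
# (-O logfile) uses the same line format as the console output.
RECORD_RE = re.compile(
    r"^Timestamp: (?P<ts>\d+) ms, Packet: (?P<packet>[^,]+), "
    r"Field: (?P<field>[^,]+), Data:\s*(?P<data>.*)$"
)
INT_RE = re.compile(r"-?\d+")
HEX_LINE_RE = re.compile(r"[0-9a-fA-F]{2}(?: [0-9a-fA-F]{2})*")


@dataclass
class TelemetryRecord:  # hypothetical container, not part of hl_smi_async
    timestamp_ms: int
    packet: str
    field: str
    data: Union[int, bytes, str]


def parse_telemetry(text: str) -> List[TelemetryRecord]:
    """Parse hl-smi-async telemetry text into records (illustrative sketch)."""
    records: List[TelemetryRecord] = []
    pending: Optional[TelemetryRecord] = None  # record with an empty "Data:"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = RECORD_RE.match(line)
        if m:
            data = m.group("data").strip()
            value: Union[int, bytes, str] = int(data) if INT_RE.fullmatch(data) else data
            rec = TelemetryRecord(int(m.group("ts")), m.group("packet"),
                                  m.group("field"), value)
            records.append(rec)
            # An empty Data: means the value follows on the next line.
            pending = rec if data == "" else None
        elif pending is not None and HEX_LINE_RE.fullmatch(line):
            # Multi-byte values (e.g. security hashes) are space-separated hex bytes.
            pending.data = bytes.fromhex(line)
            pending = None
        # Anything else, such as the "Reading all synced ..." banner, is ignored.
    return records
```

For example, parsing both samples above and comparing the sets of field names shows that power.draw.12v and security.hash_boot_fit appear only in the driver-up output.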