hl_qual Monitor Textual UI

The monitor is a textual UI for monitoring the run parameters of Habana devices, such as temperature, power usage, clock, ECC errors and more. The monitor also shows the test progress via a progress bar, along with the expected test completion time.

Figure 38 Monitor Textual UI Interface

You can disable the monitor screen printout with the -dis_mon switch. This option is important when you run hl_qual in a scripting environment.

Note

Disabling the monitor does not stop parameter collection, because the collected parameters are needed for hl_qual's final test report. You can choose which parameters are collected through a monitor ini configuration file. For more information about the monitor configuration file, refer to Monitor ini Configuration File.
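For the scripting case, a wrapper can make sure the switch is always present before launching the tool. The following is a minimal Python sketch, not part of hl_qual: the helper names, the binary path and the test arguments are hypothetical placeholders supplied by the caller, and only the -dis_mon switch comes from this page.

```python
import subprocess
from typing import List, Sequence


def build_hl_qual_command(binary: str, test_args: Sequence[str]) -> List[str]:
    """Return an argv list with the monitor printout disabled.

    `binary` and `test_args` are placeholders chosen by the caller; this
    helper only guarantees that -dis_mon appears exactly once.
    """
    argv = [binary, *test_args]
    if "-dis_mon" not in argv:
        argv.append("-dis_mon")
    return argv


def run_without_monitor(binary: str, test_args: Sequence[str]) -> int:
    """Run the command without the textual UI and return its exit code."""
    completed = subprocess.run(build_hl_qual_command(binary, test_args), check=False)
    return completed.returncode
```

Keeping the argument construction separate from the launch makes it possible to log or inspect the exact command line that a script is about to run.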

Monitor ini Configuration File

The following sections are fixed in the ini configuration file:

  • [TEMP_MON] - Temperature monitoring parameter section.

  • [POWER_MON] - Power usage monitoring parameter section.

  • [CLOCK_MON] - Clock monitoring parameter section.

  • [MEM_MON] - Memory usage monitoring parameter section.

  • [SRAM_SERR_MON] - Single error on SRAM memory monitoring parameter section.

  • [SRAM_DERR_MON] - Double error on SRAM memory monitoring parameter section.

  • [DRAM_SERR_MON] - Single error on DRAM (HBM) memory monitoring parameter section.

  • [DRAM_DERR_MON] - Double error on DRAM (HBM) memory monitoring parameter section.

The following ini snippet shows the applicable control fields:

[TEMP_MON]
enable=true
LOW=15
HIGH=75
[POWER_MON]
enable=true
LOW=45
HIGH=340
[CLOCK_MON]
enable=true
LOW=1850
HIGH=1950
[MEM_MON]
enable=false
HIGH=30720
[SRAM_SERR_MON]
enable=true
[SRAM_DERR_MON]
enable=true
[DRAM_SERR_MON]
enable=true
[DRAM_DERR_MON]
enable=true
  • enable - Enables or disables monitoring of the section's parameter. Applicable values: true/false.

  • LOW - Sets the low threshold for the monitored parameter. If the measured value is below this threshold, the monitor marks it in red in the monitoring UI.

  • HIGH - Sets the high threshold for the monitored parameter. If the measured value is above this threshold, the monitor marks it in red in the monitoring UI.
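The threshold semantics described above can be illustrated with a short Python sketch. This is not the monitor's implementation: the function names are hypothetical, the sample values are copied from the snippet above, and treating a disabled section as "never flagged" is an assumption made for the example.

```python
import configparser

# Two sections copied from the ini snippet above.
SAMPLE_INI = """
[TEMP_MON]
enable=true
LOW=15
HIGH=75
[MEM_MON]
enable=false
HIGH=30720
"""


def load_thresholds(ini_text):
    """Parse the ini text into {section: {"enable", "LOW", "HIGH"}}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep LOW/HIGH in upper case instead of lowercasing keys
    parser.read_string(ini_text)
    thresholds = {}
    for name in parser.sections():
        section = parser[name]
        thresholds[name] = {
            "enable": section.getboolean("enable", fallback=False),
            "LOW": section.getfloat("LOW", fallback=None),    # None when the field is absent
            "HIGH": section.getfloat("HIGH", fallback=None),
        }
    return thresholds


def is_out_of_range(value, limits):
    """Return True when an enabled parameter is below LOW or above HIGH."""
    if not limits["enable"]:
        return False  # assumption: disabled parameters are never flagged
    low, high = limits["LOW"], limits["HIGH"]
    below = low is not None and value < low
    above = high is not None and value > high
    return below or above


thresholds = load_thresholds(SAMPLE_INI)
temp_flagged = is_out_of_range(80, thresholds["TEMP_MON"])  # 80 is above HIGH=75
```

A value equal to LOW or HIGH is treated as in range here, matching the "below" and "above" wording of the field descriptions.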

Note

Disabling monitoring of specific values speeds up the sampling process and improves the monitor UI refresh rate, especially when the system contains multiple devices.
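As an illustration of trimming the monitored set, the sketch below rewrites an ini text with selected sections switched off. It is a hedged example only: `disable_sections` is a hypothetical helper, and whether the monitor accepts a file rewritten this way has not been verified here.

```python
import configparser
import io


def disable_sections(ini_text, sections_to_disable):
    """Return the ini text with enable=false set in the named sections."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve the original key case (LOW, HIGH)
    parser.read_string(ini_text)
    for name in sections_to_disable:
        if parser.has_section(name):  # unknown section names are ignored
            parser[name]["enable"] = "false"
    out = io.StringIO()
    # Write key=value without surrounding spaces, as in the snippet above.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


original = "[CLOCK_MON]\nenable=true\nLOW=1850\nHIGH=1950\n[SRAM_SERR_MON]\nenable=true\n"
trimmed = disable_sections(original, ["SRAM_SERR_MON"])
```

The helper leaves all sections in place and only flips the enable field, since the section list is described as fixed.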

The monitor is also supplied as a standalone application that can be used to monitor other applications running on Habana devices:

./monitor 100 10 10 -gaudi