hl_qual Report Structure

Overview

hl_qual generates a test report composed of sub-reports. The name of the report file includes:

  • Tested server name

  • The string: hl_qual_report

  • Time stamp including date and time

For example, k501-u18-001-dev_hl_qual_report_Sat_Dec_4_09-15-16_2021.log.
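When post-processing many reports, it can help to split the file name back into its parts. The following is a minimal sketch that assumes every report name follows the pattern of the single example above. The regular expression, the function name parse_report_name, and the handling of the day field are illustrative assumptions, not part of hl_qual.

```python
import re
from datetime import datetime

# Pattern inferred from the one example file name in this section.
# Whether hl_qual zero-pads the day of the month is an assumption,
# so the pattern accepts one or two digits.
REPORT_NAME = re.compile(
    r"^(?P<server>.+)_hl_qual_report_"
    r"(?P<stamp>[A-Za-z]{3}_[A-Za-z]{3}_\d{1,2}_\d{2}-\d{2}-\d{2}_\d{4})\.log$"
)


def parse_report_name(filename):
    """Return (server_name, timestamp) for a report file name, or None."""
    match = REPORT_NAME.match(filename)
    if match is None:
        return None
    # Day and month names are parsed with the current locale (English assumed).
    stamp = datetime.strptime(match.group("stamp"), "%a_%b_%d_%H-%M-%S_%Y")
    return match.group("server"), stamp


server, stamp = parse_report_name(
    "k501-u18-001-dev_hl_qual_report_Sat_Dec_4_09-15-16_2021.log"
)
print(server, stamp.isoformat())
```

File names that do not match the assumed pattern return None rather than raising, so a directory scan can skip unrelated files.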

hl_qual reports and log files are written to the $HABANA_LOGS/qual directory, where HABANA_LOGS is an environment variable. If HABANA_LOGS is not defined, hl_qual sets it locally to /var/log/habana_logs and writes the files to /var/log/habana_logs/qual.
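A script that collects reports can mirror this lookup to find the output directory. The sketch below is illustrative only: the function name qual_log_dir is hypothetical, and treating an empty HABANA_LOGS value as undefined is an assumption that this section does not confirm.

```python
import os
from pathlib import PurePosixPath

# Default taken from the text above.
DEFAULT_HABANA_LOGS = "/var/log/habana_logs"


def qual_log_dir(env=None):
    """Return the directory where hl_qual reports are expected to appear."""
    env = os.environ if env is None else env
    # Assumption: an empty value is treated the same as an undefined variable.
    base = env.get("HABANA_LOGS") or DEFAULT_HABANA_LOGS
    return str(PurePosixPath(base) / "qual")


print(qual_log_dir())
```

Passing a plain dictionary as env makes the function easy to check without changing the real environment.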

Device Identification Report

This report lists the PCI bus ID of every device identified according to the device switch entered (for example, -gaudi). It also contains device status reports that verify whether each device is in an operational state. If hl_qual finds that a device is not in an operational state, the test is not executed.

../../_images/device_indentification_report.PNG

Figure 8 Device Identification Report

hl-smi Short Report

The hl-smi report provides an identification card for all available devices including their bus_id, serial number, device index, module ID and device type.

../../_images/hl_smi_short_report.JPG

Figure 9 hl-smi Short Report

Operational Status Report

The operational status report contains the results of the operational test conducted on all detected Gaudi devices within the system. A device fails the test if either of the following conditions applies:

  • Memory usage exceeds the idle time memory usage threshold.

  • The operational indication, as set by the Intel Gaudi Linux kernel driver, is either unavailable or indicates that the device is not operational.
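The two failure conditions can be expressed as a small predicate. This is a sketch under stated assumptions, not hl_qual's implementation: the DeviceState type, its field names, the MiB unit, and the threshold parameter are all hypothetical, and the actual idle memory threshold is not given in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceState:
    """Hypothetical snapshot of one device; field names are illustrative."""
    bus_id: str
    memory_used_mib: int
    # True/False as reported by the kernel driver, None if unavailable.
    operational: Optional[bool]


def passes_operational_test(state, idle_memory_threshold_mib):
    """Return True only if neither failure condition applies."""
    # Condition 1: memory usage exceeds the idle-time threshold.
    if state.memory_used_mib > idle_memory_threshold_mib:
        return False
    # Condition 2: the indication is unavailable or reports not operational.
    if state.operational is not True:
        return False
    return True


idle = DeviceState("0000:19:00.0", memory_used_mib=512, operational=True)
print(passes_operational_test(idle, idle_memory_threshold_mib=1024))
```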

../../_images/operatinal_status_report.JPG

Figure 10 Operational Status Report

NUMA Node Report

The report contains the identified NUMA nodes, their CPU sets, and the allocation of Gaudi devices per NUMA node. If the tested server contains a single NUMA node, NUMA considerations do not apply to CPU-to-device allocation.

Note

When running on a virtual machine, the VM usually does not correctly reflect the NUMA node data of the bare-metal machine.
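To cross-check the device allocation shown in the report, the NUMA node of each PCI device can be read from Linux sysfs and grouped. This section does not say how hl_qual gathers the data, so the sketch below is one independent way to do it. The numa_node attribute under /sys/bus/pci/devices is a standard Linux interface; the function names and the sample bus IDs are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def read_numa_node(bus_id, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node the kernel reports for a PCI device.

    The kernel reports -1 when no NUMA node is associated with the device.
    """
    return int((Path(sysfs_root) / bus_id / "numa_node").read_text().strip())


def group_by_numa_node(bus_ids, lookup=read_numa_node):
    """Map each NUMA node to the list of PCI bus IDs attached to it."""
    groups = defaultdict(list)
    for bus_id in bus_ids:
        groups[lookup(bus_id)].append(bus_id)
    return dict(groups)


# Usage with a stand-in lookup table (hypothetical bus IDs), so the
# example does not depend on the hardware of the machine running it.
sample = {"0000:19:00.0": 0, "0000:1a:00.0": 0, "0000:b3:00.0": 1}
print(group_by_numa_node(sample, lookup=sample.__getitem__))
```

On a real system, calling group_by_numa_node with the bus IDs from the Device Identification Report and the default lookup reads the values from sysfs instead.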

../../_images/numa_node_device_allocation.PNG

Figure 11 NUMA Node Report

hl_qual Version and Command Line Report

This report states the hl_qual package version and the command line used:

../../_images/command_line_report.JPG

Figure 12 Command Line Report

Tested Device Report

The report contains the following information:

  • Device-specific data: serial number, PCB assembly version, and device name.

  • The time the test starts and stops.

  • Internal test plugin data accumulated during the test run, such as pass/fail data and general test stages.

../../_images/device_test_data.JPG

Figure 13 Device Test Report

Closing Report

The report contains the following items:

  • General statistics and metrics gathered during the test, such as power usage, clocks, and temperature.

  • Pass/fail report per tested device.

  • General pass/fail report. For a successful run, all tests should pass on all devices.
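The relationship between the per-device results and the general result can be sketched as a simple aggregation. This assumes the general result is a pass only when every test passes on every device, which is how the list above reads. The function name, the input layout, and the PASS/FAIL strings are hypothetical and do not reflect the report's actual wording.

```python
def overall_result(per_device_results):
    """Aggregate per-device test outcomes into one verdict.

    per_device_results maps a device ID to a dict of {test name: passed}.
    Returns ("PASS" or "FAIL", {device ID: [names of failed tests]}).
    """
    failed = {}
    for device, tests in per_device_results.items():
        failed_tests = [name for name, passed in tests.items() if not passed]
        if failed_tests:
            failed[device] = failed_tests
    return ("PASS" if not failed else "FAIL"), failed


results = {
    "0000:19:00.0": {"test_a": True, "test_b": True},
    "0000:1a:00.0": {"test_a": True, "test_b": False},
}
print(overall_result(results))
```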

../../_images/closing_report.JPG

Figure 14 Closing Report