hl_qual Report Structure¶
hl_qual generates a test report composed of sub-reports. The name of the report file, for example k501-u18-001-dev_hl_qual_report_Sat_Dec_4_09-15-16_2021.log, includes the tested server name, the string hl_qual_report and a timestamp with the date and time.
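The naming scheme can be taken apart programmatically, for example when collecting reports from several servers. The following is a minimal sketch, not part of hl_qual: the function name parse_report_name is hypothetical, and the timestamp pattern is inferred from the single example file name above, so it may not cover every variant. It also assumes an English locale for the day and month abbreviations.

```python
from datetime import datetime

# Fixed string that separates the server name from the timestamp.
MARKER = "_hl_qual_report_"


def parse_report_name(filename: str) -> tuple[str, datetime]:
    """Split a report file name into (server name, timestamp)."""
    stem = filename.removesuffix(".log")
    # rpartition splits on the last occurrence of the marker.
    server, sep, stamp = stem.rpartition(MARKER)
    if not sep:
        raise ValueError(f"not an hl_qual report name: {filename!r}")
    # Assumed pattern, inferred from Sat_Dec_4_09-15-16_2021.
    return server, datetime.strptime(stamp, "%a_%b_%d_%H-%M-%S_%Y")
```

Calling parse_report_name on the example file name yields the server name k501-u18-001-dev and a datetime for 4 December 2021, 09:15:16.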
The hl_qual reports and log files are written to a directory determined by the $HABANA_LOGS environment variable, using the $HABANA_LOGS/qual path.
If HABANA_LOGS is not defined, hl_qual sets the path to /var/log/habana_logs and writes the files to /var/log/habana_logs/qual.
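A script that post-processes the reports can mirror this lookup to find them. The sketch below is an illustration under that assumption and is not hl_qual code; the function name qual_log_dir and the optional env parameter are hypothetical. It treats only an unset variable as "not defined"; how hl_qual handles an empty value is not stated here.

```python
import os
from pathlib import PurePosixPath

# Fallback root used when HABANA_LOGS is not defined.
DEFAULT_LOG_ROOT = "/var/log/habana_logs"


def qual_log_dir(env=None) -> PurePosixPath:
    """Return the directory where hl_qual reports are expected."""
    env = os.environ if env is None else env
    root = env.get("HABANA_LOGS", DEFAULT_LOG_ROOT)
    return PurePosixPath(root) / "qual"
```

Passing a plain dictionary as env makes the lookup easy to test without touching the real environment.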
Device Identification Report¶
This report contains the PCI bus ID of all identified devices according to the device identification switch entered (for example, -gaudi).
It contains device status reports that verify whether each device is in an operational state. If hl_qual finds that a device is not in an operational state, the test will not be executed.
hl-smi Short Report¶
The hl-smi report provides an identification card for all available devices, including their bus_id, serial number, device index, module ID and device type.
Operational Status Report¶
The operational status report contains the results of the operational test conducted on all detected Gaudi devices in the system. A device fails the test if either of the following conditions is true:
Memory usage exceeds the idle time memory usage threshold.
The operational indication, as set by the Intel Gaudi Linux kernel driver, is either unavailable or indicates that the device is not operational.
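The two failure conditions above can be sketched as a small predicate. This is only an illustration of the described logic, not the hl_qual implementation: the DeviceState type, its field names, the MiB unit and the idle_threshold_mib parameter are all hypothetical, and the actual threshold value is not given in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceState:
    bus_id: str
    memory_used_mib: int
    # None models an unavailable driver indication (hypothetical encoding).
    operational: Optional[bool]


def passes_operational_test(dev: DeviceState, idle_threshold_mib: int) -> bool:
    """Return True only if neither failure condition applies."""
    # Condition 1: memory usage exceeds the idle-time threshold.
    if dev.memory_used_mib > idle_threshold_mib:
        return False
    # Condition 2: indication unavailable (None) or not operational (False).
    return dev.operational is True
```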
NUMA Node Report¶
The report contains the identified NUMA nodes, CPU sets and the allocation of Gaudi devices per NUMA node. If the tested server contains a single NUMA node, NUMA considerations do not apply to the CPU-to-device allocation.
Note
When running on a virtual machine, the NUMA node data is usually not reflected correctly between the bare metal machine and the VM.
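To cross-check the device-to-node allocation independently of the report, the Linux kernel exposes a numa_node file for each PCI device under /sys/bus/pci/devices. The sketch below groups PCI bus IDs by that value. It is an assumption that this matches what hl_qual reports; the function name devices_by_numa_node is hypothetical. The kernel writes -1 when a device has no known NUMA affinity.

```python
from collections import defaultdict
from pathlib import Path


def devices_by_numa_node(bus_ids, sysfs_root="/sys/bus/pci/devices"):
    """Group PCI bus IDs by the NUMA node recorded in sysfs."""
    groups = defaultdict(list)
    for bus_id in bus_ids:
        # Each device directory holds a one-line numa_node file.
        text = (Path(sysfs_root) / bus_id / "numa_node").read_text().strip()
        groups[int(text)].append(bus_id)
    return dict(groups)
```

The sysfs_root parameter exists so the function can be pointed at a test directory instead of the live system.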
hl_qual Version and Command Line Report¶
Reports the hl_qual package version and specifies the command line used.
Tested Device Report¶
The report contains the following information:
The specific data of the device: serial number, PCB assembly version, device name.
The time the test starts and stops.
Internal test plugin data accumulated during the test run, such as pass/fail data and general test stages.
Closing Report¶
The report contains the following items:
General statistics and metrics gathered during the test, such as power usage, clocks and temperature.
Pass/fail report per tested device.
General pass/fail report. The expected result is that all tests pass on all devices.
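The relationship between the per-device results and the general result can be sketched as follows. This is an illustration only: the function name overall_result and the mapping of bus ID to a boolean are hypothetical, and treating an empty result set as a failure is an assumption rather than documented hl_qual behavior.

```python
def overall_result(per_device: dict[str, bool]) -> bool:
    """Return True only if every tested device passed."""
    # Assumption: no tested devices counts as a failure, not a pass.
    return bool(per_device) and all(per_device.values())
```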