hl_qual Common Plugin Switches and Parameters

hl_qual is a command-line tool: each test variant is run from a terminal. Parameters are passed to hl_qual and to the test plugins through command-line switches and parameters.

The hl_qual and test plugin switches and parameters are partitioned into two groups:

  • Common switches and parameters - These parameters are identical for all test plugins.

  • Test plugin specific switches and parameters - These parameters are unique to each test plugin.

Note

For the full Gaudi installation procedure, refer to the Installation Guide. After installing hl_qual, set the __python_cmd environment variable by running export __python_cmd=python3.
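As a minimal sketch, the variable can be set for the current shell session and confirmed before any test is launched. The check is only illustrative and is not part of hl_qual.

```shell
# Set the variable named in the note above for the current shell session.
export __python_cmd=python3

# Illustrative sanity check (an assumption, not part of hl_qual): confirm the
# variable is set before launching any test.
if [ -z "${__python_cmd}" ]; then
    echo "__python_cmd is not set" >&2
else
    echo "__python_cmd=${__python_cmd}"
fi
```

To keep the setting across sessions, the export line can be added to the shell startup file used on the server.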

The following table lists the common switches and parameters that are applicable to all test plugins. All the applicable switches are shown in the example below; optional switches or parameters are placed within square brackets:

./hl_qual -gaudi | -gaudi2 | -gaudi3 -c <all or PCI bus id list> -rmod <serial | parallel>  [-dis_mon] [-mon_cfg <monitor INI path>]
      [<plugin INI config path>] [-enable_serr] [-dmesg] [-disable_pipe_red] [-skip_aer_detection]

Switch Type

Switch name

Description

Device Identification Switch

  • -gaudi

  • -gaudi2

  • -gaudi3

Indicates that a first-gen Gaudi, Gaudi 2 or Gaudi 3 device should be detected and used for testing:

./hl_qual -gaudi|-gaudi2|-gaudi3 -c all -rmod parallel -t 20 -f2 -l extreme

Note

  • Choosing the Gaudi device is mandatory.

  • You can specify only one device switch in a single hl_qual command line.

  • If a test is not supported by a device, it is clearly indicated in both the command line description and the corresponding section title.
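The rule that exactly one device switch must be present can be checked in a wrapper script before hl_qual is launched. The following is a minimal sketch; the helper name has_single_device_switch and the hq_ variable names are hypothetical and are not part of hl_qual.

```shell
# Hypothetical helper (not part of hl_qual): succeed only when the argument
# list contains exactly one of the device switches -gaudi, -gaudi2, -gaudi3.
has_single_device_switch() {
    hq_count=0
    for hq_arg in "$@"; do
        case "$hq_arg" in
            -gaudi|-gaudi2|-gaudi3) hq_count=$((hq_count + 1)) ;;
        esac
    done
    [ "$hq_count" -eq 1 ]
}

# Example: check a planned argument list before handing it to ./hl_qual.
if has_single_device_switch -gaudi2 -c all -rmod parallel -t 20 -f2 -l extreme; then
    echo "device switch OK"
else
    echo "specify exactly one of -gaudi, -gaudi2, -gaudi3" >&2
fi
```

The helper only counts switches. It does not check whether the selected device type is actually installed in the server.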

Disable Monitor Screen Printout Switch

-dis_mon

Stops the monitor printout to the screen. This option is useful when running hl_qual inside a script:

./hl_qual -gaudi -dis_mon -c all -rmod parallel -t 20 -f2 -l extreme

Note

Monitoring of data values such as power, clock, and temperature still takes place, but the results are not displayed on your screen.
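For scripted runs, a small wrapper can add -dis_mon and capture the output in a log file. This is a sketch under stated assumptions: run_qual_quiet and the HL_QUAL variable are hypothetical names, and placing -dis_mon at the end of the command line mirrors how other common switches appear in the examples rather than a documented requirement.

```shell
# Hypothetical wrapper (not part of hl_qual): run a test from a script,
# adding -dis_mon so the monitor printout does not clutter the output.
# HL_QUAL is an assumed variable naming the binary; it defaults to ./hl_qual.
#   $1 = log file path, remaining arguments = hl_qual switches
run_qual_quiet() {
    hq_log=$1
    shift
    # Pass the remaining arguments through unchanged, followed by -dis_mon,
    # and send both standard output and standard error to the log file.
    "${HL_QUAL:-./hl_qual}" "$@" -dis_mon >"$hq_log" 2>&1
}

# Usage (commented out so the sketch can be sourced without starting a test):
# run_qual_quiet qual_run.log -gaudi -c all -rmod parallel -t 20 -f2 -l extreme
```

The function returns whatever status the launched command returns. How hl_qual reports a failed test through its exit status is not covered here, so a calling script should not rely on it without checking.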

Change Monitor Configuration INI File Switch

-mon_cfg <path to monitor config file>

Enables using a different monitor configuration file instead of the default monitor.ini:

./hl_qual -gaudi -c all -mon_cfg my_mon.ini -rmod serial -t 20 -p -b

Device PCI Bus ID Identification Switch

-c <pci bus id>

Specifies the PCI bus IDs of the Gaudi devices under test. There are three applicable formats:

  • all - Tests all operational devices in the tested server. The devices must match the device identification switch:

    ./hl_qual -gaudi -c all -rmod serial -t 20 -b -p
    
  • comma delimited bus ID list - Tests only the listed devices. All listed bus IDs must belong to devices that match the Device Identification Switch:

    ./hl_qual -gaudi -c 0000:07:00.0,0000:08:00.0 -rmod serial -t 20 -p -b
    
  • <0 | 1> - Applicable only when running tests on the HL-338 PCIe cards on servers with two quads. Must be used together with -rmod quad and -gaudi3:

    • To run tests on the first quad (devices with module ID 0-3):

      ./hl_qual -gaudi3 -c 0 -rmod quad -t 240 -f2 -l extreme
      
    • To run tests on the second quad (devices with module ID 4-7):

      ./hl_qual -gaudi3 -c 1 -rmod quad -t 240 -f2 -l extreme
      

Note

  • Avoid using spaces between the comma and the bus ID strings in the comma delimited bus ID list.

  • hl_qual tests run only in parallel mode when using HL-338 PCIe cards on servers with two quads.
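A script that assembles the -c value can verify it before the test starts. The sketch below accepts all or a comma delimited list with no spaces; the helper name is_valid_bus_id_list is hypothetical, and the domain:bus:device.function pattern is inferred from the example IDs above (such as 0000:07:00.0). The quad index form (0 or 1) is deliberately not covered.

```shell
# Hypothetical pre-check (not part of hl_qual): accept "all" or a comma
# delimited list of bus IDs in the domain:bus:device.function form seen in
# the examples above, with no spaces anywhere in the list.
is_valid_bus_id_list() {
    hq_id='[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]'
    [ "$1" = "all" ] && return 0
    # The whole value must be one ID followed by zero or more ",ID" groups.
    printf '%s\n' "$1" | grep -Eq "^${hq_id}(,${hq_id})*\$"
}

# Example: a well-formed list passes, a list with a space after the comma fails.
is_valid_bus_id_list "0000:07:00.0,0000:08:00.0" && echo "list OK"
is_valid_bus_id_list "0000:07:00.0, 0000:08:00.0" || echo "rejected: space in list"
```

The check is purely syntactic. It does not confirm that a device exists at a given bus ID or that it matches the selected device switch.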

Test Running Mode

-rmod <running mode>

Specifies the running mode on the available Gaudi devices. There are three applicable modes:

  • parallel - The plugin under test runs on all available devices at the same time:

    ./hl_qual -gaudi -c all -rmod parallel -t 20 -f2 -l extreme
    
  • serial - The plugin under test runs on one device at a time:

    ./hl_qual -gaudi -c all -rmod serial -t 20 -f2 -l extreme
    
  • quad - Applicable only when running tests on the HL-338 PCIe cards on servers with two quads. Must be used together with -c <0 | 1> and -gaudi3:

    ./hl_qual -gaudi3 -c 0 -rmod quad -t 240 -f2 -l extreme
    

Note

hl_qual tests run only in parallel mode when using HL-338 PCIe cards on servers with two quads.
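The -c and -rmod descriptions tie the quad values together: -rmod quad requires -gaudi3 and a quad index of 0 or 1, and the quad index is only described for use with -rmod quad. The following sketch encodes that combination as a pre-check; check_quad_combo is a hypothetical helper, not part of hl_qual.

```shell
# Hypothetical pre-check (not part of hl_qual): verify that the quad-related
# values are combined as the -c and -rmod descriptions require.
#   $1 = device switch, $2 = value given to -c, $3 = value given to -rmod
check_quad_combo() {
    case "$3" in
        quad)
            # Quad mode needs -gaudi3 and a quad index of 0 or 1.
            [ "$1" = "-gaudi3" ] || return 1
            case "$2" in 0|1) return 0 ;; *) return 1 ;; esac
            ;;
        parallel|serial)
            # A quad index is only described for use with -rmod quad.
            case "$2" in 0|1) return 1 ;; *) return 0 ;; esac
            ;;
        *)
            # Unknown running mode.
            return 1
            ;;
    esac
}

# Example: the first-quad command shown above passes the check.
check_quad_combo -gaudi3 0 quad && echo "quad arguments OK"
```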

Enable Memory Error Monitoring Switch

-enable_serr

Enables the hl_qual SERR/DERR counter check, which verifies that no single ECC error (SERR) or double ECC error (DERR) occurs while the plugins are running. hl_qual reads the memory error indications via the HLML library:

./hl_qual -gaudi -dis_mon -c all -rmod parallel -t 20 -f2 -l extreme -enable_serr

Disable Standard Output Pipelining Switch

-disable_pipe_red

Disables redirection of the runner's standard output to the message pipe. This matters when logs are directed to the screen or when other debug printouts are used. The redirection is disabled automatically when console logs are enabled:

./hl_qual -gaudi -dis_mon -c all -rmod parallel -t 20 -f2 -l extreme -disable_pipe_red

For further information, refer to hl_qual Failure Debug.

Enable Concatenation of dmesg Log

-dmesg

Appends the dmesg report to the hl_qual report. The dmesg report is accumulated for the duration of the test:

./hl_qual -gaudi -dis_mon -c all -rmod parallel -t 20 -f2 -l extreme -dmesg

Disable AER Test Runs

-skip_aer_detection

Skips the AER test. The AER test runs an AER readout application that reads all error bit indications that occurred during the last test:

./hl_qual -gaudi -dis_mon -c all -rmod parallel -t 20 -f2 -l extreme -skip_aer_detection

Getting Help

The following command prints out a usage help message on screen:

./hl_qual -h

The message includes specific hl_qual switches as well as the switches of all available and loaded plugins.

Figure 8 hl_qual and Plugin Usage Printout