Test Plan Automation¶
The Test Plan Automation functionality is designed to streamline the execution of hl_qual tests by organizing them into structured workflows and allowing them to be configured via YAML files.
The following terms are defined and used throughout this section:
Test plan - A set of hl_qual tests to be run in a single execution flow. A test plan is executed by using configuration YAML files.
Default test plan - A set of hl_qual test configuration YAML files provided in their original form, without any customization.
Default test - A test-specific configuration YAML file provided in its original form, without any customization. The default test includes the test name defined by hl_qual, all applicable environment variables initially marked as not_used, and the test switches categorized as mandatory, used, or not_used.
Virtual UART - The FW log stream that is generated for the host via the hl-smi daemon.
The Test Plan Automation functionality contains the following components:
Component | Description |
---|---|
diag_tool_automation.py | The main Python script that runs the Test Plan Automation. See Switches and Usage. |
Log analysis script | The script that analyzes log files and creates reports. See Log Analysis. |
Bash scripts | Bash scripts designed for specific tasks. See Bash Scripts. |
Test plans | Suggested test plans for hl_qual tests, featuring around 25 different test variants. These test plans serve as a starting point for creating customized test plans. See Configuration YAML Files. |
Switches and Usage¶
The diag_tool_automation.py script executes all the features included in the Test Plan Automation.
The table below lists the switches and options available for the script.
Example:
diag_tool_automation.py [-h] --core {gaudi3,gaudi2} --exec {tests_name_list,gen_plan_cfg,gen_test_cfg,validate_yaml,validate_test_plan,run_test_plan}
[--input_path INPUT_PATH] [--output_path OUTPUT_PATH] [--plan_name PLAN_NAME] [--test_name TEST_NAME]
[--postfix POSTFIX] [--disable_prints DISABLE_PRINTS]
Switch Name | Description |
---|---|
--core | Defines the Gaudi core type. Supported options: gaudi3, gaudi2. |
--exec | Executes the Test Plan Automation features. Supported options: tests_name_list, gen_plan_cfg, gen_test_cfg, validate_yaml, validate_test_plan, run_test_plan. |
--input_path | Sets the path to the YAML configuration file, depending on the selected --exec option. |
--output_path | Sets the path where the run artifacts are saved, depending on the selected --exec option. |
--plan_name | Sets the test plan name. |
--test_name | Sets the test name during the test-specific YAML file generation. |
--postfix | Adjusts the test plan name or the test name. If not set, the default name is used. |
--disable_prints | Disables screen printout during the test plan run. |
Note
For further help, run python diag_tool_automation.py -h.
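The switch choices above can be checked before the script is ever launched. The following is a minimal Python sketch, not part of the tool: the helper name build_command and the input path are hypothetical, and only switches that appear in the usage synopsis are used.

```python
import shlex

# Choices copied from the usage synopsis of diag_tool_automation.py.
CORES = ("gaudi3", "gaudi2")
EXEC_MODES = (
    "tests_name_list", "gen_plan_cfg", "gen_test_cfg",
    "validate_yaml", "validate_test_plan", "run_test_plan",
)


def build_command(core, exec_mode, input_path=None, output_path=None):
    """Return the argv list for one diag_tool_automation.py invocation.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    if core not in CORES:
        raise ValueError(f"unsupported --core value: {core!r}")
    if exec_mode not in EXEC_MODES:
        raise ValueError(f"unsupported --exec value: {exec_mode!r}")
    argv = ["python", "diag_tool_automation.py",
            "--core", core, "--exec", exec_mode]
    if input_path is not None:
        argv += ["--input_path", input_path]
    if output_path is not None:
        argv += ["--output_path", output_path]
    return argv


# The path below is a placeholder for illustration only.
cmd = build_command("gaudi3", "validate_test_plan",
                    input_path="/tmp/my_plan/test_plan.yaml")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run as is. Which of --input_path and --output_path a given --exec mode actually requires is not covered by this sketch.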
Validation Reports¶
The following are examples of the validation reports printed when using the --exec validate_yaml and --exec validate_test_plan switches.
Test-specific YAML file validation report (success):
Test-specific YAML file validation report (failure):
Test plan YAML validation report:
Configuration YAML Files¶
YAML files are used to configure the Test Plan Automation. The YAML file types and their customization options are described below.
The test plan YAML file defines the general setup of the test plan and provides paths to all test-specific configuration YAML files included in the test plan.
The following is the structure of the test plan YAML file:
General configuration:
Test plan folder path
Test plan name
hl_qual bin folder path
Enable/disable flag for the virtual UART collection process
Test-specific configuration:
Test name - The default YAML file assigns a name based on hl_qual, but it can be modified.
Test YAML file name - When combined with the test plan folder path, it provides access to all test configuration YAML files (e.g., E2E.yaml).
Number of repetitions for the test run.
Pre-test and post-test hooks. These can be any of the following:
Linux script
Linux command
Python script execution
Note
The default test plan can be run, but it is highly limited because many switches in the test-specific configuration YAML files are marked as not_used. Customizing the test plan is therefore recommended.
Test plan YAML example:
test_plan_yaml_dir: /home/lab/trees/npu-stack/qual/diag_tool/automation/test_plans/qual_internal_test_plan/
test-plan-name: qual_internal_test_plan
bin-Folder: /home/lab/builds/qual_release_build/gaudi3/bin
enable-vuart: true
tests:
  - test-name: E2E
    test-yaml-path: E2E.yaml
    test-repeat-no: 3
    pre-run: 'sudo dmesg -C'
    post-run: N/A
  - test-name: FunctionalTest_extreme
    test-yaml-path: FunctionalTest_extreme.yaml
    test-repeat-no: 6
    pre-run: 'driver_load_unload.sh'
    post-run: N/A
Customization
The following changes are applicable to the test plan YAML file:
Test plan:
test_plan_yaml_dir - Modify the path to a new location. This is useful when copying the test plan. Make sure that the folder path exists.
test-plan-name - Can be changed. Used only in logs and report printouts.
bin-Folder - Must point to /opt/habanalabs/qual/gaudi3/bin.
enable-vuart - Can be set to either True or False to enable or disable virtual UART collection.
Tests entry:
test-name - Any string. Used as a logical label in logs and reports.
test-yaml-path - Any name, as long as a file with this name exists in test_plan_yaml_dir.
test-repeat-no - Accepts any integer between 1 and 10,000.
pre-run or post-run - Any Linux script or command can be placed here.
Test entry commenting and deletion - Each test entry can be deleted or commented out using #. For example:
# test-name: FunctionalTest_extreme
# test-yaml-path: FunctionalTest_extreme.yaml
# test-repeat-no: 6
# pre-run: 'driver_load_unload.sh'
# post-run: N/A
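The customization rules for the test plan YAML file can be expressed as a small pre-check. The following Python sketch assumes the YAML file has already been loaded into a plain dictionary (for example with a YAML library); the function name check_test_plan and the available_files argument are hypothetical and are not part of the Test Plan Automation. It is separate from the tool's own --exec validate_test_plan feature.

```python
def check_test_plan(plan, available_files):
    """Return a list of problems found in a test plan mapping.

    plan            -- the test plan YAML content as a dict
    available_files -- file names present in test_plan_yaml_dir
    """
    problems = []
    if not isinstance(plan.get("enable-vuart"), bool):
        problems.append("enable-vuart must be true or false")
    for entry in plan.get("tests", []):
        name = entry.get("test-name", "<unnamed>")
        repeat = entry.get("test-repeat-no")
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if (not isinstance(repeat, int) or isinstance(repeat, bool)
                or not 1 <= repeat <= 10_000):
            problems.append(
                f"{name}: test-repeat-no must be an integer between 1 and 10,000")
        if entry.get("test-yaml-path") not in available_files:
            problems.append(
                f"{name}: test-yaml-path not found in test_plan_yaml_dir")
    return problems


# Values taken from the test plan YAML example; the second test's
# YAML file is deliberately left out of available_files.
plan = {
    "test-plan-name": "qual_internal_test_plan",
    "enable-vuart": True,
    "tests": [
        {"test-name": "E2E", "test-yaml-path": "E2E.yaml",
         "test-repeat-no": 3},
        {"test-name": "FunctionalTest_extreme",
         "test-yaml-path": "FunctionalTest_extreme.yaml",
         "test-repeat-no": 6},
    ],
}
for problem in check_test_plan(plan, available_files={"E2E.yaml"}):
    print(problem)
```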
The test-specific YAML file includes environment variables and all relevant switches for the test.
The following is the structure of the test-specific YAML file:
Environment variables - Lists all the available environment variables for the specific test. The entries in this section can be added, removed, or commented out. However, if an entry is unrecognized by hl_qual or any other component called by hl_qual, it may have no effect.
Example:
env-var:
  - var-name: ENABLE_CONSOLE
    value: 'true'
    is-used: false
  - var-name: LOG_LEVEL_QUAL
    value: '0'
    is-used: false
  - var-name: LOG_LEVEL_ALL
    value: 0
    is-used: false
Switches - This section is validated by hl_qual. The validation process consists of two stages: first, the automation validation process converts the YAML file into JSON, and then the JSON file is fed to hl_qual for verification.
Example:
switches:
  - switch: -gaudi3
    usage-state: mandatory
    description: core type selection switch
  - switch: -c
    usage-state: mandatory
    value: all
    description: 'PCI bus ID, with the applicable range: [all,0000:08:00.0,0000:09:00.0,quad_0]'
  - switch: -dis_mon
    usage-state: used
    description: Disables the monitor display (monitoring is still executed)
  - switch: -dmesg
    usage-state: not_used
    description: 'Adds running dmesg to the qual report (Note: the dmesg will be cleaned)'
  - switch: -enable_serr
    usage-state: used
    description: Enable SERR for ECC error
  - switch: -h
    usage-state: not_used
    description: print help guide
  - switch: -mon_cfg
    usage-state: not_used
    description: ScreenDrawer INI configuration path
  - switch: -rmod
    usage-state: mandatory
    value: parallel
    description: 'Running mode, with the applicable range: [parallel,serial]'
  - switch: -f2
    usage-state: mandatory
    description: Functional test selector
  - switch: -d
    usage-state: not_used
    description: Download input tensors only once, the same tensors will be used throughout all the iterations of the test
  - switch: -enable_ports_check
    usage-state: used
    value: int
    description: 'Enable verifying NIC internal/external ports are up, with the applicable range: [int,all]'
  - switch: -l
    usage-state: mandatory
    value: extreme
    description: 'Power level [extreme,high], with the applicable range: [extreme,high]'
  - switch: -sensors
    usage-state: used
    value: 15
    description: 'Enable sensors collection, -sensors <sample time in seconds> , with the applicable range: [1 - 3600]'
  - switch: -serdes
    usage-state: used
    value: int
    description: 'Enable serdes test [int / ext]. using -serdes ext requires loopback dongles on external ports, with the applicable range: [int,ext]'
  - switch: -t
    usage-state: mandatory
    value: 600
    description: 'Execution time in seconds, with the applicable range: [240 - 259200]'
  - switch: -toggle
    usage-state: used
    description: check toggling
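To illustrate how the usage-state field might drive a command line, the following Python sketch serializes a few switch entries to JSON and collects the switches whose state is mandatory or used. This is an assumption made for illustration: the section does not describe the JSON layout or how the automation passes switches to hl_qual, and the function name switches_to_args is hypothetical.

```python
import json


def switches_to_args(switches):
    """Collect command-line tokens for active switch entries.

    Assumption: entries marked mandatory or used are passed on,
    entries marked not_used are skipped.
    """
    args = []
    for entry in switches:
        if entry["usage-state"] not in ("mandatory", "used"):
            continue
        args.append(entry["switch"])
        if "value" in entry:
            args.append(str(entry["value"]))
    return args


# A subset of the switch entries from the example above,
# written as the dicts a YAML loader would produce.
switches = [
    {"switch": "-gaudi3", "usage-state": "mandatory"},
    {"switch": "-c", "usage-state": "mandatory", "value": "all"},
    {"switch": "-dmesg", "usage-state": "not_used"},
    {"switch": "-t", "usage-state": "mandatory", "value": 600},
]

# Stage one in the text is a YAML-to-JSON conversion; json.dumps
# stands in for that step here.
as_json = json.dumps({"switches": switches}, indent=2)
print(as_json)
print(switches_to_args(switches))
```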
Customization
The following changes are applicable to the test-specific YAML file:
test-name - Cannot be changed.
env-var or switches - Cannot be deleted, commented out, or modified.
Switch entries:
If the usage-state is mandatory, the entry cannot be deleted or commented out.
If the usage-state is used or not_used, the entry can be deleted or commented out.
Entries marked as used can be changed to not_used, and vice versa.
For switches that take a value, the value must fall within a predefined range. Refer to the description field for range details.
The description can be edited, as this field is ignored by the Test Plan Automation functionality and serves only as a help guideline.
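The switch rules above can be checked against the default test before a customized file is handed to the tool. The following is a minimal Python sketch under stated assumptions: the function name check_customization and the ranges argument are hypothetical, the switch lists are plain dictionaries as a YAML loader would produce them, and the numeric ranges are copied by hand from the description fields rather than parsed.

```python
ALLOWED_STATES = {"mandatory", "used", "not_used"}


def check_customization(default_switches, custom_switches, ranges=None):
    """Return a list of rule violations in a customized switch list."""
    ranges = ranges or {}
    problems = []
    custom = {entry["switch"]: entry for entry in custom_switches}

    # Mandatory entries cannot be deleted or commented out.
    for entry in default_switches:
        if entry["usage-state"] == "mandatory" and entry["switch"] not in custom:
            problems.append(f"{entry['switch']}: mandatory entry is missing")

    for name, entry in custom.items():
        if entry["usage-state"] not in ALLOWED_STATES:
            problems.append(
                f"{name}: unknown usage-state {entry['usage-state']!r}")
        # Numeric values must stay inside the documented range.
        if name in ranges and "value" in entry:
            low, high = ranges[name]
            if not low <= entry["value"] <= high:
                problems.append(
                    f"{name}: value {entry['value']} outside [{low} - {high}]")
    return problems


default = [
    {"switch": "-t", "usage-state": "mandatory", "value": 600},
    {"switch": "-sensors", "usage-state": "used", "value": 15},
    {"switch": "-dmesg", "usage-state": "not_used"},
]
# Customized copy: -t was removed and -sensors exceeds its range.
custom = [
    {"switch": "-sensors", "usage-state": "used", "value": 5000},
    {"switch": "-dmesg", "usage-state": "used"},
]
ranges = {"-t": (240, 259200), "-sensors": (1, 3600)}
for problem in check_customization(default, custom, ranges):
    print(problem)
```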
Bash Scripts¶
The Test Plan Automation functionality provides access to pre-defined bash scripts that perform the following operations:
The hard_reset.sh script performs a hard reset by writing to the appropriate sysFS location, so the driver must be loaded before using this script:
hard_reset.sh -sleep <int value> -n <int value>
Options:
Option | Description |
---|---|
-n | Specifies the number of test trials to verify if the device is operational. The script pauses for 10 seconds between each trial. |
-sleep | Specifies the number of seconds to wait after all devices are fully operational. |
The driver_reload.sh script reloads the driver by first unloading it and then loading it again:
driver_reload.sh -sleep <int value> -n <int value> -timeout_locked <int value> -d
Options:
Option | Description |
---|---|
-n | Specifies the number of test trials to verify if the device is operational. The script pauses for 10 seconds between each trial. |
-sleep | Specifies the number of seconds to wait after all devices are fully operational. |
-timeout_locked | Required only for Gaudi 2. The expected value is 0. |
-d | Enables debug mode for the driver. |
Note
Before enabling the virtual UART option, make sure that the driver is loaded only once. Set enable-vuart: false when using the driver_reload.sh script in the pre-run stage.
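The restriction in this note is easy to overlook in a long test plan. The following Python sketch flags tests whose pre-run hook calls driver_reload.sh while enable-vuart is true. The function name vuart_conflicts is hypothetical, and the plan is assumed to be the test plan YAML content loaded into a dictionary.

```python
def vuart_conflicts(plan):
    """Return the names of tests that reload the driver in pre-run
    while virtual UART collection is enabled."""
    if not plan.get("enable-vuart"):
        return []
    return [
        test.get("test-name", "<unnamed>")
        for test in plan.get("tests", [])
        if "driver_reload.sh" in str(test.get("pre-run", ""))
    ]


# Illustrative plan: the second test violates the note.
plan = {
    "enable-vuart": True,
    "tests": [
        {"test-name": "E2E", "pre-run": "sudo dmesg -C"},
        {"test-name": "FunctionalTest_extreme",
         "pre-run": "driver_reload.sh -sleep 60 -n 15"},
    ],
}
print(vuart_conflicts(plan))
```

The check is a plain substring match on the hook text, so a hook that reloads the driver through a wrapper script would not be detected.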
Bash Scripts Usage in a Test Plan¶
The following is an example of a test plan YAML file that includes the bash scripts. In this example, the driver is loaded during the pre-test stage, which occurs before any other test stage, including enabling UART and capturing dmesg logs. During execution, the NIC_base allreduce test is run five times, with a hard reset performed before each test execution.
test_plan_yaml_dir: /home/lbf/test_plans/qual_test_plan
test-plan-name: qual_test_plan
bin-Folder: /opt/habanalabs/qual/gaudi3/bin
enable-vuart: true
pre-test-plan: 'driver_reload.sh -sleep 60 -n 15 -d'
tests:
  - test-name: NIC_BASE_COLLECTIVE_allreduce
    test-yaml-path: NIC_BASE_COLLECTIVE_allreduce.yaml
    test-repeat-no: 5
    pre-run: 'hard_reset.sh -sleep 60 -n 15'
    post-run: N/A
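The order of operations described for this test plan can be traced with a short Python sketch. It assumes that the pre-test-plan hook runs once before everything else, that the pre-run hook runs before every repetition, and that a hook set to N/A is skipped. The function name execution_steps is hypothetical, and the code only lists the steps without running anything.

```python
def execution_steps(plan):
    """List the hooks and test runs of a test plan in execution order."""
    def is_set(hook):
        return hook is not None and hook != "N/A"

    steps = []
    if is_set(plan.get("pre-test-plan")):
        steps.append(plan["pre-test-plan"])
    for test in plan["tests"]:
        for run in range(1, test["test-repeat-no"] + 1):
            if is_set(test.get("pre-run")):
                steps.append(test["pre-run"])
            steps.append(f"{test['test-name']} (run {run})")
            if is_set(test.get("post-run")):
                steps.append(test["post-run"])
    return steps


# The test plan from the example above, as a YAML loader would return it.
plan = {
    "test-plan-name": "qual_test_plan",
    "enable-vuart": True,
    "pre-test-plan": "driver_reload.sh -sleep 60 -n 15 -d",
    "tests": [
        {"test-name": "NIC_BASE_COLLECTIVE_allreduce",
         "test-yaml-path": "NIC_BASE_COLLECTIVE_allreduce.yaml",
         "test-repeat-no": 5,
         "pre-run": "hard_reset.sh -sleep 60 -n 15",
         "post-run": "N/A"},
    ],
}
for step in execution_steps(plan):
    print(step)
```

For this plan the list contains one driver reload followed by five pairs of a hard reset and a test run.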