Qual Package Installation Validator¶
The Qual package installation validator checks that the Qual package is correctly installed and ready to run the Qual tests. Note that this script validates the Qual package only.
Validation Areas¶
The Qual package depends on the installation and configuration of several components. The list below outlines the different parts validated by this script:
Installed Intel SW package - Verifies that the following components are installed and that they all share the same version:
Firmware - Reads the device’s flashed firmware and compares it to the version installed on the host.
Firmware-ODM
Firmware tools
Linux Kernel Mode Driver
RDMA_CORE
HL-THUNK
Graph compiler
Qual-Workloads
Qual
External libraries: MPIRUN and LS-SENSORS
Dynamically Linked Library Integrity - Verifies that all required shared object (SO) files for Qual can be loaded and ensures there are no missing files.
Environment Variables - Ensures all necessary environment variables are defined to support Qual’s operation.
Host Memory Configuration - Checks the amount of host memory, huge pages allocation, and shared memory status, confirming there are no residual artifacts from previous runs.
Host CPU Governor Status - Verifies the status of the host CPU governor.
Basic Device Health and Operational Status - Assesses the health and operational status of essential devices.
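As an illustration of the host memory check described above, the sketch below reads total memory and huge pages allocation from text in the standard Linux /proc/meminfo format. It is not the validator's own implementation: the function name parse_meminfo and the sample values are assumptions made for this example.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Sizes are reported in kB; HugePages_* entries are plain counts.
    Field names containing parentheses, such as Active(anon), are skipped.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


# Illustrative sample only; these numbers are not taken from a real host.
SAMPLE = """MemTotal:       1056487968 kB
MemFree:         987654321 kB
HugePages_Total:   24641
HugePages_Free:    24641
Hugepagesize:       2048 kB
"""

info = parse_meminfo(SAMPLE)
print("Total memory (kB):", info["MemTotal"])
print("Huge pages allocated:", info["HugePages_Total"])
```

On a real host, the same function can be applied to the contents of /proc/meminfo, for example parse_meminfo(open("/proc/meminfo").read()).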
Switches and Usage¶
The qual_pckg_validator.py script can be found under /opt/habanalabs/qual/diag_tool/automation.
The following is a run command example:
python qual_pckg_validator.py --core_type <gaudi2 | gaudi3> --system_type <server | standalone> --num_of_devices <0..7> --output <path of the report>
options:
-h, --help show this help message and exit
--core_type CORE_TYPE
Core type (e.g., gaudi2, gaudi3)
--system_type SYSTEM_TYPE
System type (e.g., server, standalone)
--num_of_devices NUM_OF_DEVICES
Number of devices
-o OUTPUT, --output OUTPUT
Output file path for the environment report
Options:
Option | Description |
---|---|
--core_type | Core type: gaudi2 or gaudi3. |
--system_type | System type: server or standalone. A server contains up to 8 cards with internal SERDES interconnects, while a standalone system is usually a collection of cards without any connectivity between them. |
--num_of_devices | Number of devices in the system. This value is needed to ensure that all devices are operational. |
-o, --output | Full path of the report, including the file name. This is used as the report name. |
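When the validator is launched from automation, the options above can be assembled programmatically. The following is a minimal sketch, assuming the script is invoked with python from its documented location. The helper name build_validator_command, the device count, and the report path /tmp/qual_env_report.txt are hypothetical values chosen for this example.

```python
import shlex

# Script location as documented in this section.
VALIDATOR = "/opt/habanalabs/qual/diag_tool/automation/qual_pckg_validator.py"


def build_validator_command(core_type, system_type, num_of_devices, output):
    """Return the validator command line as an argument list."""
    if core_type not in ("gaudi2", "gaudi3"):
        raise ValueError(f"unsupported core type: {core_type}")
    if system_type not in ("server", "standalone"):
        raise ValueError(f"unsupported system type: {system_type}")
    return [
        "python", VALIDATOR,
        "--core_type", core_type,
        "--system_type", system_type,
        "--num_of_devices", str(num_of_devices),
        "--output", output,
    ]


# Hypothetical values: adjust the device count and report path to your system.
cmd = build_validator_command("gaudi3", "server", 4, "/tmp/qual_env_report.txt")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a host where the Qual package is installed. Validating the two enumerated options before launching catches typos early without needing the device.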
After running the script, a textual report will appear in your terminal.
Example textual report:
---
'package-test-status': 'passed'
'lib-dependency-test-status': 'passed'
'env-vars-test-status': 'passed'
'python-libs-test-status': 'passed'
'host-mem-test-status': 'failed'
'shared-mem-test-status': 'passed'
'operational-test-status': 'Passed'
'cpu-governor-test-status': 'passed'
'reports':
- 'missing-packages-status': 'passed'
'version-test-status': 'passed'
'packages-versions':
- 'package_name': 'habanalabs-container-runtime'
'version': '1.20.1-69'
- 'package_name': 'habanalabs-dkms'
'version': '1.20.1-69'
- 'package_name': 'habanalabs-firmware-odm'
'version': '1.20.1-69'
- 'package_name': 'habanalabs-firmware-tools'
'version': '1.20.1-69'
- 'package_name': 'habanalabs-firmware'
'version': '1.20.1-69'
- 'package_name': 'habanalabs-graph'
'version': '1.20.1-69'
- 'package_name': 'habanalabs-qual-workloads'
'version': '1.20.1-69'
- 'package_name': 'habanalabs-qual'
'version': '1.20.1-69'
- 'package_name': 'habanalabs-rdma-core'
'version': '1.20.1-69'
- 'package_name': 'habanalabs-tests'
'version': '1.20.1-69'
- 'package_name': 'habanalabs-thunk'
'version': '1.20.1-69'
- 'package_name': 'habanatools'
'version': '1.20.1-69'
- 'lib-dependency-test': 'passed'
'bin-path': '/opt/habanalabs/qual/gaudi3/bin'
'tested-files':
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/libconcurrency_edp.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/hbm_inject_error'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/backtrace_debug'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/libhbm_plugin_gaudi2.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/libconcurrency_e2e.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/libpci_bw_plugin.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/read_nics_status'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/libconcurrency_powertest.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/runner'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/libfunctional_test_plugin.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/libNIC_basetest_plugin.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/libmemory_bw_plugin.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/libtraining_plugin.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/iocache_loader'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/extractApp'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/hl_qual'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/hbm_interrupts'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/pcie_aer_detector'
- 'file-name': '/opt/habanalabs/qual/gaudi3/bin/libser_plugin.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libcoral_core_gaudi3.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_bmon_parser_lib.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libarc_core_g3.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_host_pcie_driver.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libcoral_infra.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_DMATests.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_pcie_driver.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_SivalTpcElfReader.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_test_core_for_device_runtime.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libcoral_user_gaudi3.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_sival_tpc_kernels.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libSynapseMmeReference.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_NICTests.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libarcbp.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_sival_gaudi3_mme_lib.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libpsoc_g3.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_sival_concurrency_lib.so_BCK'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_rottweiler_testlib.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libmme_test_gaudi3_lib.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_logger.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_device_runtime.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_sival_concurrency_lib.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_sival_tpc_tests_core.so'
- 'file-name': '/opt/habanalabs/qual/gaudi3/lib/libhost2_sival_tpc_tests_core_ext.so'
- 'file-name': '/opt/habanalabs/qual/lib/libdevice_runtime.so'
- 'file-name': '/opt/habanalabs/qual/lib/libsival_tpc_kernels.so'
- 'file-name': '/opt/habanalabs/qual/lib/libTpcElfReader.so'
- 'file-name': '/opt/habanalabs/qual/lib/liblogger.so'
- 'file-name': '/opt/habanalabs/qual/lib/libbmon_parser_lib.so'
- 'file-name': '/opt/habanalabs/qual/lib/librottweiler_testlib.so'
- 'file-name': '/opt/habanalabs/qual/lib/libsival_tpc_tests_core_ext.so'
- 'file-name': '/opt/habanalabs/qual/lib/libtpc_tests_core_ext.so'
- 'file-name': '/opt/habanalabs/qual/lib/libDMATests.so'
- 'file-name': '/opt/habanalabs/qual/lib/libtpc_numerics.so'
- 'file-name': '/opt/habanalabs/qual/lib/libtest_core_for_device_runtime.so'
- 'file-name': '/opt/habanalabs/qual/lib/libsival_concurrency_lib.so'
- 'file-name': '/opt/habanalabs/qual/lib/libtpcsim_shared.so'
- 'file-name': '/opt/habanalabs/qual/lib/libtpc_tests_lib.so'
- 'file-name': '/opt/habanalabs/qual/lib/libSivalTpcElfReader.so'
- 'file-name': '/opt/habanalabs/qual/lib/libpcie_driver.so'
- 'file-name': '/opt/habanalabs/qual/lib/libNICTests.so'
- 'file-name': '/opt/habanalabs/qual/lib/libsival_tpc_tests_core.so'
- 'file-name': '/opt/habanalabs/qual/lib/libhost_pcie_driver.so'
- 'env-test': 'passed'
'test-vars':
- 'name': 'HABANALABS_HLTHUNK_TESTS_BIN_PATH'
'status': 'passed'
'reason': 'points to: /opt/habanalabs/src/hl-thunk/tests'
- 'name': 'HABANA_LOGS'
'status': 'passed'
'reason': 'fully writable by user space'
- 'name': 'RDMA_CORE_LIB'
'status': 'passed'
'reason': 'points to /opt/habanalabs/rdma-core/src/build/lib'
- 'name': 'GC_KERNEL_PATH'
'status': 'passed'
'reason': 'points to /usr/lib/habanalabs/libtpc_kernels.so'
- 'name': 'HABANA_SCAL_BIN_PATH'
'status': 'passed'
'reason': 'points to /opt/habanalabs/engines_fw'
- 'python-libs-test': 'passed'
- &id001
'host-mem-status': 'passed'
'host-hugepages-status': 'failed'
'host-mem-size': '2113232068'
'host-hugepages-num': '24641'
- &id002
'shared-mem-status': 'passed'
- &id003
'device-identification-test':
'status': 'Passed'
'operational-devices':
- '0000:19:00.0': 'Operational'
- '0000:9b:00.0': 'Operational'
- '0000:bb:00.0': 'Operational'
- '0000:3b:00.0': 'Operational'
- '0000:cb:00.0': 'Operational'
- '0000:4c:00.0': 'Operational'
- '0000:db:00.0': 'Operational'
- '0000:5d:00.0': 'Operational'
'device-serial-report':
'status': 'Passed'
'device-mem-report':
'status': 'Passed'
'device-power-report':
'status': 'Passed'
'device-clock-report':
'status': 'Passed'
'device-fw-report':
'status': 'Passed'
- 'cpu-governor-test': 'passed'
...
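The summary lines at the top of the report can be scanned to find which checks did not pass. The sketch below is one possible approach, not part of the Qual package: the function name failed_checks is an assumption, and the comparison is case-insensitive because the example report mixes 'passed' and 'Passed'.

```python
import re

# Matches lines such as: 'host-mem-test-status': 'failed'
STATUS_LINE = re.compile(r"^\s*'([\w-]+-test-status)':\s*'(\w+)'\s*$")


def failed_checks(report_text):
    """Return the names of all *-test-status entries whose value is not 'passed'."""
    failed = []
    for line in report_text.splitlines():
        match = STATUS_LINE.match(line)
        if match and match.group(2).lower() != "passed":
            failed.append(match.group(1))
    return failed


# A few summary lines copied from the example report above.
SAMPLE = """'package-test-status': 'passed'
'host-mem-test-status': 'failed'
'operational-test-status': 'Passed'
"""

print(failed_checks(SAMPLE))  # ['host-mem-test-status']
```

In the example report, the only failing entry is host-mem-test-status, which corresponds to the 'host-hugepages-status': 'failed' line in the detailed host memory section.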