Power Stress and EDP Tests Plugins Design, Switches and Parameters
This section describes the plugin-specific switches. It does not focus on the common switches, although they appear in the command examples for completeness. For the common plugin switches and parameters, refer to hl_qual Common Plugin Switches and Parameters.
Note
libpower_stress_plugin.so is a dynamically linked library that implements both the power stress and EDP test plugins.
Gaudi2 Power Stress Plugin Design Consideration and Responsibilities¶
The power stress plugin performs a maximum power stress on the device. The test plugin supports three power levels: extreme, mid and low.
Power Stress Test Plugin¶
Note
The power measurements below are the peak power levels as measured by the HLML API.
The power stress plugin places the device under a constant, uniform power load. The tests can run for many hours and exercise the following device functionality:
Thermal stress: cooling system functionality, heat dissipation, and thermal protection mechanisms can be checked while running the power stress plugin at extreme load.
PID module: responsible for power management and for keeping power usage below the maximum allowable power of 550 [watts].
Note
When using Gaudi2 Power Stress, the driver must be loaded by executing the following command:
sudo modprobe habanalabs timeout_locked=0
Pass/fail Criteria¶
Test must pass all test phases with success (init, test run).
Test must not trigger the builtin temperature protection.
MME engine calculation must be bit-exact.
First-gen Gaudi Power Stress Plugin Design Consideration and Responsibilities¶
The power stress plugin performs the following:
Conducts multi-level power stress test.
Conducts multi-level power EDP test.
The power level for both the power stress and EDP tests is configurable via command line settings and is aligned with the following levels:
Extreme - measured power level: 350 - 350[watt]
Mid – measured power level: 235 - 245[watt]
Note
The above numbers were achieved on an HL-205 device.
Power Stress Test Plugin¶
The power stress plugin places the device under a constant, uniform power load. The tests can run for many hours and exercise the following device functionality:
Thermal stress: cooling system functionality, heat dissipation, and thermal protection mechanisms can be checked while running the power stress plugin at extreme load.
Power limiter and clock relaxation mechanisms – The power limiter is a mechanism that limits the power usage below 300 [watts]. When the power limit is met, the device clocks are lowered. To test the power limiter mechanism, the plugin must run at an extreme power level.
Long work periods in typical power workloads (extreme, low).
Power usage and reported temperature can vary between systems and depend on the ambient conditions of the tested server (fan speed, ambient temperature, and number of devices being tested).
Note
To ensure that maximum power is reached, run the test for longer than 30 seconds.
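One way to compare a run against the 300 [watts] power limiter threshold is to ignore the first 30 seconds of readings and look at the peak of what remains. The following is a minimal sketch under stated assumptions: the `(seconds, watts)` sample format, the function name, and the readings are all hypothetical, and how the samples are collected (for example through HLML) is outside its scope.

```python
from typing import Iterable, Tuple

WARMUP_SECONDS = 30      # from the note above: maximum power needs more than 30 s
POWER_LIMIT_WATTS = 300  # first-gen Gaudi power limiter threshold quoted above


def peak_power_after_warmup(samples: Iterable[Tuple[float, float]],
                            warmup_s: float = WARMUP_SECONDS) -> float:
    """Return the highest power reading taken after the warm-up period.

    `samples` is a hypothetical sequence of (seconds_since_start, watts) pairs.
    """
    steady = [watts for t, watts in samples if t > warmup_s]
    if not steady:
        raise ValueError("no samples after the warm-up period; run the test longer")
    return max(steady)


# Made-up readings, for illustration only (not measured data).
readings = [(5, 180.0), (20, 260.0), (35, 291.5), (60, 295.0), (90, 293.2)]
peak = peak_power_after_warmup(readings)
print(peak, peak < POWER_LIMIT_WATTS)
```

Because readings depend on fan speed, ambient temperature, and the number of devices under test, a check like this is only meaningful when runs are compared on the same server configuration.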
Pass/fail Criteria¶
The power stress tests must run until completion without overheating or power supply failures.
Power Stress Test Plugin Switches and Parameters¶
First-gen Gaudi and Gaudi2 test variants differ in the test capabilities as demonstrated in the command line below:
hl_qual -gaudi2 -c <pci bus id> [-t <time in seconds>] -rmod <serial | parallel> [-dis_mon] [-l <extreme | mid | low>]
-s
Switch | Description |
---|---|
-t | Power stress test duration in seconds. If this switch is omitted, the default value is 60 seconds. |
-s | Power stress test selector. |
-l | Power level selector for 54V power supply: extreme, mid, or low. |
./hl_qual -gaudi2 -c all -rmod parallel -s -t 120
hl_qual -gaudi -c <pci bus id> [-t <time in seconds>] -rmod <serial | parallel> [-dis_mon]
-s [-l <extreme | mid >]
Switch | Description |
---|---|
-t | Power stress test duration in seconds. If this switch is omitted, the default value is 40 seconds. |
-s | Power stress test selector. |
-l | Power level selector: extreme or mid. If the value is not specified, the default value is MID. |
./hl_qual -gaudi -c all -rmod parallel -s -l extreme -t 120
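The difference between the two variants can be made explicit in a small wrapper that builds the argument list and rejects a power level the selected device does not list. This is a sketch only: the function name and its parameters are hypothetical, and it uses just the switches shown in the syntax lines above (`-c`, `-rmod`, `-s`, `-l`, `-t`).

```python
from typing import List, Optional

# Power levels accepted by -l for each device switch, per the syntax lines above.
LEVELS = {
    "gaudi2": ("extreme", "mid", "low"),
    "gaudi": ("extreme", "mid"),
}


def power_stress_args(device: str, pci_bus_id: str = "all", rmod: str = "parallel",
                      seconds: Optional[int] = None,
                      level: Optional[str] = None) -> List[str]:
    """Build an hl_qual argument list for the power stress test (-s).

    Hypothetical helper: it only assembles switches documented in this section.
    """
    if device not in LEVELS:
        raise ValueError(f"unknown device switch: -{device}")
    if rmod not in ("serial", "parallel"):
        raise ValueError(f"-rmod must be serial or parallel, got {rmod}")
    if level is not None and level not in LEVELS[device]:
        raise ValueError(f"-l {level} is not listed for -{device}")
    args = ["./hl_qual", f"-{device}", "-c", pci_bus_id, "-rmod", rmod, "-s"]
    if level is not None:
        args += ["-l", level]
    if seconds is not None:
        args += ["-t", str(seconds)]
    return args


print(" ".join(power_stress_args("gaudi", seconds=120, level="extreme")))
# ./hl_qual -gaudi -c all -rmod parallel -s -l extreme -t 120
```

The returned list can be passed to `subprocess.run` as-is. Omitting `seconds` or `level` leaves the corresponding switch out, so hl_qual applies its own defaults.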
EDP Test Plugin Design Consideration and Responsibilities¶
The EDP test verifies the functionality of the first-gen Gaudi power supply by generating a fast power usage transient from low power to high power and vice versa.
The power cycles repeat throughout the test’s execution time.
Figure 21 EDP Test Power Cycles¶
The test’s configurable parameters are the different power levels. The power cycle consists of high-power level usage and low-power level usage (idle state) as shown in Fig. 21. All configurable parameters are listed in EDP Stress Test Plugin Switches and Parameters.
Pass/fail Criteria¶
The EDP test must run until completion without overheating or power supply failures.
Gaudi2 EDP Stress Test Plugin Switches and Parameters¶
Note
You cannot configure the timing of power cycling for the Gaudi2 EDP test variant: by default, the high power period is set to 1 second and the idle period is set to 4 seconds. When using Gaudi2 EDP, the driver must be loaded by executing the following command:
sudo modprobe habanalabs timeout_locked=0
./hl_qual -gaudi2 -c <pci bus id> [-t <time in seconds>] -rmod <serial | parallel> [-dis_mon] [-mon_cfg <monitor INI path>]
-e [-sync] [-inc_power] [-enable_ports_check <all | int>] [-Tw <time in seconds>][-Ts <time in seconds>]
Switch | Description |
---|---|
-t | Power stress test duration in seconds. If this switch is omitted, the default value is 40 seconds. In extreme power mode, where clock throttling is expected, the test duration could exceed the time you have set. |
-e | EDP test selector. |
-sync | Enables device sync. When using this switch, the rising edge of the power cycle is synchronized between all devices. |
-inc_power | Enables NIC execution, which ensures high power usage. |
-enable_ports_check | When enabled, checks first-gen Gaudi and Gaudi2 ports. The test generates a report on the status of all tested ports, indicating whether they are UP or DOWN. The port check provides useful information only and will not result in a test failure if any ports are identified as being in a DOWN state. |
-Tw | Sets the duration, in seconds, of the high power time in the EDP power cycle. |
-Ts | Sets the duration, in seconds, of the idle time in the EDP power cycle. |
Note
If the ports check test detects any ports that are not in an UP state while using the -enable_ports_check switch, the test will be considered as failed.
./hl_qual -gaudi2 -c all -rmod parallel -t 40 -e -Tw 3 -Ts 1
The above command line executes the EDP test for 40 seconds and runs 10 power cycles. Each power cycle runs 3 seconds of high power usage and 1 second of low power usage.
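The cycle count in this example follows from dividing the test duration by the length of one cycle (high power time plus idle time). A minimal sketch of that arithmetic, assuming cycles run back to back and only complete cycles are counted; the function name is hypothetical:

```python
def edp_cycle_count(test_seconds: float, high_seconds: float, idle_seconds: float) -> int:
    """Number of complete high/idle power cycles that fit in the test duration."""
    cycle = high_seconds + idle_seconds
    if cycle <= 0:
        raise ValueError("the power cycle must be longer than zero seconds")
    return int(test_seconds // cycle)


# -t 40 -Tw 3 -Ts 1: each cycle lasts 3 + 1 = 4 seconds.
print(edp_cycle_count(40, 3, 1))  # 10
```

Under the same assumptions, a 1 second high power period with a 4 second idle period gives 5 second cycles, so a 40 second run would fit 8 of them.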
First-gen Gaudi EDP Stress Test Plugin Switches and Parameters¶
hl_qual -gaudi -c <pci bus id> [-t <time in seconds>] -rmod <serial | parallel> [-dis_mon] [-mon_cfg <monitor INI path>]
-e [-tw <time in seconds>] [-ts <time in seconds>] [-l <extreme | low>]
Switch | Description |
---|---|
-t | Power stress test duration in seconds. |
-e | EDP test selector. |
-l | Power level selector: extreme or low. If the value is not specified, the default value is High. |
-tw | Time duration of high power usage in the EDP test power cycle. |
-ts | Time duration of low power usage (idle mode) in the EDP test power cycle. |
Note
-ts and -tw should be multiples of 200 ms. hl_qual rounds them to the nearest multiple of 200 ms.
./hl_qual -gaudi -c all -rmod parallel -t 40 -e -l extreme -tw 2000 -ts 2000
The above command line executes the EDP test for 40 seconds and runs 10 power cycles. Each power cycle runs 2 seconds of high power usage and 2 seconds of low power usage.
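The 200 ms rounding and the resulting cycle count can be sketched as follows. This is an illustration rather than hl_qual's actual logic: the function names are hypothetical, and rounding exact ties upward is an assumption, since the tie-breaking rule is not stated here.

```python
STEP_MS = 200  # granularity stated in the note above


def round_to_step(ms: int, step: int = STEP_MS) -> int:
    """Round a non-negative duration in ms to the nearest multiple of `step`.

    Assumption: exact ties (for example 1300 ms) round up.
    """
    if ms < 0:
        raise ValueError("duration must not be negative")
    return ((ms + step // 2) // step) * step


def first_gen_edp_cycles(test_seconds: int, tw_ms: int, ts_ms: int) -> int:
    """Complete power cycles in the test, using the rounded -tw and -ts values."""
    cycle_ms = round_to_step(tw_ms) + round_to_step(ts_ms)
    if cycle_ms == 0:
        raise ValueError("rounded cycle length is zero")
    return (test_seconds * 1000) // cycle_ms


print(round_to_step(1250))                   # 1200
print(first_gen_edp_cycles(40, 2000, 2000))  # 10
```

Passing values that are already multiples of 200 ms, as in the example command, avoids any dependence on how rounding is performed.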