Power Stress and EDP Tests Plugins Design, Switches and Parameters

This section describes the plugin-specific switches. Common switches are not covered in detail, although they appear in the command examples for completeness. For the common plugin switches and parameters, refer to hl_qual Common Plugin Switches and Parameters.

Note

libpower_stress_plugin.so is a dynamically linked library that implements both the power stress and EDP test plugins.

First-gen Gaudi Power Stress Plugin Design Consideration and Responsibilities

The power stress plugin does the following:

  • Conducts a multi-level power stress test.

  • Conducts a multi-level power EDP test.

The power level for both the power stress and EDP tests is configurable via command-line settings and corresponds to one of the following levels:

  1. Extreme - measured power level: 345-355 [watt]

  2. Mid - measured power level: 250-260 [watt]

  3. Low - measured power level: 140 [watt]

Note

The above numbers were achieved on an HL-205 device running at a max frequency of 1.95 GHz.

Power Stress Test Plugin

When running a power stress test, the plugin puts the device under a constant, uniform power load. The test can run for many hours and exercises the following device functionality:

  • Thermal stress - Cooling system functionality, heat dissipation, and thermal protection mechanisms can be checked by running the power stress plugin at extreme load.

  • Power limiter and clock relaxation mechanisms - The power limiter is a mechanism that keeps power usage below 300 [watts]. When the power limit is reached, the device clocks are lowered. To test the power limiter mechanism, the plugin must run at the extreme power level.

  • Long work periods in typical power workloads (extreme, low).

When running this test, reaching a specific power load depends on external system conditions such as the number of devices used, ambient system temperature, cooling system design, and device placement within the rack.

Note

To ensure that maximum power is reached, run the test for longer than 30 seconds.

Pass/fail Criteria

The power stress tests must run until completion without overheating or power supply failures.

Gaudi2 Power Stress Plugin Design Consideration and Responsibilities

The power stress plugin performs a maximum power stress test on the device. A single power level is supported: MAX, 570-630 [watts].

Power Stress Test Plugin

When running a power stress test, the plugin puts the device under a constant, uniform power load. The test can run for many hours and exercises the following device functionality:

  • Thermal stress - Cooling system functionality, heat dissipation, and thermal protection mechanisms can be checked by running the power stress plugin at extreme load.

  • PID module - The module responsible for power management and for keeping power usage below the allowable MAX power of 550 [watts].

Pass/fail Criteria

  • The test must pass all test phases successfully (init, test run).

  • The test must not trigger the built-in temperature protection.

  • The MME engine calculation must be bit exact.
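
The bit-exact criterion is stricter than a tolerance-based comparison. The following Python sketch only illustrates that difference; it is not how hl_qual performs its internal check, and the helper name bit_exact and the sample values are hypothetical. It assumes IEEE 754 double-precision values, whereas the data types actually used by the MME test are not stated here.

```python
import struct


def bit_exact(a, b) -> bool:
    """Return True only if both float sequences have identical byte encodings.

    Hypothetical helper for illustration; values are encoded as
    little-endian IEEE 754 doubles before comparison.
    """
    if len(a) != len(b):
        return False
    return struct.pack(f"<{len(a)}d", *a) == struct.pack(f"<{len(b)}d", *b)


reference = [0.3, 1.5]
computed = [0.1 + 0.2, 1.5]  # first element differs from 0.3 in the last bit

print(bit_exact(reference, reference))            # True
print(bit_exact(computed, reference))             # False
print(abs(computed[0] - reference[0]) < 1e-9)     # True: passes a tolerance check
```

A result that passes a tolerance check can therefore still fail a bit-exact check, which is why this criterion can catch single-bit computation errors under thermal and power stress.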

Power Stress Test Plugin Switches and Parameters

The first-gen Gaudi and Gaudi2 test variants differ in their test capabilities, as shown in the command lines below:

hl_qual -gaudi -c <pci bus id> [-t <time in seconds>]  -rmod <serial | parallel>  [-dis_mon] [-mon_cfg <monitor INI path>]
         -s [-l <extreme | mid | low>] [-pl_cfg <plugin INI config path>]
  • -t - Power stress test duration in seconds. If this switch is omitted, the default value is 40 seconds.

  • -s - Power stress test selector.

  • -l <extreme | mid | low> - Power level selector:

    • extreme - 345-355[w] measured on HL-205 running at 1.95 GHz

    • mid - 250-260 [w] measured on HL-205 running at 1.95 GHz

    • low - 140 [w] measured on HL-205 running at 1.95 GHz

If the value is not specified, the default value is low.

  • -pl_cfg <plugin INI config path> - Path to an INI configuration file.

./hl_qual -gaudi -c all -rmod parallel -s -l extreme -t 120
hl_qual -gaudi2 -c <pci bus id> [-t <time in seconds>]  -rmod <serial | parallel>  [-dis_mon] [-mon_cfg <monitor INI path>]
         -s
  • -t - Power stress test duration in seconds. If this switch is omitted, the default value is 60 seconds.

  • -s - Power stress test selector.

./hl_qual -gaudi2 -c all -rmod parallel -s -t 120
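
When the power stress test is launched from an automation script, it can help to assemble the argument list programmatically so that a switch such as -l is never passed to a variant that does not document it. The following Python sketch is one way to do that. The helper build_power_stress_cmd and its parameter names are hypothetical; it emits only the switches listed in the synopses above and does not execute hl_qual.

```python
from typing import List, Optional

# Values taken from the synopses in this section.
SUPPORTED_DEVICES = ("gaudi", "gaudi2")
RUN_MODES = ("serial", "parallel")
GAUDI_LEVELS = ("extreme", "mid", "low")


def build_power_stress_cmd(device: str,
                           pci_bus_id: str = "all",
                           rmod: str = "parallel",
                           duration_s: Optional[int] = None,
                           level: Optional[str] = None) -> List[str]:
    """Assemble an hl_qual power stress argument list (hypothetical helper)."""
    if device not in SUPPORTED_DEVICES:
        raise ValueError(f"unsupported device: {device}")
    if rmod not in RUN_MODES:
        raise ValueError(f"unsupported run mode: {rmod}")

    cmd = ["./hl_qual", f"-{device}", "-c", pci_bus_id, "-rmod", rmod, "-s"]

    if level is not None:
        # Only the first-gen Gaudi synopsis lists -l; Gaudi2 has one MAX level.
        if device != "gaudi":
            raise ValueError("-l is documented only for first-gen Gaudi")
        if level not in GAUDI_LEVELS:
            raise ValueError(f"unsupported power level: {level}")
        cmd += ["-l", level]

    if duration_s is not None:
        # When -t is omitted, hl_qual applies its own default duration.
        cmd += ["-t", str(duration_s)]
    return cmd


print(" ".join(build_power_stress_cmd("gaudi", level="extreme", duration_s=120)))
print(" ".join(build_power_stress_cmd("gaudi2", duration_s=120)))
```

The returned list can be passed to subprocess.run as is. Optional switches such as -dis_mon, -mon_cfg, and -pl_cfg are left out of the sketch for brevity.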

EDP Test Plugin Design Consideration and Responsibilities

The EDP test verifies the functionality of the first-gen Gaudi power supply by generating a fast power usage transient from low power to high power and vice versa.

The power cycles repeat throughout the test’s execution time.

Figure 27 EDP Test Power Cycles

The test’s configurable parameters are the different power levels. The power cycle consists of high-power level usage and low-power level usage (idle state) as shown in Fig. 27. All configurable parameters are listed in EDP Stress Test Plugin Switches and Parameters.

Pass/fail Criteria

The EDP test must run until completion without overheating or power supply failures.

First-gen Gaudi EDP Stress Test Plugin Switches and Parameters

Note

The EDP test is applicable only for first-gen Gaudi based devices. Gaudi2 is not supported.

hl_qual -gaudi -c <pci bus id> [-t <time in seconds>]  -rmod <serial | parallel>  [-dis_mon] [-mon_cfg <monitor INI path>]
      -e  [-pl_cfg <plugin INI config path>]  [-tw <time in milliseconds>] [-ts <time in milliseconds>] [-l <extreme | low>]
  • -t - EDP test duration in seconds. If this switch is omitted, the default value is 40 seconds.

  • -e - EDP test selector.

  • -l <extreme | low> - Power level selector:

    • extreme - 345-355[w] measured on HL-205 running at 1.95 GHz

    • mid - 250-260 [w] measured on HL-205 running at 1.95 GHz

    • low - 140 [w] measured on HL-205 running at 1.95 GHz

    If the value is not specified, the default value is low.

  • -tw <time in milliseconds> - Time duration of high power usage in the EDP test power cycle. The default value when this switch is not specified is 1000 [ms].

  • -ts <time in milliseconds> - Time duration of low power usage (idle mode) in the EDP test power cycle. The default value when this switch is not used is 1000 [ms].

Note

The sum tw + ts (given in milliseconds) must be smaller than the test execution time set with -t (given in seconds).

  • -pl_cfg <plugin INI config path> - Path to an INI configuration file.

./hl_qual -gaudi -c all -rmod parallel -t 40 -e -l extreme -tw 2000 -ts 2000

The above command line executes the EDP test for 40 seconds. Each power cycle consists of 2 seconds (2000 ms) of high power usage followed by 2 seconds (2000 ms) of low power usage, so the test runs 40 / (2 + 2) = 10 power cycles.
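
The relationship between -t, -tw, and -ts can be checked before a run with a few lines of Python. This is a minimal sketch: the function name edp_cycle_count is hypothetical, and it assumes that only complete cycles are counted (floor division), since the handling of a trailing partial cycle is not described here. It also enforces the constraint that tw + ts must be smaller than the test execution time.

```python
def edp_cycle_count(t_s: int, tw_ms: int = 1000, ts_ms: int = 1000) -> int:
    """Number of complete high/low power cycles that fit in the test duration.

    t_s is the -t value in seconds; tw_ms and ts_ms are the -tw and -ts
    values in milliseconds. Defaults match the documented switch defaults.
    """
    if tw_ms <= 0 or ts_ms <= 0:
        raise ValueError("tw and ts must be positive")
    period_ms = tw_ms + ts_ms
    total_ms = t_s * 1000  # convert -t to milliseconds before comparing
    if period_ms >= total_ms:
        raise ValueError("tw + ts must be smaller than the test execution time")
    return total_ms // period_ms


print(edp_cycle_count(40, tw_ms=2000, ts_ms=2000))  # 10
print(edp_cycle_count(40))                          # 20
```

With the default values of -tw and -ts (1000 ms each), a 40-second run fits 20 complete cycles under the same assumption.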