5. Profiler User Guide

This document describes the SynapseAI Profiling Subsystem included in the SynapseAI® software release (Section 5.1), and the Profiling Configuration tools used to configure the Habana profiler (Section 5.2).

5.1. Synapse Profiling Subsystem

5.1.1. Overview

This section describes the SynapseAI Profiling Subsystem included in the SynapseAI® software release.

The Synapse Profiling Subsystem is designed to facilitate the instrumentation of Habana hardware and software systems. The subsystem generates diagnostic information of core utilization, enabling performance analysis and optimization.

The profiler functions in three stages: configuration, run-time data collection, and analysis.

This document provides a detailed description of the operation of each stage.

5.1.2. Configuration

No configuration is required when using the default profiling settings. To change the settings, use the profiling configuration tools included in the SynapseAI installation.

5.1.2.1. Pre-configured Instrumentation

The profiling configuration tools enable adjusting the software and hardware settings of the profiling subsystem, using either hl-prof-config (CLI based) or hl-shim-config (GUI based). The configurable settings include the session name, output directory, output formats and basic hardware settings of the instrumentation. After using the hl-prof-config or hl-shim-config tool, a new configuration file, shim_config.json, is generated.

shim_config.json is stored in a hidden folder called '.habana' located in your home directory. All subsequent profiling sessions will use these settings. The settings can be reset to default using the profiling configuration tool, or by deleting the shim_config.json file located in the '.habana' directory.
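
For example, to reset the settings to default from the command line, delete the configuration file:

rm ~/.habana/shim_config.json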

Note

The synprof_configuration.json file is planned to be deprecated in the future. shim_config.json is the new configuration file, generated by hl-prof-config and by the new configuration tool, hl-shim-config.

5.1.2.2. CLI/GUI Configuration Tools

To configure the profiler settings, run the applicable tool: hl-prof-config (CLI) or hl-shim-config (GUI).

5.1.2.2.1. CLI Configuration Tool - hl-prof-config <args>

hl-prof-config <args> is the command line interface (CLI) configuration tool. Command line parameters must be passed when calling the CLI in order to configure the Synapse Profiler operation:

hl-prof-config -h

The above displays the usage message in the terminal. To see architecture-specific
settings, add the desired architecture flag.

Usage: hl-prof-config [-v|--version] [-h|--help] [-a] [--alt]

      If no options are given, the configuration GUI tool will be opened.

General options:
-h [ --help ]                    Produce help message
-v [ --version ]                 Print version information
-s [ --session-name ] arg        Name for output file(s). Add '#' to
                                 concatenate a runtime generated timestamp of
                                 the format YYYYMMDD_hh-mm-ss
-o [ --output-dir ] arg          Directory to store the trace output file(s)
-r [ --per-recipe ] arg          Create sub-directory for each recipe, num
                                 invocations (given by -n) is per-recipe
-a [ --add-pci ] arg             Add device PCI-BUS id to the output file
                                 name
--simulationMode arg             Enable parsing in simulation mode (coral)
-e [ --edit-existing ] arg       Configuration file will be updated (default
                                 is overwrite)
--goya                           Target architecture is Habana Goya
--gaudi                          Target architecture is Habana Gaudi
-c [ --chip ] arg                Target architecture (goya/gaudi)

Output options:
--json arg                       Enable json format output per invocation
                                 (default off)
--json-compressed arg            Use compressed json (invoc/all - compressed
                                 file per invocation or 1 compressed file for
                                 all invocations)
--json-max-events arg            Max events per json file (-1 for infinite)
--json-overlap arg               Overlap events between consecutive json
                                 files (value in usecs)
--csv arg                        Output CSV (invoc/all - file per invocation
                                 or 1 file for all invocations)
--host arg                       Enable host API profiling (default on)
--recipe-dump-format arg         Dump recipe command buffers format (text,
                                 html, json)
--mme-advanced arg               Profile MME in advanced mode
--trace-analyzer arg             Add trace analyzer data to json
--trace-analyzer-csv arg         Create trace analyzer data in CSV
--data-flow arg                  Add data flow arrows indication
--memory-reuse arg               Add memory reuse arrows indication
--no-merge arg                   Do not merge device profiles with host
                                 profile in same json output (default merged)
--fuser arg                      Add fuser metadata to the json output
                                 (default off)

Instrumentation mode options:
-p [ --phase ] arg               Profile a specific API
                                 (enq/multi-enq/act/mem/device-acq)
-i [ --instrumentation ]         Enable user profiler activation via API
                                 (disables all phases)
-g [ --invocations-range ] arg   Range of enqueue invocations to profile.
                                 Example: 1-3,4,10-12 (default 1-2)
-n [ --number-invocations ] arg  Maximal number of profiler invocations
                                 (default 2). Deprecated, use
                                 invocations-range instead
-b [ --buffer-size ] arg         Size in MB of trace buffer to allocate
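
For example, to create a configuration with a timestamped session name and a custom output directory (the session name and directory shown are illustrative):

hl-prof-config --gaudi -s my_session_# -o /tmp/habana_traces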

To enable the profiler for hardware-specific events, run the following accordingly:

 hl-prof-config -h --gaudi

Device specific options for gaudi:

Device options:

NIC:
--nic arg              Enable/Disable All NIC units trace
--nic0_0 arg           Enable/Disable NIC0 0 trace
--nic0_1 arg           Enable/Disable NIC0 1 trace
--nic1_0 arg           Enable/Disable NIC1 0 trace
--nic1_1 arg           Enable/Disable NIC1 1 trace
--nic2_0 arg           Enable/Disable NIC2 0 trace
--nic2_1 arg           Enable/Disable NIC2 1 trace
--nic3_0 arg           Enable/Disable NIC3 0 trace
--nic3_1 arg           Enable/Disable NIC3 1 trace
--nic4_0 arg           Enable/Disable NIC4 0 trace
--nic4_1 arg           Enable/Disable NIC4 1 trace

DMA IF:
--dma_if arg           Enable/Disable All DMA IF units trace
--dma_if_w_s arg       Enable/Disable DMA IF W S trace
--dma_if_e_s arg       Enable/Disable DMA IF E S trace
--dma_if_w_n arg       Enable/Disable DMA IF W N trace
--dma_if_e_n arg       Enable/Disable DMA IF E N trace

DMA CH:
--dma_ch arg           Enable/Disable All DMA CH units trace
--dma_ch0 arg          Enable/Disable DMA CH0 trace
--dma_ch1 arg          Enable/Disable DMA CH1 trace
--dma_ch2 arg          Enable/Disable DMA CH2 trace
--dma_ch3 arg          Enable/Disable DMA CH3 trace
--dma_ch4 arg          Enable/Disable DMA CH4 trace
--dma_ch5 arg          Enable/Disable DMA CH5 trace
--dma_ch6 arg          Enable/Disable DMA CH6 trace
--dma_ch7 arg          Enable/Disable DMA CH7 trace

MME:
--mme arg              Enable/Disable All MME units trace
--mme0_acc arg         Enable/Disable MME0 ACC trace
--mme0_sbab arg        Enable/Disable MME0 SBAB trace
--mme0_ctrl arg        Enable/Disable MME0 CTRL trace
--mme1_acc arg         Enable/Disable MME1 ACC trace
--mme1_sbab arg        Enable/Disable MME1 SBAB trace
--mme1_ctrl arg        Enable/Disable MME1 CTRL trace
--mme2_acc arg         Enable/Disable MME2 ACC trace
--mme2_sbab arg        Enable/Disable MME2 SBAB trace
--mme2_ctrl arg        Enable/Disable MME2 CTRL trace
--mme3_acc arg         Enable/Disable MME3 ACC trace
--mme3_sbab arg        Enable/Disable MME3 SBAB trace
--mme3_ctrl arg        Enable/Disable MME3 CTRL trace

TPC:
--tpc arg              Enable/Disable All TPC units trace
--tpc0 arg             Enable/Disable TPC0 trace
--tpc1 arg             Enable/Disable TPC1 trace
--tpc2 arg             Enable/Disable TPC2 trace
--tpc3 arg             Enable/Disable TPC3 trace
--tpc4 arg             Enable/Disable TPC4 trace
--tpc5 arg             Enable/Disable TPC5 trace
--tpc6 arg             Enable/Disable TPC6 trace
--tpc7 arg             Enable/Disable TPC7 trace

GENERAL UNITS:
--general_units arg    Enable/Disable All GENERAL UNITS units trace
--cpu arg              Enable/Disable CPU trace
--mmu arg              Enable/Disable MMU trace
--pcie arg             Enable/Disable PCIE trace
--psoc arg             Enable/Disable PSOC trace
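
For example, to disable tracing of all NIC units (assuming the on/off argument convention used by the other examples in this guide):

hl-prof-config --gaudi --nic off
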
5.1.2.2.2. GUI Configuration Tool - hl-shim-config

hl-shim-config is a GUI tool for trace collection configuration and enabling new plugins. For additional details, refer to Profiler Configuration.

To enable the profiler, follow the steps below:

  1. Select the file - open the Configuration File or the Default Configuration File:

../_images/Profiler_Configuration_Tool_Main_Window.PNG

Figure 5.1 Main Window

  2. Check the plugins you wish to enable:

../_images/Profiler_Configuration_Tool_Load_Plugin.PNG

Figure 5.2 Load Plugin

  3. Configure Host API Output Format and Internal APIs to profile:

../_images/Profiler_Configuration_Tool_Checked_Plugins.PNG

Figure 5.3 Checked Plugins

5.1.3. Source Code Instrumentation

For application level activation of the profiler in user code, select the API controlled option in the main window. This can be set separately for device instrumentation and host instrumentation.

To enable device instrumentation, uncheck all the options under General Settings -> Phase Options. See Figure 5.4 below:

../_images/Profiler_Configuration_Tool_Select_Trace_Units_Window.PNG

Figure 5.4 Select Trace Units

This disables all automatic profiling of API calls. After saving this configuration, place calls to the synProfilerStart() and synProfilerStop() APIs at the relevant points in the code. Each time a pair of start and stop calls is encountered, the profiling subsystem generates a new trace buffer. The trace buffer may then be retrieved from memory using the synProfilerGetTrace() API, and remains available for retrieval until a new one is generated. In this context, the other phases and max invocations are irrelevant. All other settings apply to both manual and automatic profiling. Runtime usage is also the same for both, as detailed in the following section.

The trace buffer may be retrieved in memory as a C/C++ struct, and parsed in the application, or written to the file system in JSON format. If using the latter, the file will be saved according to the output settings. Further details are available in the SynapseAI API documentation.

Below is an example of using API controlled instrumentation:

// start profiling (enables device trace modules)
status = synProfilerStart(synTraceDevice, deviceId);

// do something here e.g., enqueue
// ...
// wait on some handle for completion

// stop profiling (disables the device trace modules)
status = synProfilerStop(synTraceDevice, deviceId);

// output to file system by passing buffer=nullptr & size=nullptr
status = synProfilerGetTrace(synTraceDevice, deviceId,
                             synTraceFormatTEF, nullptr, nullptr);

5.1.4. Run-time

5.1.4.1. Usage

Enabling the profiler can be done in two modes:

  1. Set the environment variable HABANA_PROFILE=1 with/without HABANA_SHIM_CONFIG:

export HABANA_PROFILE=1

Setting this environment variable allows the Synapse run-time library to enable the profiling library during initialization. The profiling library engages the hardware instrumentation and the application API software instrumentation, which enables API call profiling and hardware traces by default. It may use the configuration file from ~/.habana, or a configuration file specified using HABANA_SHIM_CONFIG.

  2. Set the environment variable HABANA_SHIM_CONFIG=<shim_config.json> only:

export HABANA_SHIM_CONFIG=<shim_config.json>

Setting this environment variable without HABANA_PROFILE=1 loads a given configuration file. It enables only the specified plugins in the configuration file.
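
For example, to run a workload with profiling enabled and a custom configuration file (the configuration path and script name are illustrative):

export HABANA_PROFILE=1
export HABANA_SHIM_CONFIG=$HOME/my_configs/shim_config.json
python my_model.py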

Note

For TensorFlow Keras usage, to ensure the profiler data is created correctly, make sure to call keras.backend.clear_session() at the end of the model code. Note that profiler post-processing requires some time at the end of the model execution.

An example of configuring the profiler to capture the 1st through 100th enqueues:

hl-prof-config --gaudi -e off -g 1-100

Parameters:

  • --gaudi - Target architecture is Habana® Gaudi®.

  • -e off - Indicates that the configuration file will be overwritten, so that the profiler configuration will only include what is configured by this command when run.

  • -g 1-100 - Profiles the 1st through 100th enqueues. Note that the larger the number of enqueues profiled, the longer the profiler post-processing will take. The profiling file will also take up more storage.

Note

At the end of the run, you might experience long wait times while the profiler post-processes the data. If the wait times are too long, try reducing the profiling span.

5.1.4.2. Effect on Performance

You can enable profiling for the device and/or host:

  • Host profiling has negligible impact on the overall application performance, and no impact on device performance.

  • Device profiling may add to run-time in the aspects detailed below.

5.1.4.3. Device Profiling Prolog and Epilog

The hardware trace components are almost completely non-intrusive. However, enabling, disabling and collecting the data adds some host CPU overhead to the overall run-time. This means that the overall time of the application can be expected to increase, although the performance of the device components will not be affected, or will be only slightly affected in certain scenarios.

5.1.4.4. DRAM Bandwidth

The profiling tool utilizes a small amount of DRAM bandwidth, which can slow down topologies that depend heavily on DRAM bandwidth. The worst case is a theoretical 12.5% slowdown, and in practice 0-5% was observed, depending on the workload.

5.1.4.5. Enqueue Pipelining

When using automatic instrumentation, the profiler is enabled and disabled for each enqueue (launch). In this case, each enqueue is executed in isolation. Therefore, certain parallelization which can be achieved by pipelining enqueues is disabled. Profiling multiple pipelined enqueues is possible using the manual instrumentation mode while surrounding the relevant user code with profiling start and stop API calls.

5.1.5. Analysis

5.1.5.1. Output Products

The default profiler output file is default_profiling.json. This is a parsed JSON file of device trace and host API function calls, which can be viewed in the HLTV viewer. The profiler’s output files which are not written by default are:

  • default_profiling_[<serial#>].json - Per-synLaunch(enqueue) parsed JSON file of device trace, for viewing in the HLTV viewer. Files per synLaunch(enqueue) are generated in case host profiling is disabled.

  • default_profiling_host.json - JSON representation of the API function calls for host application profiling, in case device profiling is disabled.

Key note: when enabling --json-compressed, it is required to define how many events are viewable in a zoomed section. This can be done by also adding --json-max-events in hl-prof-config or using the GUI tool (the recommended value is 500,000).
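
For example, a minimal compressed-output configuration using the recommended value:

hl-prof-config --gaudi --json-compressed all --json-max-events 500000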

Notes:

Full output file name format is: <sessionName>[_<timestamp>][_deviceId#][_<serial#>]

  • sessionName - Session name is default_profiling, unless otherwise configured.

  • _<timestamp> - Timestamp appears if the character ‘#’ is included in the session name. Timestamp format is YYYYMMDD_hh-mm-ss.

  • _deviceId# - The device ID is included if the device profiled is not identified as hl0, e.g. hl1, hl2, hl3 etc. In the case of device 0, the deviceId will not be included in the output filename.

  • _<serial#> - Serial number is the number of the invocation in the current session. By default, profiling is enabled for the first two Synapse synLaunch API calls committed by the application. Subsequent calls will not be traced.
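
For example, with the session name run#, the second profiled invocation on device hl0 would be written to a file such as run_20220101_10-30-00_2.json (the timestamp shown is illustrative).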

5.1.5.2. Viewing Instructions

To view the profiling graph:

  1. Open Google Chrome or other Chromium Web Browser.

  2. Type https://hltv.habana.ai in the address bar.

  3. Drag and drop or load the generated JSON file.

HLTV (Habana Labs Trace Viewer) is a web service based on the chrome://tracing mechanism but with specific functionality added for Habana Labs tracing. The trace data is rendered inside HLTV on the client side, so no data is uploaded to the web server. It is also possible to install it as a PWA (Progressive Web App) on the client by pressing the small installation icon in the browser’s address bar.

Using the default configuration, the profiling results are divided into three processes: DMA, MME and TPC. The DMA shows bus monitors, while the MME and TPC show contexts. Together, this data provides a view to the execution of a recipe on the hardware and enables the viewer to quickly ascertain the cause of bottlenecks, slow performance, etc.

The DMA contains results from six bus monitors: DDR0 read and write, DDR1 read and write, and SRAM read and write. Each of the six bus monitors tracks bandwidth (in percentage units), latency (in cycle units), and outstanding transactions (in number of transactions) counters. Each counter shows the minimum, average and maximum values for the monitored time window. By default, the window is set to 2000 cycles.

The MME and TPC show workloads based on timestamped hardware events demarcating the beginning and end of each context. Clicking on a context shows additional information regarding the context, including the user node name, the operation kernel name, and the data type. Figure 5.5 shows an example of a topology view in chrome://tracing. Figure 5.6 shows an example of the host API calls view in chrome://tracing.

../_images/fig3_HLTV_Full_Application_View.png

Figure 5.5 Full Application View with Multiple Profiled Iterations

../_images/Zoom_in_on_Device_Profiling_View.PNG

Figure 5.6 Zoom in on Device Profiling View

The graphical interface is powered by the Trace Event Profiling Tool Chromium Project and the Trace-Viewer frontend for Chrome.

The viewing features are clearly documented and accessible by clicking the question mark in the top right corner. See Figure 5.7 below:

../_images/Fig5_HLTV_Tracing_Help.png

Figure 5.7 Chrome Tracing Help Screen

One of the most useful viewing tools is the Timing Mode, enabled by pressing ‘4’. This mode allows selection by dragging the mouse from one point to another and then displays the exact time between the beginning and end of selection. See Timing Selection Example below:

../_images/fig6_HLTV_Timing_Selection.png

Figure 5.8 Timing Selection Example

5.1.5.3. Trace Analyzer

The trace analyzer is a built-in feature in HLTV that is meant to reduce the amount of time spent on analyzing large traces. In the bottom panel, a tab called “Trace Analyzer” contains aggregate data per operation including total duration, MME utilization and additional information. Double-clicking on a specific row switches to the “Analyzed Nodes” tab and filters it for the chosen operation.

../_images/fig7_Trace_Analyzer_tab.png

Figure 5.9 Trace Analyzer

The “Analyzed Nodes” tab contains additional information for each node in the executed graph. A filter option in the top left corner of the tab can filter rows by node name as well as by operation. It is also possible to sort the rows by clicking a specific column header, change the column order by drag and drop, and hide a column by dragging it to the “Hidden Columns” box.

The order of the columns and the sort selection are saved in cookies for your next HLTV session.

../_images/fig8_Analyzed_Nodes_tab.png

Figure 5.10 Analyzed Nodes

5.1.5.4. Multi-Card Profiling

The output file for each device ends with the device name, i.e. default_profiling_<device_name>.json. The device name can be hl0, hl1, and so on. The events shown in the viewer also contain the device name, and in the generated CSV a new column is added showing the device name.
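
For example, a run on two cards might produce default_profiling_hl0.json and default_profiling_hl1.json.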

5.1.5.4.1. Executing on Multiple Devices

Executing on multiple devices can be done in two modes:

  1. Multiple Processes:

Accesses every device from a different process. The data is collected separately for each process, and the collected profiling information is viewed separately for each device. Each file presents the device it was collected from. For example: TPC(hl0) - when hl0 is device 0.

  2. Single Process:

Accesses all devices from the same process. In this mode, all profiling data is viewed in a single hltv file. It is recommended to enable data compression in this mode.

5.1.6. Profiling Tips and Tricks

This section explains the most common options to create a configuration file in various modes for data collection purposes.

  • Configure data collection per enqueue:

hl-prof-config --gaudi --buffer-size 256 --trace-analyzer on  --trace-analyzer-csv on --phase enq -e off --json on --json-compressed all --json-max-events 400000

This command generates a configuration file for profiling in per-enqueue mode. It enables dumping several files: a JSON file per enqueue (each enqueue is numbered), and one large hltv file that contains all enqueues (not numbered). In this mode, trace is collected after every enqueue, and the enqueues are executed serially, each starting once the previous one ends. Since the trace buffer is limited in size, for initial trace collection it is recommended to use this mode to identify the enqueues of interest, rather than collecting specific enqueues in multi-enq mode (see the note below).

  • Configure data collection in a set of enqueues:

hl-prof-config --gaudi --buffer-size 256 --trace-analyzer on  --trace-analyzer-csv on --phase multi-enq --invocations-range 1-100 -e off --json-max-events 800000 --json-compressed all

This command generates a configuration file for profiling in multi-enqueue mode. It also enables dumping CSV and JSON files for the specified enqueues, in this case enqueues 1 to 100 (see the note below).

Note

The above two commands generate trace analyzer data to provide further understanding of each executed node.

  • Configure data collection per iteration (PyTorch, TensorFlow, and profiling from the Synapse API):

hl-prof-config --gaudi --host on --buffer-size 256 --trace-analyzer on  --trace-analyzer-csv on --csv all -i -e off

This command generates a configuration file for profiling trace data at the API level using:

  • synProfilerStart

  • synProfilerStop

  • synProfilerGetTrace - collect trace data and dump to a file:

    To dump a file, call synProfilerGetTrace(synTraceAll, 0x0, synTraceFormatTEF, nullptr, nullptr, nullptr).

Note

The above API calls collect the trace from the Python API.

To simplify the usage of the above API calls (PyTorch and TensorFlow), set the following environment variables:

  • HABANA_SYNAPSE_LOGGER=hw

  • TF_HW_PROFILE_RANGE=<first_iteration>:<last_iteration> e.g. TF_HW_PROFILE_RANGE=5:9

  • HABANA_PROFILE=1

  • TF_HOOK_MODE=all

Note

The above environment variables collect the trace per iteration in a Python environment.
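
For example, a typical sequence for collecting per-iteration traces (the script name is illustrative):

export HABANA_SYNAPSE_LOGGER=hw
export TF_HW_PROFILE_RANGE=5:9
export HABANA_PROFILE=1
export TF_HOOK_MODE=all
python train.py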

  • To enable output compression, use the --json-compressed flag:

hl-prof-config --gaudi --host on --buffer-size 256 --trace-analyzer on  --trace-analyzer-csv on -i -e off --json-compressed all --json-max-events 800000

This command enables compressing the output (<default_profiling.hltv>). The HLTV file format can be opened in the hltv viewer. Use the above configuration when the generated JSON file is larger than 300MB. The compressed format shows a certain level of detail; when zooming in, more detailed information is displayed. The --json-max-events flag defines how many events are viewed at the detailed level when zoomed in. In case the generated hltv file cannot be opened, reduce the number specified by --json-max-events.

5.2. Profiler Configuration

This section describes the following tools as methods to configure the Habana profiler: hl-shim-config (GUI) and shim_ctl (CLI).

Note

shim_ctl does not add any functionality on top of hl-shim-config, and is recommended for use only in cases where a CLI is required, such as scripting.

5.2.1. Terms

  • Plugin: a library implementing a particular task during execution of an API call.

  • Field: a configurable parameter in the plugin configuration. Its default value is stored in the plugin’s library.

  • Group: a collection of fields, and/or other group(s) in the plugin configuration.

  • Scheme: each plugin has its own scheme, a JSON structured data representation of the hierarchy of groups and fields of that plugin’s configuration. The JSON object can be illustrated as a tree: the leaves are the fields, and the non-leaf nodes are groups. Each node has an ID. For each field, the scheme contains its properties: which GUI widget to use (checkbox, drop-down menu, text field, etc.) and its default value.

  • Profiler Configuration File: a JSON file containing an array of plugins. Per plugin, the following properties should exist:

    • name - The name of the plugin as it will be displayed.

    • lib - Which library is used for this plugin.

    • enable - true/false; controls whether the plugin is enabled or not (optional, default: false).

    • values - A JSON object containing the differences from the default configuration. Any configuration value which does not exist in the configuration file will take the default value written in the plugin’s scheme.

Note

Although entries in values should differ from their default values, an entry equal to its default will not be treated as a malformed configuration file.
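
For illustration only, a minimal configuration file following this structure might look as follows (the plugin name, library and values entries are hypothetical):

[
  {
    "name": "Host Profiler",
    "lib": "libhost_profiler.so",
    "enable": true,
    "values": {
      "output_dir": "/tmp/habana_traces"
    }
  }
]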

  • Default Configuration File: ~/.habana/shim_config.json. When a Synapse program starts, this file (by default) is parsed and its content determines which plugins are enabled, and with what configuration. In order to make a Synapse application use a different configuration file, set the environment variable HABANA_SHIM_CONFIG=<path-to-config file> before running the app.

5.2.2. hl-shim-config

hl-shim-config is a graphical application that allows viewing, editing, and saving profiler configuration files in a visual way.

5.2.2.1. Prerequisites

shim_ctl is necessary on the system because hl-shim-config uses it in the background. It is assumed that shim_ctl exists in PATH; if it does not, hl-shim-config can be launched with the path provided inline: SHIM_CONFIG_PATH=<path to shim_config> hl-shim-config.

5.2.2.2. Usage

To use hl-shim-config application, see the figure and options below:

../_images/welcome.png

Figure 5.11 Welcome Screen

  1. File Menu

  2. Help menu

  3. Plugins Selection Screen

  4. Plugins Configuration Screen

  5. General Settings Screen

  6. Reset Configuration Button

5.2.2.2.1. File Menu

The file menu allows loading a configuration file, and saving the current configuration to a file. Under the file menu, the following options are available:

  • Open Configuration File: Browse dialog will open to choose an existing configuration file and open it.

  • Open Default Configuration File: The default configuration file of the profiler is located at ~/.habana/shim_config.json. Clicking this option will open it without showing a browse dialog.

  • Save: Save the current configuration to the loaded file.

  • Save As: Browse dialog will be opened, allowing you to choose which file should store the current configuration.

  • Save to Default Configuration File: Save current configuration to ~/.habana/shim_config.json without showing a browse dialog.

Note

Saving options and other screens become available only after opening a file.

5.2.2.2.2. Help Menu

To navigate back to the opening screen, click the “Welcome” button.

../_images/welcome_screen2.png

Figure 5.12 Welcome Screen

5.2.2.2.3. Plugins Selection Screen

This section describes the available plugins. Upon opening a configuration file, the ‘Select Plugins’ screen is shown, and the user can select the specific plugins to enable, and set their execution order (see Figure 5.13).

../_images/select_plugins.png

Figure 5.13 Plugins Selection Screen

  1. Loaded file indicator.

  2. Available Plugins - Displays a list of available plugins in the system. The checkbox determines whether a specific plugin is enabled or not. Clicking the ‘spanner’ icon will navigate you to the plugin’s configuration view. Clicking on a plugin name will show you information in the ‘Plugin Information’ section (see below).

  3. Plugin Information - Displays additional details about the selected plugin.

  4. Restore Defaults - Clicking this button will set all the plugin’s parameters back to their default values.

  5. Plugins Sequence - View and adjust the order of the plugins’ execution in the profiler. You can move a plugin by dragging it or by selecting it and using the arrows (to select multiple plugins press the ‘ctrl’ key).

Note

If the Host Profiler plugin is enabled, it is recommended to position it last.

5.2.2.2.4. Plugins Configuration Screen

This section explains the plugin parameters. Upon selecting plugins, the user can adjust the selected plugin parameters (see Figure 5.14).

../_images/plugins_config.png

Figure 5.14 Plugins Configuration Screen

  1. Search - The search is case-insensitive, and is done on:

    • Group names

    • Field names

    • Drop-down menu options

  2. Plugin tabs - Each plugin will appear on a separate tab. Click on the ‘Plugin tab’ to view its configuration in #3 and #4.

  3. Tree view of the plugin configuration - Click on a group to expand it, and view its content in #4.

  4. View group and its parameters and subgroups - This view allows you to modify parameters.

  5. Show Advanced parameters switch - If some groups/fields were marked in the scheme as ‘advanced’, they will be hidden by default. Turn on this switch if you would like to unhide them.

  6. Save Button - Click to save the current configuration.

  7. Undo and Redo Buttons - Click to undo or redo changes in the configuration.

  8. Multiple Selection - To select multiple profile units or engines, hold down the Control (Ctrl) key while clicking on more than one profile unit. Alternatively, the user can select a range of profile units by holding the Shift key. The displayed fields are the common denominator between the selected profile units. Upon multiple selection, the user can change a field value across multiple profile units. In addition, the user can select all events by clicking on ‘Select All’ checkbox, which will select all the events, even the ones that are not displayed. On the other hand, the user can click on ‘Select All Displayed’, which will select only the fields being displayed. The multiple selection features an indeterminate state as well for values that are not the same across the selected profile units.

Keyboard Shortcuts

To facilitate easier use of hl-shim-config, a few key combinations were added enabling the execution of well-known commands as described below:

  • Save File: Ctrl + S

  • Save as File: Ctrl + Shift + S

  • Undo: Ctrl + Z

  • Redo: Ctrl + Shift + Z

  • Navigate to search box: Ctrl + F

5.2.2.2.5. General Settings Screen

This section contains global profiler parameters, such as output directory and profiling session name:

../_images/general_settings.png

Figure 5.15 General Settings Screen

5.2.2.2.6. Reset Configuration Button

Clicking this button will restore all values to the default configuration:

../_images/reset_settings.png

Figure 5.16 Reset Settings Screen

5.2.3. shim_ctl

shim_ctl is a command-line tool used to configure and retrieve the status of Habana profiler and its different plugins.

5.2.3.1. Features

  • Get a list of available plugins. Per plugin, you can choose to retrieve the following:

    • Plugin’s basic information.

    • Plugin’s scheme.

    • Plugin’s values overriding the default values.

    • Get a merge of the above (when taking the plugin’s scheme and applying the set values on top of it).

  • Get a list of running processes using profiler.

  • Edit a configuration file (a specific file, or the default file).

  • Enable/disable the list of plugins.

  • Set any field that appears in the scheme.

  • Reset to default.

5.2.3.2. Command Line Options