Output Products

The default profiler output file is default_profiling_<pid#>.hltv. This is a parsed file of device trace and host API function calls, which can be viewed in the HLTV viewer.

  • Host profiler output name format: <sessionName>[_<timestamp>]_<pid>.<ext>

    • <sessionName> - Session name is default_profiling, unless otherwise configured.

    • <timestamp> - Timestamp appears if the character ‘#’ is included in the session name. Timestamp format is YYYYMMDD_hh-mm-ss.

    • <pid> - Represents the process ID. It always appears for .hltv files; for .json and .csv files it must be enabled in the profiler’s configuration file.

    • <ext> - Depends on the output format set in the profiler’s configuration (json, hltv, csv).
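As a rough illustration of the naming scheme above, the following Python sketch splits a host profiler file name into its fields. The function and class names are hypothetical and not part of the profiler; the regular expression is an assumption derived only from the format and timestamp layout described above.

```python
import re
from typing import NamedTuple, Optional


class HostProfileName(NamedTuple):
    session: str
    timestamp: Optional[str]  # YYYYMMDD_hh-mm-ss, if present
    pid: Optional[int]        # may be absent for .json/.csv
    ext: str


# <sessionName>[_<timestamp>]_<pid>.<ext>, with the pid treated as optional
# because it can be disabled for .json/.csv output.
_HOST_NAME = re.compile(
    r"^(?P<session>.+?)"
    r"(?:_(?P<timestamp>\d{8}_\d{2}-\d{2}-\d{2}))?"
    r"(?:_(?P<pid>\d+))?"
    r"\.(?P<ext>json|hltv|csv)$"
)


def parse_host_profile_name(filename: str) -> Optional[HostProfileName]:
    """Return the parsed fields, or None if the name does not fit the format."""
    match = _HOST_NAME.match(filename)
    if match is None:
        return None
    pid = match.group("pid")
    return HostProfileName(
        session=match.group("session"),
        timestamp=match.group("timestamp"),
        pid=int(pid) if pid is not None else None,
        ext=match.group("ext"),
    )


print(parse_host_profile_name("default_profiling_12345.hltv"))
```

Note that a session name that itself ends in "_<digits>" cannot be distinguished from a trailing process ID by the file name alone, so this sketch treats such a suffix as the pid.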


The host profiler output includes HW Trace data because “Merge into host profiler” is enabled by default in the HW Trace plugin’s configuration.

  • HW Trace output name format: <sessionName>_<deviceName>_<funcInvokeNum>_<pci>.<ext>

    • <sessionName> - Session name is default_profiling, unless otherwise configured.

    • <ext> - Depends on the output format set in the profiler’s configuration (json, hltv, csv).

    • <deviceName> - Represents the device name, e.g., hl0, hl1, hl2.

    • <funcInvokeNum> - Depends on the configured profiling phase:

      • Multi enqueue profiling - Refers to the range of enqueues that is profiled, as set in the profiler’s configuration file, e.g., “default_profiling_hl0_15-19.hltv”.

      • Single enqueue profiling - Refers to the number of the enqueue that is profiled, e.g., “default_profiling_hl0_7.hltv” (profiling only the 7th enqueue).

      • Mem profiling - Refers to the number of the memcpy that is profiled, e.g., “default_profiling_hl0_7.hltv” (profiling only the 7th memcpy).

      • All device acquisition profiling - This field will not appear.

    • <pci> - Represents the PCI bus ID. To see the PCI bus ID, enable “Add pci bus ID to name” in the profiler’s configuration file (disabled by default), using hl-prof-config --add-pci true.
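To make the optional fields concrete, here is a minimal Python sketch that assembles a HW Trace file name from its parts. The helper name hw_trace_name is hypothetical. How the profiler renders the PCI bus ID inside the file name is not stated here, so the pci argument is simply appended verbatim as an assumption.

```python
from typing import Optional, Tuple, Union

# An int for single enqueue/memcpy profiling, a (first, last) tuple for
# multi enqueue profiling, or None for all-device-acquisition profiling.
Invoke = Union[int, Tuple[int, int], None]


def hw_trace_name(session: str, device: str, invoke: Invoke = None,
                  pci: Optional[str] = None, ext: str = "hltv") -> str:
    """Build <sessionName>_<deviceName>[_<funcInvokeNum>][_<pci>].<ext>."""
    parts = [session, device]
    if isinstance(invoke, tuple):
        parts.append(f"{invoke[0]}-{invoke[1]}")  # profiled range, e.g. 15-19
    elif invoke is not None:
        parts.append(str(invoke))                 # single profiled invocation
    if pci is not None:
        parts.append(pci)                         # assumed to be used as-is
    return "_".join(parts) + "." + ext


print(hw_trace_name("default_profiling", "hl0", (15, 19)))
print(hw_trace_name("default_profiling", "hl0", 7))
print(hw_trace_name("default_profiling", "hl0"))
```

The first two calls reproduce the example names given above for multi enqueue and single enqueue profiling; the third shows the field being omitted.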

Viewing Instructions

To view the profiling graph:

  1. Open Google Chrome or another Chromium-based web browser.

  2. Type in the address bar.

  3. Drag and drop or load the generated JSON or HLTV trace file.

After your trace file is loaded, the trace data is rendered inside HLTV on the client side, so no data is uploaded to the web server. It is also possible to install HLTV as a PWA (Progressive Web App) on the client by pressing the small installation icon in the browser’s address bar.

Using the default configuration, the profiling results are divided into three processes: DMA, MME and TPC. The DMA shows bus monitors, while the MME and TPC show contexts. Together, this data provides a view of the execution of a recipe on the hardware and enables the viewer to quickly identify the cause of bottlenecks or slow performance.

The DMA contains results from six bus monitors: DDR0 read and write, DDR1 read and write, and SRAM read and write. Each of the six bus monitors tracks three counters: bandwidth (in percentage units), latency (in cycle units), and outstanding transactions (in number of transactions). Each counter shows the minimum, average and maximum values for the monitored time window. By default, the window is set to 2000 cycles.
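The idea of reporting a minimum, average and maximum per monitored window can be sketched in Python as follows. This is only an illustration of windowed aggregation: the (cycle, value) sample layout and the function name are assumptions, and the way the hardware actually samples its counters is not described here.

```python
from typing import Dict, Iterable, List, Tuple


def window_stats(samples: Iterable[Tuple[int, float]],
                 window: int = 2000) -> Dict[int, Tuple[float, float, float]]:
    """Group (cycle, value) samples into fixed windows of `window` cycles
    and return {window_start_cycle: (minimum, average, maximum)}."""
    buckets: Dict[int, List[float]] = {}
    for cycle, value in samples:
        buckets.setdefault(cycle // window, []).append(value)
    return {
        index * window: (min(values), sum(values) / len(values), max(values))
        for index, values in sorted(buckets.items())
    }


# Hypothetical latency samples, in cycles, taken at the given cycle counts.
latency = [(0, 10), (1000, 30), (2500, 50)]
print(window_stats(latency))
```

With the default window of 2000 cycles, the first two samples fall into the window starting at cycle 0 and the third into the window starting at cycle 2000.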

The MME and TPC show workloads based on timestamped hardware events demarcating the beginning and end of each context. Clicking on a context shows additional information regarding the context, including the user node name, the operation kernel name, and the data type. Fig. 18 shows an example of a topology view in HLTV. Fig. 19 shows an example of the host API calls view in HLTV.


Figure 18 Full Application View with Multiple Profiled Iterations


Figure 19 Zoom in on Device Profiling View

The graphical interface is powered by the Trace Event Profiling Tool Chromium Project and the Trace-Viewer frontend for Chrome.

The viewing features are documented in a help screen, which is accessible by clicking the question mark in the top right corner. See Fig. 20 below:


Figure 20 Chrome Tracing Help Screen

One of the most useful viewing tools is the Timing Mode, enabled by pressing ‘4’. This mode allows selection by dragging the mouse from one point to another and then displays the exact time between the beginning and end of selection. See Timing Selection Example below:


Figure 21 Timing Selection Example

Events Color Scheme

By default, each event is colored according to its name. It is possible to change this so that events are colored by their streamHandle or recipeHandle instead. See Change Color Scheme Example below:


Figure 22 Change Color Scheme Example

Host-device Relations (Gaudi2 only)

It is possible to see the relations between host API calls and the executions on the device. An arrow is displayed connecting the host execution of the enqueue/synLaunch API to the first related event on the device. This feature is enabled by default. If the feature has been disabled, follow the steps below to re-enable it:

  1. In hl-prof-config GUI, use the search bar to locate the “Launches relations” field, enable it, and save.

  2. In HLTV, click on ‘Filter’ and enable “Flow events”.

See HLTV Host-Device Relations Example below:


Figure 23 HLTV Host-Device Relations Example

Trace Analyzer

The trace analyzer is a built-in feature in HLTV that is meant to reduce the amount of time spent on analyzing large traces. In the bottom panel, a tab called “Trace Analyzer” contains aggregate data per operation including total duration, MME utilization and additional information. Double-clicking on a specific row switches to the “Analyzed Nodes” tab and filters it for the chosen operation.
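For readers who want a similar per-operation summary outside the viewer, the following Python sketch totals event durations by name. It assumes the exported .json file follows the Chrome Trace Event Format, with complete events ("ph": "X") carrying a "dur" field; whether the profiler's JSON output uses exactly this layout is an assumption, and the function name is hypothetical. It does not reproduce the Trace Analyzer's own columns, such as MME utilization.

```python
from collections import defaultdict
from typing import Any, Dict, List, Union

Trace = Union[Dict[str, Any], List[Dict[str, Any]]]


def total_duration_by_name(trace: Trace) -> Dict[str, float]:
    """Sum the 'dur' of complete ('X') events per event name,
    returned in descending order of total duration."""
    # The Trace Event Format allows either a bare list of events or an
    # object holding the list under the "traceEvents" key.
    events = trace.get("traceEvents", []) if isinstance(trace, dict) else trace
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        if event.get("ph") == "X" and "dur" in event:
            totals[event.get("name", "")] += event["dur"]
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Small hand-written trace; real input would come from json.load(file).
example = {"traceEvents": [
    {"name": "a", "ph": "X", "ts": 0, "dur": 5},
    {"name": "b", "ph": "X", "ts": 5, "dur": 10},
    {"name": "a", "ph": "X", "ts": 20, "dur": 2},
    {"name": "meta", "ph": "M"},
]}
print(total_duration_by_name(example))
```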


Figure 24 Trace Analyzer

The “Analyzed Nodes” tab contains additional information for each node in the executed graph. A filter option in the top left corner of the tab can filter rows by node name as well as by operation. It is also possible to sort the rows by clicking a column header, change the column order by dragging and dropping, and hide a column by dragging it to the “Hidden Columns” box.

The order of the columns and the sort selection are saved in cookies for your next HLTV session.


Figure 25 Analyzed Nodes

Multi-Card Profiling

The output file name for each device ends with the device name, i.e., default_profiling_<device_name>.json. The device name can be hl0, hl1, and so on. The events shown in the viewer also contain the device name, and the generated CSV includes an additional column showing the device name.
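When a run produces one file per device, the files can be grouped by the device name embedded in each file name. The Python sketch below is illustrative only: the function name is hypothetical, and the pattern assumes device names of the form hl<number> preceded by an underscore, as in the examples above.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches "_hl<number>" followed by another field or by the extension.
_DEVICE = re.compile(r"_(hl\d+)(?=[_.])")


def group_by_device(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each device name (hl0, hl1, ...) to the files that mention it.
    Files without a device name in their name are skipped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        match = _DEVICE.search(name)
        if match is not None:
            groups[match.group(1)].append(name)
    return dict(groups)


files = [
    "default_profiling_hl0.json",
    "default_profiling_hl1.json",
    "default_profiling_hl0_7.hltv",
    "default_profiling_12345.hltv",  # host profiler file, no device name
]
print(group_by_device(files))
```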

Executing on Multiple Devices

Executing on multiple devices can be done in two modes:

  • Multiple Processes: Accesses every device from a different process. The data is collected separately for each process, and the collected profiling information is viewed separately for each device. Each file presents the device it was collected from. For example: TPC(hl0) - when hl0 is device 0.

  • Single Process: Accesses all devices from the same process. In this mode, all profiling data is viewed in a single .hltv file. It is recommended to enable data compression in this mode.