Analysis

Output Products

The default profiler output file is default_profiling.json, a parsed JSON file of the device trace and host API function calls that can be viewed in the HLTV viewer. The profiler’s output files that are not written by default are:

  • default_profiling_[<serial#>].json - A per-synLaunch (enqueue) parsed JSON file of the device trace, for viewing in the HLTV viewer. One file per synLaunch (enqueue) is generated when host profiling is disabled.

  • default_profiling_host.json - A JSON representation of the API function calls for host application profiling, generated when device profiling is disabled.

Notes:

The full output file name format is: <sessionName>[_<timestamp>][_deviceId#][_<serial#>]

  • sessionName - The session name is default_profiling, unless configured otherwise.

  • _<timestamp> - A timestamp appears if the character ‘#’ is included in the session name. The timestamp format is YYYYMMDD_hh-mm-ss.

  • _deviceId# - The device ID is included if the profiled device is not identified as hl0, e.g. hl1, hl2, hl3, etc. For device 0, the device ID is omitted from the output file name.

  • _<serial#> - The serial number is the index of the invocation in the current session. By default, profiling is enabled for the first two Synapse synLaunch API calls committed by the application; subsequent calls are not traced.
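As an illustration, the naming scheme above can be decoded with a small script. This is a hypothetical helper, not part of the profiler; the regular expression simply encodes the documented <sessionName>[_<timestamp>][_deviceId#][_<serial#>] pattern:

```python
import re

# Hypothetical helper (not part of the profiler) that decodes the documented
# output name pattern: <sessionName>[_<timestamp>][_deviceId#][_<serial#>].json
FILENAME_RE = re.compile(
    r"^(?P<session>.+?)"                              # session name (default_profiling)
    r"(?:_(?P<timestamp>\d{8}_\d{2}-\d{2}-\d{2}))?"   # optional YYYYMMDD_hh-mm-ss
    r"(?:_(?P<device>hl\d+))?"                        # optional device ID (hl1, hl2, ...)
    r"(?:_(?P<serial>\d+))?"                          # optional invocation serial number
    r"\.json$"
)

def parse_output_name(filename):
    """Split a profiler output file name into its documented components."""
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized profiler output name: {filename}")
    return m.groupdict()
```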

Viewing Instructions

To view the profiling graph:

  1. Open Google Chrome or another Chromium-based web browser.

  2. Type https://hltv.habana.ai in the address bar.

  3. Drag and drop or load the generated JSON file.

HLTV (Habana Labs Trace Viewer) is a web service based on the chrome://tracing mechanism, with functionality added specifically for Habana Labs tracing. The trace data is rendered inside HLTV on the client side, so no data is uploaded to the web server. HLTV can also be installed as a PWA (Progressive Web App) on the client by clicking the small installation icon in the browser’s address bar.

Using the default configuration, the profiling results are divided into three processes: DMA, MME and TPC. The DMA process shows bus monitors, while the MME and TPC processes show contexts. Together, this data provides a view of how a recipe executes on the hardware and enables the viewer to quickly identify the cause of bottlenecks, slow performance, etc.

The DMA process contains results from six bus monitors: DDR0 read and write, DDR1 read and write, and SRAM read and write. Each of the six bus monitors tracks bandwidth (in percentage units), latency (in cycle units) and outstanding transactions (in number of transactions) counters. Each counter shows the minimum, average and maximum values for the monitored time window; by default, the window is set to 2000 cycles.
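As a rough sketch of the counter semantics (not the profiler’s implementation), aggregating a stream of samples into per-window minimum/average/maximum values might look like this:

```python
# Illustrative sketch only: reduce a stream of (cycle, value) bus-monitor
# samples to min/avg/max per fixed cycle window, mirroring how each counter
# is reported per monitored time window (2000 cycles by default).
def window_stats(samples, window=2000):
    """Return {window_start_cycle: (min, avg, max)} for the given samples."""
    buckets = {}
    for cycle, value in samples:
        buckets.setdefault((cycle // window) * window, []).append(value)
    return {
        start: (min(vals), sum(vals) / len(vals), max(vals))
        for start, vals in sorted(buckets.items())
    }
```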

The MME and TPC processes show workloads based on timestamped hardware events that demarcate the beginning and end of each context. Clicking a context shows additional information about it, including the user node name, the operation kernel name and the data type. Fig. 18 shows an example of a topology view in chrome://tracing. Fig. 19 shows an example of the host API calls view in chrome://tracing.
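Since the parsed output loads in chrome://tracing and HLTV, it presumably follows the Chrome Trace Event JSON format. Under that assumption, the context events and their attached metadata can also be inspected offline; note that the field names inside args (node name, kernel, data type) vary by release and are not guaranteed:

```python
import json

# Hedged sketch: assumes the parsed output follows the Chrome Trace Event JSON
# format (a "traceEvents" list whose entries carry "ph", "name", "ts", "dur"
# and "args"). "X" is the phase code for complete events (begin + duration).
def list_contexts(path):
    """Return (name, duration) pairs for complete ('X') events in a trace file."""
    with open(path) as f:
        trace = json.load(f)
    # The top level may be either an object with "traceEvents" or a bare list.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    return [(ev.get("name"), ev.get("dur")) for ev in events if ev.get("ph") == "X"]
```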

../../_images/fig3_HLTV_Full_Application_View.png

Figure 18 Full Application View with Multiple Profiled Iterations

../../_images/Zoom_in_on_Device_Profiling_View.png

Figure 19 Zoom in on Device Profiling View

The graphical interface is powered by the Chromium Project’s Trace Event Profiling Tool and the Trace-Viewer frontend for Chrome.

The viewing features are clearly documented and accessible by clicking the question mark in the top right corner. See Fig. 20 below:

../../_images/Fig5_HLTV_Tracing_Help.png

Figure 20 Chrome Tracing Help Screen

One of the most useful viewing tools is Timing Mode, enabled by pressing ‘4’. This mode allows selection by dragging the mouse from one point to another and then displays the exact time between the beginning and end of the selection. See the Timing Selection Example below:

../../_images/fig6_HLTV_Timing_Selection.png

Figure 21 Timing Selection Example

Trace Analyzer

The trace analyzer is a built-in HLTV feature designed to reduce the time spent analyzing large traces. In the bottom panel, a tab called “Trace Analyzer” contains aggregate data per operation, including total duration, MME utilization and additional information. Double-clicking a specific row switches to the “Analyzed Nodes” tab and filters it to the chosen operation.
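A rough offline analogue of the Trace Analyzer’s per-operation totals can be sketched under the same Chrome Trace Event format assumption; HLTV computes its actual grouping and utilization metrics internally, so this only approximates the total-duration column:

```python
import json
from collections import defaultdict

# Hedged sketch: sum durations of complete ('X') events per event name,
# assuming the Chrome Trace Event JSON format. HLTV's real Trace Analyzer
# also derives MME utilization and other columns not reproduced here.
def total_duration_per_op(path):
    """Return {event name: total duration} for complete events in a trace file."""
    with open(path) as f:
        trace = json.load(f)
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev["name"]] += ev["dur"]
    return dict(totals)
```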

../../_images/fig7_Trace_Analyzer_tab.png

Figure 22 Trace Analyzer

The “Analyzed Nodes” tab contains additional information for each node in the executed graph. A filter option in the top left corner of the tab can filter rows by node name as well as by operation. It is also possible to sort the rows by clicking a specific column header, change the column order by drag and drop, and hide a column by dragging it to the “Hidden Columns” box.

The order of the columns and the sort selection are saved in cookies for your next HLTV session.

../../_images/fig8_Analyzed_Nodes_tab.png

Figure 23 Analyzed Nodes

Multi-Card Profiling

The output file name for each device ends with the device name, i.e. default_profiling_<device_name>.json, where the device name can be hl0, hl1, etc. The events shown in the viewer also contain the device name, and the generated CSV includes an additional column showing the device name.
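A hypothetical helper (not part of the tool) that groups the per-device output files by device name, based on the naming scheme above:

```python
import re

# Hypothetical helper: map device names (hl0, hl1, ...) to their per-device
# output files, following the documented default_profiling_<device_name>.json
# naming. Filenames that do not match the pattern are ignored.
def group_by_device(filenames, session="default_profiling"):
    """Return {device name: filename} for matching per-device output files."""
    pattern = re.compile(rf"^{re.escape(session)}_(hl\d+)\.json$")
    return {m.group(1): name for name in filenames
            if (m := pattern.match(name)) is not None}
```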

Executing on Multiple Devices

Executing on multiple devices can be done in two modes:

  • Multiple Processes: Each device is accessed from a separate process. The data is collected separately for each process, and the collected profiling information is viewed separately for each device. Each file indicates the device it was collected from, for example TPC(hl0) when hl0 is device 0.

  • Single Process: All devices are accessed from the same process. In this mode, all profiling data is viewed in a single hltv file. Enabling data compression is recommended in this mode.