3. TPC Tools Debugger

3.1. Introduction

This document describes how to install and use the habanalabs-tpcdebug Visual Studio Code extension, which provides a front-end for debugging TPC kernels running in simulation.

The TPC Tools package provides all the components required for developing TPC kernels, including a compiler, a TPC simulation library and a TPC test core library. The TPC test core library enables writing a TPC test program which loads and invokes a TPC kernel in simulation. The TPC simulation library includes a debugger back-end which communicates with the Visual Studio Code TPC extension and provides a step-level debugging interface at both the disassembly and the TPC-C source level.

This document does not describe how to write a TPC test program. It describes how to use the debugger for debugging TPC kernels running in an existing TPC test program.

The following figure shows the components of a TPC test program and their connection to a Visual Studio Code debug session. Visual Studio Code communicates with the TPC simulator’s debugger back-end using DAP (Debug Adapter Protocol) over a TCP/IP port. The TPC test program can be run on a local or remote machine.

../_images/tpc_debug_diagram.png

Figure 3.11 Active TPC Debug Session
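The Debug Adapter Protocol frames every message as a Content-Length header, a blank line, and a JSON body. The following Python sketch illustrates only that wire framing; the extension and the debugger back-end handle it for you, and the adapterID value used here is a placeholder rather than a value confirmed by this document.

```python
import json

def encode_dap_message(payload: dict) -> bytes:
    """Frame one DAP message: Content-Length header, blank line, JSON body."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body

def decode_dap_message(data: bytes) -> dict:
    """Parse a single, complete framed DAP message back into a dict."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.decode("ascii").split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))

# "initialize" is the first request of a DAP session.
# The adapterID below is a placeholder chosen for this illustration.
request = {
    "seq": 1,
    "type": "request",
    "command": "initialize",
    "arguments": {"adapterID": "tpc_debugger"},
}
framed = encode_dap_message(request)
print(framed)
```

Because the framing is length-prefixed, a client reading from the TCP socket first reads the header, then reads exactly that many body bytes.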

3.2. Installation

The TPC debugger Visual Studio Code extension can be installed on any platform (Linux or Windows). Follow the steps below:

  1. Obtain a copy of the latest TPC debugger extension, provided by Habana - habanalabs-tpcdebug-<version>.vsix.

  2. Run Visual Studio Code.

  3. Hit F1 to open the Command palette search box and type “Install from VSIX”.

  4. Hit Enter, navigate to and select the vsix file provided by Habana.

  5. Hit Ctrl-Shift-X to open the extensions view and check that “Habanalabs TPC debugger” is installed and enabled:

../_images/tpc_debug_install.png

3.3. Starting a Debug Session

Starting a debug session requires running the TPC test program and starting a debug session from Visual Studio Code that will attach to the TPC test program. You can either start the TPC test program manually and request Visual Studio Code to attach to it, or you can request Visual Studio Code to start the test program before attaching to it.

Starting a Visual Studio Code debug session requires adding a new launch configuration in the launch.json file located under your working folder. The launch configuration for the TPC debug extension will be different depending on whether it needs to attach to a running TPC test program or if the TPC test program should be started by the debug session.

To add the default launch configuration, refer to Add Default Launch Configuration. By default, the configuration created by those steps is suitable for attaching to a running TPC test program. After adding the default launch configuration, refer to one of the specific setup sections that follow to configure it according to your preferred method of operation.

3.3.1. Add Default Launch Configuration

  1. Open the “Run and Debug” tab in Visual Studio Code by either clicking on the “Play+Bug” icon on the left side bar or by using the “Ctrl+Shift+D” keyboard shortcut.

  2. If no folder is open, you will see a blue “Run and Debug” button with a link named “Open a folder and create launch.json” below it. In this case, click the “Open a folder” link and choose an existing folder to open or create a new folder. After the folder is opened, hit “Ctrl+Shift+D” again to re-open the “Run and Debug” tab.

  3. If a new folder was created, click the “create launch.json file” link, open the “Select Environment” option menu and select “Habanalabs TPC Debugger”. This creates a launch.json file containing the default launch configuration shown in step 5 (skip step 4).

  4. If a launch.json file already exists in the folder, press the “Option” menu located on the right side of the green “Play” button and choose “Add Configuration…”:

    ../_images/tpc_debug_add_config.png

    In the pop-up window, choose “Habanalabs TPC Debug: Launch”. This will add the default launch configuration.

  5. You should see the following entry added to the launch.json file:

    {
        "type": "tpc_debugger",
        "name": "TPC Attach",
        "request": "launch",
        "remote_host": "${command:AskForHostName}",
        "remote_port": "${command:AskForPortNum}"
    },
    

3.3.2. Manually Run the TPC Test Program

Open a command shell on the target machine, set the environment variable TPC_VSCODE_DEBUG=1 and run the TPC test program. After the test program starts running, at the first TPC invocation the program will stop and wait for the Visual Studio Code debug session to attach to it as in the following example:

$ TPC_VSCODE_DEBUG=1 ~/builds/tpc_kernels_release_build/tests/tpc_kernel_tests ~/builds/tpc_kernels_debug_build/tests/tpc_kernel_tests --gtest_filter=sanity/Gaudi2ReduceProdF32Test5D.*

Note: Google Test filter = sanity/Gaudi2ReduceProdF32Test5D.*
[==========] Running 20 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 20 tests from sanity/Gaudi2ReduceProdF32Test5D
[ RUN      ] sanity/Gaudi2ReduceProdF32Test5D.gaudi2_reduce_prod_fwd_f32/ifm_DxWxHxBxA_64x6x2x2x1_dim0_perm_offset_0
Habana Labs TPC simulator Library - version 0.15.0.22a699f3
Dap server Waiting on port 4710

As shown in the above example, the debugger back-end waits for a connection on TCP port 4710 (the default port). To use a different port, set the TPC_VSCODE_DEBUG environment variable to the desired port number instead of 1.
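The rule for choosing the port can be sketched in Python as follows. This is an illustration of the documented behavior, not the back-end's code; the function name is hypothetical, and treating an unset or empty variable as "debugging disabled" is an assumption.

```python
DEFAULT_DAP_PORT = 4710  # default port shown in the example output

def resolve_debug_port(value):
    """Map a TPC_VSCODE_DEBUG value to the port the back-end is expected to use.

    Returns None when the variable is unset or empty (assumed to mean
    that debugging is not requested).
    """
    if value is None or value.strip() == "":
        return None
    number = int(value)  # raises ValueError for non-numeric input
    return DEFAULT_DAP_PORT if number == 1 else number

for value in ("1", "5000"):
    print(value, "->", resolve_debug_port(value))
```

A helper like this can be useful in a wrapper script that needs to tell the user which port to enter in the Visual Studio Code input box.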

While the program waits for the debugger to attach to it, go to the “Run and Debug” panel in Visual Studio Code, select the default “TPC Attach” launch configuration that was previously added (see: Add default launch configuration) and hit F5 to start a debug session.

When using the default launch configuration, an input box will appear. Specify the host name and then the port number of the target machine on which the TPC test program is running in that input box. After it connects, the debug session will start and the TPC kernel will be stopped on the first instruction.

To save the host name and port number, modify the launch configuration in one of two ways:

  • For remote host debugging, specify the host name and port number in the “remote_host” and “remote_port” arguments of the configuration:

    {
        "type": "tpc_debugger",
        "name": "TPC Attach",
        "request": "launch",
        "remote_host": "my_target_hostname",
        "remote_port": "4710"
    },
    
  • For local host debugging on the default port, comment out or completely remove the “remote_host” and “remote_port” arguments:

    {
        "type": "tpc_debugger",
        "name": "TPC Attach",
        "request": "launch",
        //"remote_host": "${command:AskForHostName}",
        //"remote_port": "${command:AskForPortNum}"
    },
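The two variants above differ only in whether the “remote_host” and “remote_port” arguments are present. The following Python sketch generates both entries; the attribute names come from the examples above, while the helper name, the second configuration name and the surrounding "version" and "configurations" wrapper (the usual launch.json layout) are added here for illustration.

```python
import json

def make_attach_config(host=None, port=None, name="TPC Attach"):
    """Build a launch.json entry; omitting host and port targets the local host on the default port."""
    config = {"type": "tpc_debugger", "name": name, "request": "launch"}
    if host is not None:
        config["remote_host"] = host
    if port is not None:
        config["remote_port"] = str(port)  # the examples store the port as a string
    return config

launch_json = {
    "version": "0.2.0",
    "configurations": [
        make_attach_config("my_target_hostname", 4710),
        make_attach_config(name="TPC Attach (local)"),  # hypothetical name
    ],
}
print(json.dumps(launch_json, indent=4))
```

Generating the file this way avoids hand-editing mistakes such as a missing comma between attributes.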
    

3.3.3. Automatically Launch TPC Test Program on a Local Machine

You can change the launch configuration to run the TPC test program automatically when a new debug session is initialized. For this, the command line of the TPC test program should be specified in the launch configuration.

Change the default launch configuration that was added previously (See: Add default launch configuration) and add “program” and (optionally) “args” attributes to specify the command and command arguments of the TPC test program.

You can also change the configuration name from “TPC Attach”. See the following example:

{
    "type": "tpc_debugger",
    "name": "TPC Test Run",
    "request": "launch",
    "program": "${env:HOME}/builds/tpc_kernels_release_build/tests/tpc_kernel_tests",
    "args": [
        "--gtest_filter=sanity/Gaudi2ReduceProdF32Test5D.*"
    ]
}

When you select the above “TPC Test Run” configuration in the “Run and Debug” panel and press the green “Play” button, or F5, a new terminal is added to the “Terminal” view at the bottom of the Visual Studio Code window and the TPC test program is launched in this terminal. Once the program makes its first TPC kernel invocation, Visual Studio Code attaches to it and the debug session starts. The TPC kernel execution is stopped on the first kernel instruction.
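If you already have a working command line for the test program, the “program” and “args” attributes can be derived from it. The Python sketch below does this with shlex; the helper name is hypothetical, and the command line is the one used in the example above.

```python
import json
import shlex

def make_launch_config(command_line, name="TPC Test Run"):
    """Split a shell-style command line into the "program" and "args" attributes."""
    parts = shlex.split(command_line)
    if not parts:
        raise ValueError("empty command line")
    config = {
        "type": "tpc_debugger",
        "name": name,
        "request": "launch",
        "program": parts[0],
    }
    if len(parts) > 1:
        config["args"] = parts[1:]  # "args" is optional
    return config

config = make_launch_config(
    "${env:HOME}/builds/tpc_kernels_release_build/tests/tpc_kernel_tests "
    "--gtest_filter=sanity/Gaudi2ReduceProdF32Test5D.*"
)
print(json.dumps(config, indent=4))
```

Note that shlex does not expand variables, so the ${env:HOME} placeholder is passed through unchanged for Visual Studio Code to resolve.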

3.3.4. Automatically Launch TPC Test Program on Remote Machine

To set up a Visual Studio Code debug session that runs and debugs a TPC test program on a remote machine, use the “Remote - SSH” extension of Visual Studio Code. This extension allows Visual Studio Code to connect to a remote machine through an SSH connection for remote Run and Debug sessions.

  1. If the “Remote - SSH” extension is not yet installed, press “Ctrl+Shift+X” in Visual Studio Code to go to the extensions view. Type “Remote SSH” in the search text box and select the extension to install.

  2. Press “F1” to open the command palette, type “Remote-SSH: Connect to host…” and press Enter.

  3. You will be asked to add a new remote host configuration and specify the host name and username to use for the SSH connection. Once added, connect to the host (a password may be required). Once the remote host is connected, a new Visual Studio Code window will appear. In this window, all operations (e.g., “Open Folder”) are done on the remote host. Only the GUI part is visible locally.

  4. Press “Ctrl+Shift+X” in the new Visual Studio Code window to open the extensions view. In this view, you will see two lists of extensions, the local installed extensions and remote host extensions. You need to make sure that “Habanalabs TPC Debugger” extension is installed on the remote machine. If it is installed locally but not on the remote host, you will see a button inside the local extension bullet that allows you to copy and install it on the remote machine.

  5. Follow the same steps described in Automatically Launch TPC Test Program on a Local Machine. In this window, you work as if Visual Studio Code is running on the remote machine.

3.4. TPC-C Source or Disassembly Level Debugging

If the TPC kernel was compiled with debug information, by specifying the -g flag in the tpc-clang command line, debug information will be available for the debugger. In this case, the default debugging mode will be source level debugging.

If the TPC Kernel was not compiled with debug information, then only disassembly level debugging will be available.

3.4.1. TPC-C Source Level Debugging Mode

  1. The Source View displays the TPC source code and the line number corresponding to the current instruction address is marked.

  2. When setting a breakpoint on a source line (see Breakpoint set/unset), the breakpoint is set on that line if the debug information flags some TPC instruction in the kernel as the start of that source line. Otherwise, the breakpoint is set on the next line in the same source file that meets this condition.

  3. Pressing the “Step-Over” button will cause the kernel to continue execution until a kernel instruction which is the first instruction of a source line.

  4. TPC-C level variables and their current values are displayed under the “Variables” node in Variables View.

3.4.2. Disassembly Level Debugging Mode

  1. The Source View displays the disassembly of the executing kernel and the next kernel instruction to be executed is marked.

  2. When setting a breakpoint, see Breakpoint set/unset, a breakpoint will be set on that line or the first line after that line which is a start of a new kernel instruction.

  3. Pressing on the “Step-Over” button will cause the kernel to continue execution until the next kernel instruction.

3.4.3. Switching Between Source Level and Disassembly Debugging Modes

When both TPC-C source level and disassembly debug level modes are available for the compiled kernel, toggling between the two modes can be done by right clicking the Source View and choosing the “Toggle Disassemble” button.

3.4.4. Source Level Debugging Limitations

The quality of the debug information depends on the optimization level the kernel was compiled with. To debug a specific object at source level, it is recommended to build with -O0 or -O1 compilation switches. This enables more accurate debug information by adding nops to avoid pipeline issues. Building with -O2 compilation switch creates more compact and efficient code, at the price of lower quality debug information.

3.5. Debug Session Views and Operations

The following figure shows the main view area in the Visual Studio Code window when a TPC debug session is started.

../_images/tpc_debug_window.png

Figure 3.12 Main View Areas

Each area is described in more detail in the following sections:

  1. The main source or disassemble view. See Source View.

  2. The current kernel position indication and breakpoint enable/disable area. See Breakpoint set/unset.

  3. Variables view area showing current values of program variables and TPC registers. See Variables View.

  4. Static tensor information. See Tensor Info View.

  5. Debug start/stop and stepping control. See Debug Start/Stop/Step.

  6. Last executed instruction operands view. See Instruction Operands View.

  7. Debug Console. See Debug Console.

  8. Current TPC invocation number in the status bar. See Debug Start/Stop/Step.

3.5.1. Source View

This is the main area where the current program instruction is shown in either disassembly or TPC-C source level.

3.5.1.1. Breakpoint set/unset

When hovering over the left side of the source view, marked with yellow arrow number 2 in The main view areas, a dark red dot is visible. When you click on the red dot, a breakpoint is toggled to be enabled or disabled on that line. When setting a breakpoint on source or disassembly line, the actual breakpoint may be set on a different line if the selected line is not a start of instruction as indicated by the debug information, or the selected line does not point to a TPC instruction in disassembly mode. The breakpoint will be set on the next line after the selected line which is a start of a new instruction.

When in TPC-C source level debug mode and the TPC kernel was compiled with optimizations, it is possible that not every source line can be a breakpoint target. This is because of the VLIW architecture of the TPC. The compiler may optimize two different source commands from different source lines to be combined to be executed in parallel on the same TPC instruction.

See Source level debugging limitations.
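The breakpoint placement rule described in this section amounts to "use the requested line if it starts an instruction, otherwise the next line that does". A minimal Python sketch of that rule follows; the function name and the list of line numbers are hypothetical and stand in for what the debugger reads from the debug information or the disassembly.

```python
from bisect import bisect_left

def resolve_breakpoint(requested_line, instruction_start_lines):
    """Return the line where the breakpoint lands, or None if no later line qualifies."""
    lines = sorted(set(instruction_start_lines))
    index = bisect_left(lines, requested_line)
    return lines[index] if index < len(lines) else None

# Hypothetical data: only these lines are flagged as the start of an instruction.
starts = [10, 12, 15, 21]
print(resolve_breakpoint(12, starts))  # 12: the line itself qualifies
print(resolve_breakpoint(13, starts))  # 15: moved to the next qualifying line
```

With optimized code, fewer source lines are flagged as instruction starts, so a requested breakpoint is more likely to move.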

3.5.2. Variables View

The Variables View area on the top left side of the debug window displays the current value of program variables and TPC registers as described in the following sections.

3.5.2.1. Variables

This section will appear only when source level debugging is supported for the debugged TPC kernel, i.e., it was compiled with debug information.

It shows the current value of source level program variables.

Depending on the quality of the debug information data (see Source level debugging limitations), the debugger may not have the backing location for all program variables at each program location. When the value of a program variable cannot be determined, its value is presented as “Unknown”.

3.5.2.2. Bypass

The Bypass section displays values written to TPC registers that were not yet committed. The TPC’s pipeline and bypass architecture defines latency values for different register write operations such that when a register is written by some instruction, the newly written value becomes visible to other instructions only at the N’th instruction executed after the instruction that emits the new value. This N value is the latency.

For each register write operation that has not yet committed, the register name will be displayed followed by the latency value in parenthesis and the write value. The latency value tells you how many instructions should be executed before the register write will be committed.

For example:

S13(5): uint32'4100

The above indicates that a write of value 4100 to SRF register number 13 was issued and that the write will be committed within 5 instructions from now. After the register write is committed, it is no longer visible in the Bypass section, but the register value is visible in Registers.

When a write operation is made with a predicate to only a partial register value, the parts of the register value that are not touched by the write operation will be visible as “X”. See the following example:

I2(6): uint32'{0, X, X, X, X}

The above indicates that only the first element of the 5D index register number 2 is written to be 0 and all other 4 elements are unchanged.
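The countdown behavior of the Bypass section can be pictured with a small Python model. This is a deliberately simplified illustration and not the simulator's implementation: the class and method names are hypothetical, every write is shown as uint32, and the exact instruction at which the hardware commits a value may differ from this model.

```python
class BypassModel:
    """Toy model of delayed register commits."""

    def __init__(self):
        self.registers = {}  # committed values, e.g. {"S13": 4100}
        self.pending = []    # entries of [register name, remaining latency, value]

    def write(self, name, value, latency):
        """Issue a register write that commits after `latency` instructions."""
        self.pending.append([name, latency, value])

    def step(self):
        """Execute one instruction: age pending writes and commit those that reach zero."""
        still_pending = []
        for entry in self.pending:
            entry[1] -= 1
            if entry[1] <= 0:
                self.registers[entry[0]] = entry[2]
            else:
                still_pending.append(entry)
        self.pending = still_pending

    def bypass_view(self):
        """Format pending writes in the style used by the Bypass section."""
        return [f"{name}({latency}): uint32'{value}" for name, latency, value in self.pending]

model = BypassModel()
model.write("S13", 4100, latency=5)
print(model.bypass_view())
for _ in range(5):
    model.step()
print(model.registers, model.bypass_view())
```

After five steps the pending entry disappears from the bypass list and the value appears among the committed registers, mirroring what you see when stepping in the debugger.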

3.5.2.3. Registers

The Registers section shows the current values of all TPC architecture registers.

When the register value is too long, such as VRF registers, only part of the value is visible. It is possible to hover on top of it and a pop-up window will appear with the entire value of the register.

VRF register values are displayed in groups of 32 bytes each:

uint32'{
GRP-0: 1936311911, 3964338001, 2881146153, 1189346290, 4166304380, 2380785691, 1663982198, 2596904755
GRP-1: 3071095398, 1520654385, 386227493, 3562989912, 3335369387, 290763931, 1098154510, 2279357729
GRP-2: 2705254768, 1744625985, 2541617470, 2523651306, 710686863, 4214993132, 1413263154, 1557862636
GRP-3: 1140726274, 984283899, 3872467451, 2491169797, 1639897205, 3143432585, 2500827560, 3018976177
GRP-4: 16248581, 3845825001, 3502967754, 3177465672, 2820547359, 3311696668, 3311039252, 1668893534
GRP-5: 610562107, 3691617809, 4071412906, 1001328667, 1357106483, 1544898376, 421687227, 4118649786
GRP-6: 471523595, 704127871, 320578808, 3897477813, 3761779352, 876432761, 2001625020, 91016186
GRP-7: 723617452, 2723486378, 1941270718, 3546022971, 2947781686, 2665804002, 1226125903, 2840494845
}
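To make the grouping concrete, the Python sketch below formats a raw 256-byte vector as uint32 values in groups of 32 bytes, which yields eight groups of eight values as in the listing above. The function name is hypothetical, the register contents are placeholder data, and little-endian byte order is an assumption.

```python
import struct

def format_vrf_uint32(raw, group_bytes=32):
    """Format a raw vector register as uint32 values in groups of `group_bytes` bytes."""
    if len(raw) % group_bytes != 0:
        raise ValueError("register size must be a multiple of the group size")
    lines = []
    for group, offset in enumerate(range(0, len(raw), group_bytes)):
        # "<8I" means eight little-endian unsigned 32-bit integers (byte order assumed).
        values = struct.unpack_from(f"<{group_bytes // 4}I", raw, offset)
        lines.append(f"GRP-{group}: " + ", ".join(str(v) for v in values))
    return "uint32'{\n" + "\n".join(lines) + "\n}"

raw = bytes(range(256))  # placeholder contents for a 256-byte register
print(format_vrf_uint32(raw))
```

Choosing a narrower display data type changes the number of elements per group, not the 32-byte group size.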

Each register value is prepended with its current display data type. The display data type can be changed by right clicking the register value and selecting the desired data type. When changing the display data type for a specific register, the debugger will remember that setting and will use the same data type for that register. This is also applicable when it shows in the Bypass section.

Some register types will always be displayed using the same data type which cannot be changed. For example, IRF registers will always be displayed as uint32, ADRF registers will always be displayed as hex64, SP registers (Scalar predicates) will always be displayed as hex8.

In order to copy a register value to the clipboard, right click on its value and select “Copy”.

A register value can also be displayed in the Debug Console by specifying its name and, optionally, a display data type. See Debug Console.

3.5.3. Tensor Info View

The Tensor Info section displays the configuration of all tensors defined for the kernel. It includes data such as the size of element in the tensor, the tensor dimensions and padding value.

3.5.4. Debug Start/Stop/Step

The set of buttons marked with the yellow arrow 5 in The main view areas provides control for single step, continue and stop kernel execution in the simulator.

The first button from the left, with the “play” icon, releases the TPC simulator to continue executing kernel instructions until the next breakpoint is hit. If no breakpoint is hit and the kernel continues to completion, the debugger breaks on the first kernel instruction of the next TPC invocation made by the TPC test program. When the TPC test program finishes with no more TPC invocations, the debug session disconnects and ends. The current TPC invocation number can be seen on the status bar at the bottom right of the Visual Studio Code debugger. See the yellow arrow 8 in The main view areas.

The second, third and fourth buttons, with the “Step Over”, “Step Into” and “Step Out” icons, behave in the same way and make the TPC simulator continue kernel execution until the next kernel instruction is hit, if in disassembly debug mode, or until the start of a new source line, if in TPC-C source level debug mode. When “Step Over” is done on a halt instruction, the kernel execution ends and the test program continues execution and will break only on the next TPC kernel invocation made by the program.

The fifth button, with the “Restart” icon, will stop and then restart the debug session. This will work only if the debug session was configured to automatically start the TPC test program when the debug session starts. See Automatically Launch TPC Test Program on a Local Machine. In this case, the debug session and the test program will stop and then a new debug session will start and the configured TPC test program will get re-launched and attached to it.

Pressing the “Restart” button when the debug session is configured to attach to a running TPC test program will relaunch the debug session that will try to connect to the test program. However, since the test program is no longer communicating with the TCP port, the debugger will fail to connect.

The sixth button, with the “Stop” icon, will stop the debug session. If the debug session is configured to start the TPC test program, then the TPC test program will terminate. If the debug session is configured to attach to a running program, the debugger will detach from it and the test program will continue to run.

3.5.5. Instruction Operands View

When a TPC instruction is executed, the value of each register operand may be loaded either from a not-yet-committed value in the bypass or from the register file.

The “TPC: LAST INSTRUCTION OPERANDS” Tab, which is located at the bottom, near the debug console (see yellow arrow 6 in The main view areas), shows the actual operand values used by the last executed instruction. The operation, operand fetch mask and operands values are shown for each of the LOAD, SPU, VPU and STORE instruction slots.

For example, see the below figure.

../_images/tpc_debug_inst_op.png

The last executed command was:

nop;  mov_irf_dim  0x3 S10, I0, SP0;  nop;    st_l  0x8, S10, SP0

You can see in the “LAST INSTRUCTION OPERANDS” tab that the operation for the LOAD and VPU slots are NOP. SPU slot executed a MOV operation and the value used for the I0 operand was {0, 0, 0, 0}. The STORE slot executed a store local operation and the value of the S10 operand was 64.

3.5.6. Debug Console

The “DEBUG CONSOLE” Tab, which is located on the bottom part of the Visual Studio Code window (see yellow arrow 7 in The main view areas) allows sending commands or expressions to be evaluated by the debugger. An expression text should be typed in the “>” prompt and when <Enter> is hit it will be evaluated by the debugger and the result will be shown in the console.

Currently, the only supported expressions are those that evaluate the values of variables and registers.

3.5.6.1. Evaluate Variable Value

When TPC-C source level debugging is available for the debugged kernel (see TPC-C Source or Disassembly level debugging), typing a source level variable name will return its current value. It will return “Unknown” if the value cannot be determined due to lack of debug information.

3.5.6.2. Evaluate Register Value

When typing the name of a register in the debug console prompt, the register value will be shown. If the register has some modifications in the bypass, then the value in the register file as well as all bypass values for that register will be displayed.

It is possible to append the “tag” character (') to the register name, followed by the desired data type string to be used to display the register value. For example, typing “S10” will show:

S10
uint32'64
S10(6): uint32'0

That means that the current value of S10 in the register file is 64, and it also has a modification value of zero in the bypass that will get committed in 6 instructions. The display data type (uint32) is used since this is the last data type setting the user requested for that register. See Registers.

When typing “S10'fp32”, the same values will be displayed but with floating point interpretation:

S10'fp32
fp32'8.96831e-44
S10(6): uint32'0
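The fp32 value shown above is the same 32 bits reinterpreted, not a numeric conversion of 64. A short Python check using the standard struct module reproduces it; the function name is chosen for this illustration.

```python
import struct

def uint32_as_fp32(value):
    """Reinterpret the bits of a 32-bit unsigned integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", value))[0]

print(f"{uint32_as_fp32(64):g}")  # 8.96831e-44
```

The bit pattern 64 has a zero exponent field, so it decodes as a denormal number equal to 64 × 2^-149, which is why the result is so small.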

The following table describes the supported data type strings.

Table 3.1 Supported data type strings

Data type string    Description
int8                8-bit signed integer
uint8               8-bit unsigned integer
bool                8-bit boolean value (1 indicates value != 0)
int16               16-bit signed integer
uint16              16-bit unsigned integer
int32               32-bit signed integer
uint32              32-bit unsigned integer
int64               64-bit signed integer
int4                4-bit signed integer
uint4               4-bit unsigned integer
fp32                32-bit floating point
fp16                16-bit floating point
bf16                16-bit bfloat
fp8_152             8-bit floating point with 5-bit exponent and 2-bit mantissa
fp8_143             8-bit floating point with 4-bit exponent and 3-bit mantissa
hex8                8-bit unsigned integer shown in hexadecimal base
hex16               16-bit unsigned integer shown in hexadecimal base
hex32               32-bit unsigned integer shown in hexadecimal base
hex64               64-bit unsigned integer shown in hexadecimal base
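As an illustration of how one set of raw bytes reads under different data type strings, the Python sketch below decodes a subset of the types in Table 3.1 with the standard struct module. It is not the debugger's implementation: the function name, the little-endian byte order and the "0x" output format for the hex types are assumptions, and the 4-bit, bool and fp8 types are left out.

```python
import struct

# struct format codes for a subset of the data type strings in Table 3.1.
STRUCT_CODES = {
    "int8": "b", "uint8": "B", "int16": "h", "uint16": "H",
    "int32": "i", "uint32": "I", "int64": "q", "fp32": "f", "fp16": "e",
}

def decode(type_name, raw):
    """Decode little-endian raw bytes under the given data type string (subset only)."""
    if type_name == "bf16":
        # bfloat16 is the upper 16 bits of an fp32 value, so pad the low half with zeros.
        (bits,) = struct.unpack("<H", raw)
        return struct.unpack("<f", struct.pack("<I", bits << 16))[0]
    if type_name.startswith("hex"):
        return "0x" + raw[::-1].hex()  # most significant byte first
    return struct.unpack("<" + STRUCT_CODES[type_name], raw)[0]

raw = struct.pack("<I", 0x3F800000)  # the fp32 bit pattern of 1.0
print(decode("uint32", raw), decode("fp32", raw), decode("hex32", raw))
print(decode("bf16", raw[2:]))       # upper two bytes read as bf16
```

The bf16 case shows why the type is convenient: it shares the exponent layout of fp32 and simply drops the low 16 mantissa bits.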

3.5.7. Memory View

The memory view allows you to display selected areas of SLM, VLM, MMIO or TENSORS.

To open the memory view, right click the main source view and select “TPC Memory View”. A new editor tab will open where you can select the desired memory type in the displayed option menu.

Multiple memory views can be opened, each displaying a different memory area. Each memory view remains active until the current TPC kernel invocation ends. After the TPC kernel has finished, the memory view buttons are grayed out and disabled.

The following sections describe each memory type.

3.5.7.1. SLM

When selecting the SLM memory type, the total size of Scalar Local Memory is displayed. SLM values are always displayed as 32-bit elements. You can select the data type and range of SLM elements to view. Once the “Apply” button is pressed, a table with the SLM memory content will be displayed as shown in the following figure:

../_images/tpc_debug_slm.png

When SLM memory is modified within the displayed range, the values in the table will be automatically updated and will stay up-to-date.

Memory cells can be edited and assigned different values. See Cell Editing in Memory View.

3.5.7.2. VLM

When selecting the VLM memory type, the total size of Vector Local Memory is displayed. Each VLM element is 256 bytes long and is interpreted as an array of a selected data type. After selecting the desired data type and range of VLM elements to display, pressing the “Apply” button will display the VLM content in a table as shown in the following figure:

../_images/tpc_debug_vlm.png

When VLM memory is modified within the displayed range, the values in the table will be automatically updated and will stay up-to-date.

Memory cells can be edited and assigned different values. See Cell Editing in Memory View.

3.5.7.3. MMIO

When selecting the MMIO memory type, the total size of MMIO configuration space is displayed. MMIO values are always displayed as 32-bit elements. You can select the data type and range of MMIO elements to view. Once the “Apply” button is pressed, a table with the configuration memory content will be displayed as shown in the following figure:

../_images/tpc_debug_mmio.png

When MMIO memory is modified within the displayed range, the values in the table will be automatically updated and will stay up-to-date.

Memory cells can be edited and assigned different values. See Cell Editing in Memory View.

3.5.7.4. TENSOR

The Tensor memory type allows you to view a two-dimensional slice of data from a tensor of up to five dimensions.

The view allows you to select the desired tensor number, the display data type and the 5D index range of the 2D slice. The “Start index” is the 5D index of the first location to be shown and the “End index” is the 5D index of the last location to be shown. The start and end index values must match in every dimension except the two that define the 2D slice: only two dimensions can be set to a range larger than 1.

Once you select the slice, press the “Apply” button to display the tensor content in a table. Each axis of the 2D slice is marked with a different background color which matches the displayed data table. For example:

../_images/tpc_debug_tensor.png
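The constraint on the start and end indices can be checked with a few lines of Python. This sketch assumes the end index is inclusive, as the description of the “End index” suggests; the function names and index values are hypothetical.

```python
def slice_axes(start, end):
    """Return the dimensions that span more than one element between two 5D indices."""
    if len(start) != 5 or len(end) != 5:
        raise ValueError("indices must have five components")
    if any(e < s for s, e in zip(start, end)):
        raise ValueError("end index must not precede start index")
    axes = [dim for dim, (s, e) in enumerate(zip(start, end)) if e != s]
    if len(axes) > 2:
        raise ValueError("at most two dimensions may cover a range larger than 1")
    return axes

def slice_shape(start, end):
    """Number of elements shown along each varying dimension (end index inclusive)."""
    return [end[d] - start[d] + 1 for d in slice_axes(start, end)]

print(slice_axes((0, 0, 1, 0, 0), (63, 5, 1, 0, 0)))   # [0, 1]
print(slice_shape((0, 0, 1, 0, 0), (63, 5, 1, 0, 0)))  # [64, 6]
```

In this example dimensions 0 and 1 form the 2D slice, while dimensions 2 to 4 are pinned to a single index each.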

When TENSOR memory is modified within the displayed range, the values in the table will be automatically updated and will stay up-to-date.

Memory cells can be edited and assigned different values. See Cell Editing in Memory View.

3.5.7.5. Cell Editing in Memory View

Any cell in the memory view table can be clicked and a new value can be typed. Once <Enter> is hit, the memory location will be modified with the new value.

The new value string is interpreted as the selected data type for the memory view. A different data type can be specified by prepending the desired data type string followed by the “tag” character ('). For example, the following value is equal to the integer value 15:

hex32'f

See Supported data type strings for possible data type strings.
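The way a typed cell value is split at the tag character can be sketched in Python as follows. This only illustrates the syntax described above; the function name is hypothetical, and the sketch does no range checking and does not cover every data type string.

```python
def parse_cell_value(text, view_type):
    """Split an optional "type'value" prefix and parse the value under that type."""
    type_name, sep, literal = text.partition("'")
    if not sep:  # no prefix: fall back to the data type selected for the view
        type_name, literal = view_type, text
    if type_name.startswith("hex"):
        return int(literal, 16)
    if type_name.startswith("fp") or type_name == "bf16":
        return float(literal)
    return int(literal, 10)

print(parse_cell_value("hex32'f", "uint32"))  # 15
print(parse_cell_value("42", "uint32"))       # 42
```

The prefix affects only how the typed text is read; the table continues to display the cell in the data type selected for the view.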