TPC Tools Installation Guide

Intel® Gaudi® tools is distributed as a .deb/.rpm package (habanatools) that contains the tools required to build Tensor Processor Core™ (TPC™) kernel plugins for the graph compiler. The package includes a TPC-C compiler, assembler, dis-assembler, and all necessary headers. It can be installed on any development machine and does not require Gaudi hardware.

For more details on the TPC SDK, refer to the following:

Package Prerequisites

The package depends on the following libraries:

  • gcc

  • gcc-c++

  • cmake3 > 3.5

  • boost-devel
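
The prerequisites above can be checked before installing the package. The following is a minimal sketch, not part of habanatools: the function names are illustrative, and it assumes the packages expose binaries named gcc, g++, and cmake3 or cmake. boost-devel ships headers only and is not checked here.

```shell
# Sketch: report which build prerequisites are visible on PATH.
# Binary names (gcc, g++, cmake3/cmake) are assumptions about how the
# listed packages are exposed on the system.

# Succeeds when version $1 is strictly greater than version $2.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Prints "found" or "missing" for a single tool name.
tool_status() {
    if command -v "$1" >/dev/null 2>&1; then echo "found"; else echo "missing"; fi
}

for tool in gcc g++; do
    echo "$tool: $(tool_status "$tool")"
done

# CentOS installs the binary as cmake3; other systems use plain cmake.
cmake_bin=$(command -v cmake3 || command -v cmake || true)
if [ -n "$cmake_bin" ]; then
    cmake_ver=$("$cmake_bin" --version | head -n 1 | awk '{print $3}')
    if version_gt "$cmake_ver" "3.5"; then
        echo "cmake: $cmake_ver (newer than 3.5)"
    else
        echo "cmake: $cmake_ver (too old, need > 3.5)"
    fi
else
    echo "cmake: missing"
fi
```

The version comparison relies on GNU sort -V, so the sketch assumes a GNU coreutils environment such as CentOS or Ubuntu.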


Review the Intel Gaudi TPC Tools Debugger document for details about the IDE plugin for TPC code development and debugging.


RPM Installation for CentOS 7.5 (Desktop Configuration)

Launching the Eclipse GUI IDE requires a CentOS 7.5 Desktop configuration.

To install the RPM package, run the following in a CentOS bash terminal:

sudo yum install epel-release
sudo yum install --enablerepo=epel cmake3
sudo ln -s /usr/bin/cmake3 /usr/bin/cmake
sudo yum install ./habanatools-<version>.x86_64.rpm

Deb Installation for Ubuntu 22.04 (Desktop Configuration)

To install the Deb package, run the following in an Ubuntu bash terminal:

sudo dpkg -i ./habanatools_<version>_amd64.deb
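
After either installation, a quick check can confirm that the tool binaries are in place. The sketch below is illustrative: the function name is hypothetical, and the /usr/bin location is taken from the command examples in this guide rather than from a package manifest.

```shell
# Sketch: confirm that the tool binaries used later in this guide exist.
# The default directory /usr/bin comes from the command examples in this
# guide; the function name and its argument are illustrative.

# Prints one line per tool and returns non-zero if any tool is missing.
check_tpc_tools() {
    local bin_dir="${1:-/usr/bin}"
    local missing=0
    local tool
    for tool in tpc-clang llvm-objdump; do
        if [ -x "$bin_dir/$tool" ]; then
            echo "ok: $bin_dir/$tool"
        else
            echo "missing: $bin_dir/$tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tpc_tools /usr/bin || echo "habanatools does not appear to be fully installed"
```

Assuming the installed package is registered under the name habanatools, the package manager can also be queried directly with rpm -q habanatools on CentOS or dpkg -s habanatools on Ubuntu.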

habanatools Content

Once the package is installed, the following files are added to your machine:

Table 1: Content of Package

  • TPC-C compiler and assembler

  • TPC dis-assembler

  • TPC simulator

  • Test core library

  • Simulator headers

  • Glue code interface header

  • Available TPC-C intrinsics

  • Test core API

TPC-C Compiler Command Line Arguments

The TPC compiler is LLVM based and accepts standard LLVM command line arguments along with the following additions:

  • Optimization levels: -O0 (minimum) through -O2 (maximum) are supported, and -O2 is the default. -O1 turns off HW loops and a few other optimizations. -O0 turns off instruction scheduling and bundling, and pads every instruction with 6 NOPs to ensure the results queue is committed to the register file before the next instruction is executed.

  • -march=<name> - Architecture switch. Currently, the supported name is “dali” (AKA Goya).

  • -max-tensors <n> - Tensor limit. n is a number in range 0..8. Default is 8.

  • -vlm <n> - Vector local memory limit. n is the size of the vector local memory in KB. Default is 80.

  • -main-function <main_entry_name> - Name of entry function. Default is “main”.

  • -all-loops-taken - Enables global elimination of loop end padding by treating every loop as taken. This can improve performance when the developer guarantees that all loops in the program are taken at least once.

  • -reg-mem-count - Prints register and local memory usage to the console at compile time.

  • -disable-lut-warn - Suppresses the performance warning issued when the used LUT size exceeds the LUT cache.

  • -x c++ - Enables static C++ in TPC-C.

  • -o <file name> - Sets the name of the output file (standard LLVM argument).
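
Several of the options above can be combined in one invocation. The following sketch only prints the resulting command line so it can be reviewed before running; kernel.c, kernel.o, the entry name my_kernel, and the helper function are hypothetical, and the -vlm value of 40 is an arbitrary illustration.

```shell
# Sketch: assemble a tpc-clang command line from the options listed above.
# kernel.c, kernel.o and the entry name "my_kernel" are hypothetical.

# Prints the command line; fails if the tensor limit is outside 0..8.
tpc_compile_cmd() {
    local src="$1" out="$2" max_tensors="${3:-8}"
    case "$max_tensors" in
        [0-8]) ;;
        *) echo "max-tensors must be in 0..8, got: $max_tensors" >&2; return 1 ;;
    esac
    echo /usr/bin/tpc-clang "$src" -c -O1 \
        -max-tensors "$max_tensors" -vlm 40 \
        -main-function my_kernel -reg-mem-count \
        -o "$out"
}

# Print the command for review; paste the output into a terminal to run it.
tpc_compile_cmd kernel.c kernel.o 4
```

Validating the tensor limit in the wrapper is a design choice for this sketch: it reports an out-of-range value before the compiler is invoked.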

Compiler Usage Example

The compiler supports only a single translation unit, so the -c argument must be specified.

/usr/bin/tpc-clang reduction.c -c -x c++ -o reduction.o

The output of the compilation session is an ELF file named reduction.o. To extract raw binary from the ELF file, run the following command:

objcopy -O binary --only-section=.text reduction.o reduction.bin
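
Before extracting the .text section, it can be useful to confirm that the compiler output really is an ELF file. The sketch below checks the standard ELF magic bytes (0x7f followed by "ELF") at the start of the file; the helper name is illustrative and not part of habanatools.

```shell
# Sketch: check whether a file starts with the ELF magic bytes 7f 45 4c 46.
# The helper name is illustrative; reduction.o is the file from the example.

# Succeeds only if the first four bytes of $1 are the ELF magic.
is_elf() {
    [ "$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')" = "7f454c46" ]
}

if [ -f reduction.o ]; then
    if is_elf reduction.o; then
        echo "reduction.o is an ELF file"
    else
        echo "reduction.o is not an ELF file"
    fi
else
    echo "reduction.o not found; compile it first"
fi
```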

Assembler Command Line Arguments

The assembler and compiler are merged into a single binary. The compiler uses the file suffix to decide whether to apply the C-language front-end or the TPC assembler front-end: the .tpcasm suffix invokes the assembler.

/usr/bin/tpc-clang <input text file name>.tpcasm \
    -c -o <output object file name>.o
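
The suffix rule can be mirrored in a build script that needs to know which front-end a given source file will use. This is an illustration of the rule as described, not the compiler's actual implementation; the function name and file names are hypothetical.

```shell
# Illustration only: mirrors the suffix rule described in this section.
# It does not reproduce the compiler's internal logic.

# Prints which front-end the described rule selects for a file name.
tpc_frontend_for() {
    case "$1" in
        *.tpcasm) echo "assembler front-end" ;;
        *)        echo "C front-end" ;;
    esac
}

tpc_frontend_for kernel.tpcasm   # prints "assembler front-end"
tpc_frontend_for kernel.c        # prints "C front-end"
```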

Dis-assembler Command Line Argument

The dis-assembly is printed to standard output:

/usr/bin/llvm-objdump --triple tpc -d -j .text -no-show-raw-insn \
    -no-leading-addr -mcpu=<gaudi> <input object file to dis-assemble>.o