TPC Tools Installation Guide
TPC Tools Installation Guide¶
Intel® Gaudi® tools is a .deb/.rpm package (habanatools) containing all the tools
required to build Tensor Processor Core™ (TPC™) kernel plugins for the graph compiler. The
package includes a TPC-C compiler, assembler, disassembler, and all
necessary headers. The package can be installed on any development
machine and does not require Gaudi hardware.
For more details on the TPC SDK, refer to the following:
“Gaudi TPC Performance Library Reference”
Package Prerequisites¶
The package depends on the following libraries:
gcc
gcc-c++
cmake3 > 3.5
boost-devel
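Before installing, it can help to confirm that the build tools are present. The following is a minimal sketch, not part of the package: the helper names version_gt and check_prerequisites are hypothetical, it assumes a sort that supports -V (GNU coreutils), and it does not check the boost development libraries.

```shell
# Sketch: report which prerequisite tools are on PATH and whether the
# installed cmake is newer than 3.5. Helper names are hypothetical.

# Succeeds when version $1 is strictly greater than version $2.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Prints one status line per tool; it only reports and never aborts.
check_prerequisites() {
    for tool in gcc g++ cmake; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
        fi
    done
    if command -v cmake >/dev/null 2>&1; then
        # "cmake --version" prints "cmake version X.Y.Z" on its first line.
        ver="$(cmake --version | head -n 1 | awk '{print $3}')"
        if version_gt "$ver" 3.5; then
            echo "cmake $ver is newer than 3.5"
        else
            echo "cmake $ver is too old (need > 3.5)"
        fi
    fi
}

check_prerequisites
```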
Note
Review the Intel Gaudi TPC Tools Debugger document for details about the IDE plugin for TPC code development and debugging.
Installation¶
RPM Installation for CentOS 7.5 (Desktop Configuration)¶
Launching the Eclipse GUI IDE requires a CentOS 7.5 Desktop configuration.
To install the RPM package, run the following in a bash terminal on CentOS:
sudo yum install epel-release
sudo yum install --enablerepo=epel cmake3
sudo ln -s /usr/bin/cmake3 /usr/bin/cmake
sudo yum install ./habanatools-<version>.x86_64.rpm
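The ln -s step above fails if /usr/bin/cmake already exists, for example when another cmake package is installed. A guarded variant might look like the sketch below; the helper name link_if_absent is hypothetical and is not provided by the package.

```shell
# Sketch: create a symlink only when nothing exists at the link path yet.
# link_if_absent is a hypothetical helper, not part of habanatools.
link_if_absent() {
    target="$1"   # existing file, e.g. /usr/bin/cmake3
    link="$2"     # link to create, e.g. /usr/bin/cmake
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "kept existing $link"
    else
        ln -s "$target" "$link" && echo "linked $link -> $target"
    fi
}

# Usage (from a root shell, since /usr/bin is not user-writable):
#   link_if_absent /usr/bin/cmake3 /usr/bin/cmake
```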
Deb Installation for Ubuntu 22.04 (Desktop Configuration)¶
To install the Deb package, run the following in a bash terminal on Ubuntu:
sudo dpkg -i ./habanatools_<version>_amd64.deb
habanatools Content¶
Once the package is installed, the following files are added to your machine:
Table 1: Content of Package
No. | Location | Purpose |
---|---|---|
1 | /usr/bin/tpc-clang | TPC-C compiler and assembler |
2 | /usr/bin/tpc-llvm-objdump | TPC disassembler |
3 | /usr/lib/habanatools/libtpcsim_shared.so | TPC simulator |
4 | /usr/lib/habanatools/libtpc_test_core.so | Test core library |
5 | /usr/lib/habanatools/include/TPC.h | Simulator headers |
6 | /usr/lib/habanatools/include/gc_interface.h | Glue code interface header |
7 | /usr/lib/habanatools/include/tpc-intrinsics.h | Available TPC-C intrinsics |
8 | /usr/lib/habanatools/include/tpc_test_core_api.h | Test core API |
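To confirm that an installation placed every file listed in Table 1, a check along the following lines may be useful. This is a sketch only: the function name check_habanatools_files and its optional root-prefix argument are hypothetical and exist just to make the check easy to try against a staging directory.

```shell
# Sketch: check that the files listed in Table 1 exist under a given root.
# The function name and the optional root argument are hypothetical.
check_habanatools_files() {
    root="${1:-}"     # empty means the real filesystem root
    missing=0
    for path in \
        /usr/bin/tpc-clang \
        /usr/bin/tpc-llvm-objdump \
        /usr/lib/habanatools/libtpcsim_shared.so \
        /usr/lib/habanatools/libtpc_test_core.so \
        /usr/lib/habanatools/include/TPC.h \
        /usr/lib/habanatools/include/gc_interface.h \
        /usr/lib/habanatools/include/tpc-intrinsics.h \
        /usr/lib/habanatools/include/tpc_test_core_api.h
    do
        if [ ! -e "$root$path" ]; then
            echo "missing: $root$path"
            missing=1
        fi
    done
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}

# Usage: check_habanatools_files && echo "all Table 1 files are present"
```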
TPC-C Compiler Command Line Arguments¶
The TPC compiler is LLVM based and accepts standard LLVM command line arguments along with the following additions:
Optimization levels: -O2 and -O0 are the maximum and minimum supported levels, and -O2 is the default. -O1 turns off HW loops and a few other optimizations. -O0 turns off instruction scheduling and bundling, and pads every instruction with 6 NOPs so that the results queue is committed to the register file before the next instruction executes.
-march=<name> - Architecture switch. Currently, the supported name is “dali” (AKA Goya).
-max-tensors <n> - Tensor limit. n is a number in the range 0..8. Default is 8.
-vlm <n> - Vector local memory limit. n is the size of the vector local memory in KB. Default is 80.
-main-function <main_entry_name> - Name of entry function. Default is “main”.
-all-loops-taken - Enables global elimination of loop-end padding on the assumption that loops are always taken. This can improve performance when the developer guarantees that every loop in the program is taken at least once.
-reg-mem-count - Prints register and local memory usage to the console at compile time.
-disable-lut-warn - Suppresses the performance warning issued when the LUT size used exceeds the LUT cache.
-x c++ - Enables static C++ in TPC-C.
-o <file name> - Sets name of output file. (Standard LLVM argument).
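As an illustration of how several of these options combine on one command line, the sketch below prints (but does not run) a tpc-clang invocation. The helper name tpc_compile_cmd, its argument order, and the file name my_kernel.c are hypothetical; only the flags themselves come from the list above.

```shell
# Sketch: print a tpc-clang command line that combines several options.
# tpc_compile_cmd is a hypothetical helper; it does not invoke the compiler.
tpc_compile_cmd() {
    src="$1"        # kernel source file, e.g. my_kernel.c
    tensors="$2"    # value for -max-tensors, must be in 0..8
    vlm_kb="$3"     # value for -vlm, in KB
    case "$tensors" in
        [0-8]) ;;
        *) echo "error: -max-tensors must be in 0..8" >&2; return 1 ;;
    esac
    echo "/usr/bin/tpc-clang $src -c" \
         "-max-tensors $tensors -vlm $vlm_kb -reg-mem-count" \
         "-o ${src%.*}.o"
}

tpc_compile_cmd my_kernel.c 4 64
```

The printed line can be copied into a terminal once the package is installed.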
Compiler Usage Example¶
The compiler supports a single translation unit, so the -c argument must be specified.
/usr/bin/tpc-clang reduction.c -c -x c++ -o reduction.o
The output of the compilation session is an ELF file named reduction.o.
To extract raw binary from the ELF file, run the following command:
objcopy -O binary --only-section=.text reduction.o reduction.bin
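The two steps above (compile, then extract the .text section) can be chained in a small wrapper. This is a sketch under stated assumptions: build_kernel and the DRY_RUN variable are hypothetical conveniences, and with DRY_RUN=1 the commands are only printed, so the wrapper can be tried on a machine without the package.

```shell
# Sketch: compile one kernel and extract its raw .text section.
# build_kernel and DRY_RUN are hypothetical, not part of habanatools.
build_kernel() {
    src="$1"
    base="${src%.*}"          # reduction.c -> reduction
    run=""
    # With DRY_RUN=1 each command is echoed instead of executed.
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    $run /usr/bin/tpc-clang "$src" -c -x c++ -o "$base.o" &&
    $run objcopy -O binary --only-section=.text "$base.o" "$base.bin"
}

DRY_RUN=1 build_kernel reduction.c
```

Dropping DRY_RUN=1 runs both commands for real and stops after the first one if compilation fails.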
Assembler Command Line Arguments¶
The assembler and compiler are merged into a single binary. The compiler
uses the file suffix to decide whether to apply the C-language
front-end or the TPC assembler front-end; the .tpcasm suffix invokes the
assembler.
/usr/bin/tpc-clang <input text file name>.tpcasm \
    -c -o <output object file name>.o
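When a project contains several assembly sources, the same command can be generated once per file. The sketch below only prints the commands; the helper name asm_cmds is hypothetical, and it assumes file names without spaces.

```shell
# Sketch: print one assemble command per .tpcasm file in a directory.
# asm_cmds is a hypothetical helper; it assumes names without spaces.
asm_cmds() {
    dir="${1:-.}"
    for f in "$dir"/*.tpcasm; do
        [ -e "$f" ] || continue      # the glob matched nothing
        # The .tpcasm suffix selects the assembler front-end.
        echo "/usr/bin/tpc-clang $f -c -o ${f%.tpcasm}.o"
    done
}

# Usage: asm_cmds ./kernels        (review the commands)
#        asm_cmds ./kernels | sh   (run them)
```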
Disassembler Command Line Argument¶
The disassembly is printed to the console:
/usr/bin/tpc-llvm-objdump --triple tpc -d -j .text -no-show-raw-insn \
    -no-leading-addr -mcpu=<gaudi> <input object file to disassemble>.o