TPC Getting Started Guide

This Getting Started guide is designed to give you the basic skills and information to get you quickly up to speed on writing a Tensor Processor Core™ (TPC™) kernel, integrating it with the Intel® Gaudi® software stack, and then using it in a PyTorch model.

For more details on TPC installation, code development, debugger installation and Intrinsics, refer to the following:

  • TPC Tools Installation Guide - Provides installation instructions for the TPC-C compiler, assembler, disassembler and all necessary headers.

  • TPC User Guide - Getting started guide for TPC code development.

  • TPC Tools Debugger - Provides TPC debugger installation and usage instructions.

  • TPC Intrinsics Guide - Provides TPC Intrinsics introduction, reference to the header in GitHub and intrinsics APIs.

Build a TPC Kernel for Shared Object (.so)

A TPC program consists of two parts: TPC execution code and host glue code. TPC execution code is compiled to the TPC ISA and runs on the TPC processor itself. Host glue code runs on the host machine and specifies how the program's inputs and outputs can be dynamically partitioned among the numerous TPC processors in Gaudi.
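As a rough illustration of the TPC execution side, the TPC-C sketch below doubles each float32 element of an input tensor. This is a sketch only, not code from this guide: the `main` signature, the `int5`/`float64` types, and the index-space and load/store intrinsics (`get_index_space_offset`, `get_index_space_size`, `v_f32_ld_tnsr_b`, `v_f32_st_tnsr`) are assumptions that should be checked against the TPC User Guide and TPC Intrinsics Guide before use.

```c
/* TPC-C sketch (compiled with the TPC-C compiler, not a host C compiler):
   multiply every float32 element of ifm by 2 and store the result to ofm. */
void main(tensor ifm, tensor ofm)
{
    /* Index-space region assigned to this TPC instance. */
    const int5 start = get_index_space_offset();
    const int5 end   = get_index_space_size() + start;

    int5 coords = {0};
    for (int d = start[0]; d < end[0]; d++)
    {
        coords[0] = d;
        float64 x = v_f32_ld_tnsr_b(coords, ifm);  /* load a vector of floats */
        float64 y = x * 2.0f;                      /* vector multiply         */
        v_f32_st_tnsr(coords, y, ofm);             /* store the result        */
    }
}
```

The index-space loop is what lets the graph compiler split the work across TPC engines: each instance processes only the region reported by `get_index_space_offset`/`get_index_space_size`.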

To build a TPC kernel, follow the steps below:

Integrate a TPC Kernel into Intel Gaudi Graph Compiler

To use your TPC custom kernel library with the Intel Gaudi software stack, integrate the library into the graph compiler (GC). Follow the steps below:

  • Add your custom kernel library path to the environment variable GC_KERNEL_PATH. When initiating a TPC node, the GC searches all the libraries specified in GC_KERNEL_PATH.

  • Export the variable: export GC_KERNEL_PATH=/path/to/your_so/libcustom_tpc_perf_lib.so:/usr/lib/habanalabs/libtpc_kernels.so. Once exported, you can add new nodes from your custom kernel library to the graph.
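The export step above can be sketched as a small shell snippet. The custom library path is a placeholder from the text, and the fallback pattern (keeping any value already in GC_KERNEL_PATH) is an optional convenience, not something this guide mandates:

```shell
# Placeholder path -- substitute the actual location of your built .so.
CUSTOM_LIB=/path/to/your_so/libcustom_tpc_perf_lib.so
DEFAULT_LIB=/usr/lib/habanalabs/libtpc_kernels.so

# Prepend the custom library so the graph compiler finds it first,
# falling back to the default kernel library if GC_KERNEL_PATH is unset.
export GC_KERNEL_PATH="${CUSTOM_LIB}:${GC_KERNEL_PATH:-${DEFAULT_LIB}}"
echo "$GC_KERNEL_PATH"
```

Putting the custom library first means that if a kernel name exists in both libraries, the custom implementation is the one the GC picks up.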

Integrate a TPC Kernel into a PyTorch Model

Integrating a TPC kernel into a PyTorch model lets you call your custom TPC op from that model. For the procedure, refer to the Basic Workflow section in the PyTorch CustomOp API documentation.

For examples of integrating a kernel into a PyTorch model, refer to the PyTorch CustomOp Examples.