TPC User Guide¶

The Tensor Processor Core™ (TPC™) is a fully programmable VLIW4 processor designed to execute non-linear deep learning operators, such as spatial pooling/batch normalization. It is embedded in Intel® Gaudi® AI accelerators. Gaudi SoC contains numerous TPC cores all operating in parallel, with each core running a single thread. TPC is designed with Very Long Instruction Word (VLIW) architecture. Its wide Single Instruction Multiple Data (SIMD) vector unit supports 2048-bit SIMD operations with data types such as float, bfloat16, INT16, INT32 and INT8. In each cycle, the TPC’s ALU can execute up to 64 floats/INT32 ops, 128 INT16 ops, or 256 INT8 ops.

../../_images/Neural_Network_Hardware_Mapping.jpg

Figure 25 Neural Network Hardware Mapping – Use of MME and TPC¶

TPC Programming Language
Processor Architectural Overview
TPC Programming Model
TPC-C Language
Built-in Functions
Implementing and Integrating New lib
TPC Coherency
Multiple Kernel Libraries
Abbreviations

Gaudi Documentation 1.21.1 documentation

TPC User Guide

TPC User Guide¶