TPC User Guide

The Tensor Processor Core™ (TPC™) is a fully programmable VLIW4 (Very Long Instruction Word) processor designed to execute non-linear deep learning operators, such as spatial pooling and batch normalization. It is embedded in Intel® Gaudi® AI accelerators. The Gaudi SoC contains numerous TPC cores that operate in parallel, with each core running a single thread. The TPC's wide Single Instruction Multiple Data (SIMD) vector unit supports 2048-bit operations on data types such as float, bfloat16, INT32, INT16 and INT8. In each cycle, the TPC's ALU can execute up to 64 float or INT32 operations, 128 INT16 operations, or 256 INT8 operations; each figure is the 2048-bit vector width divided by the element width.
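The per-cycle figures above follow from dividing the 2048-bit vector width by the element width. The short Python sketch below reproduces that arithmetic. It is an illustration only, not TPC code: the names `VECTOR_WIDTH_BITS`, `ELEMENT_BITS` and `lanes_per_vector` are hypothetical, and float is assumed to be 32-bit single precision. The bfloat16 entry shows only how many elements fit in one vector; the text above does not state a per-cycle operation count for bfloat16.

```python
# Illustrative arithmetic only; these names are hypothetical, not part of any TPC API.
VECTOR_WIDTH_BITS = 2048  # SIMD vector width stated in the text

# Element widths in bits for the data types named in the text.
# Assumption: "float" means 32-bit single precision.
ELEMENT_BITS = {
    "float": 32,
    "bfloat16": 16,
    "INT32": 32,
    "INT16": 16,
    "INT8": 8,
}


def lanes_per_vector(dtype: str) -> int:
    """Return how many elements of `dtype` fit in one 2048-bit vector."""
    return VECTOR_WIDTH_BITS // ELEMENT_BITS[dtype]


if __name__ == "__main__":
    for dtype in ELEMENT_BITS:
        print(f"{dtype:>8}: {lanes_per_vector(dtype)} elements per vector")
```

Running the script lists the element count for each type; the float, INT32, INT16 and INT8 values match the 64, 128 and 256 operations per cycle quoted above.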


Figure 23 Neural Network Hardware Mapping – Use of MME and TPC