PyTorch CustomOp API Legacy¶
This document describes the API exposed for writing custom PyTorch operators for the Intel® Gaudi® AI accelerator. The API makes it possible to implement a custom HPU kernel for a new PyTorch operator, allowing PyTorch to execute that operator on a Gaudi device.
Note
The API described in this document allows creating CustomOps in Lazy mode only. For Eager and torch.compile modes, refer to CustomOp API.
Prerequisites¶
TPC is a fully programmable core designed for workloads that do not map to matrix multiplication operations. A TPC kernel is a concrete implementation that performs a desired operation. Before working with the PyTorch CustomOp API, the user must prepare the TPC kernel.
This document does not describe how to implement custom TPC kernels. For information on writing TPC kernels, refer to the TPC kernel documentation.
API Overview¶
The main part of the public interface resides in the hpu_custom_op.h header file. It contains all the necessary declarations to define a custom Intel Gaudi PyTorch kernel.
The following lists the most important classes and structs to interact with:
HabanaCustomOpDescriptor - Descriptor with all the information needed for the custom kernel.
NodeDesc - Description of the PyTorch CustomOp node.
InputDesc - Description of the PyTorch CustomOp inputs.
OutputDesc - Description of the PyTorch CustomOp outputs.
Basic Workflow for Writing CustomOp¶
To define a custom HabanaCustomOpDescriptor, call the REGISTER_CUSTOM_OP_ATTRIBUTES macro:
Define an InputDesc vector describing all inputs of the kernel.
Define an OutputDesc vector describing all outputs of the kernel.
Call the macro with the schema name, TPC GUID, inputs, outputs, and the user param callback function.
Create the main execution function for CustomOp:
Access the HabanaCustomOpDescriptor registered in the previous step using getCustomOpDescriptor.
Call execute with a vector of IValue inputs.
Define the PyTorch schema and dispatch for the CustomOp using TORCH_LIBRARY and TORCH_LIBRARY_IMPL.
Define the op schema using TORCH_LIBRARY.
Register the execution function from the previous step as the HPU dispatcher function using TORCH_LIBRARY_IMPL (see the sketch below).
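The following is a minimal end-to-end sketch of these steps for a hypothetical custom_op::custom_relu operator backed by a relu_fwd_f32 TPC kernel. It is modeled on the public custom op examples; the habana::custom_op namespace, the InputDesc/OutputDesc constructor arguments, the params-callback argument, and the return type of execute are assumptions that should be verified against hpu_custom_op.h in your release.

// Hedged sketch; verify the exact declarations against hpu_custom_op.h.
#include "hpu_custom_op.h"
#include <torch/extension.h>

bool register_custom_relu() {
  // 1. Describe the kernel inputs: a single Tensor at index 0.
  std::vector<habana::custom_op::InputDesc> inputs_desc{
      habana::custom_op::InputDesc{habana::custom_op::input_type::TENSOR, 0}};

  // Output shape callback: here the output keeps the shape of input 0.
  auto output_size_lambda =
      [](const at::Stack& inputs) -> std::vector<int64_t> {
    return inputs[0].toTensor().sizes().vec();
  };

  // Describe the kernel outputs (assumed arguments: index, dtype, shape callback).
  std::vector<habana::custom_op::OutputDesc> outputs_desc{
      habana::custom_op::OutputDesc{
          0, c10::ScalarType::Float, output_size_lambda}};

  // Register the op attributes: schema name, TPC GUID, inputs, outputs,
  // and the user param callback (assumed nullptr is accepted when the
  // kernel takes no parameters).
  REGISTER_CUSTOM_OP_ATTRIBUTES(
      "custom_op::custom_relu",  // schema name
      "relu_fwd_f32",            // TPC kernel GUID (illustrative)
      inputs_desc,
      outputs_desc,
      nullptr);
  return true;
}

// 2. Main execution function, dispatched by PyTorch for the HPU backend.
at::Tensor custom_relu_execute(torch::Tensor input) {
  static bool registered = register_custom_relu();  // register only once
  TORCH_CHECK(registered, "custom_op::custom_relu is not registered");
  std::vector<c10::IValue> inputs{input};
  // getCustomOpDescriptor is assumed to be a static member here.
  auto op_desc =
      habana::custom_op::HabanaCustomOpDescriptor::getCustomOpDescriptor(
          "custom_op::custom_relu");
  // execute is assumed to return one tensor per registered OutputDesc.
  std::vector<at::Tensor> outputs = op_desc.execute(inputs);
  return outputs[0];
}

// 3. Declare the schema and bind the HPU implementation.
TORCH_LIBRARY(custom_op, m) {
  m.def("custom_relu(Tensor self) -> Tensor");
}
TORCH_LIBRARY_IMPL(custom_op, HPU, m) {
  m.impl("custom_relu", custom_relu_execute);
}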
API Limitations¶
Single TPC Kernel Definition per HabanaCustomOpDescriptor¶
HabanaCustomOpDescriptor can define only a single TPC kernel within its implementation.
If a given complex operation requires more than one TPC kernel to represent it, there are two options:
You can implement a new TPC kernel that combines the functionality of the simpler TPC kernels.
Or, if possible, represent the complex operation as a series of simple operations at the Python level.
Memory Layout¶
Currently, the memory layout is inherited from the input 0 tensor.
Output Shape¶
If the output shape callback function is not set, the output shape will be the same as the shape of the input 0 tensor.
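When the output shape differs from input 0 (as with the custom_topk op used in the loading example below), a shape callback must be provided at registration. A minimal sketch, assuming the callback receives the op inputs as an at::Stack and returns the output sizes as std::vector<int64_t>:

// Hedged sketch of an output shape callback for a topk-like kernel:
// the output keeps the input shape except that dimension `dim` is reduced to k.
auto topk_output_size = [](const at::Stack& inputs) -> std::vector<int64_t> {
  auto self = inputs[0].toTensor();           // input tensor
  auto k = inputs[1].toScalar().to<int>();    // k
  auto dim = inputs[2].toScalar().to<int>();  // dimension
  std::vector<int64_t> sizes = self.sizes().vec();
  if (!sizes.empty()) {
    sizes[dim] = k;
  }
  return sizes;
};
// Passed as the (assumed) shape-callback argument of OutputDesc, e.g.:
// habana::custom_op::OutputDesc{0, c10::ScalarType::Float, topk_output_size};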
Inputs Types to CustomOp¶
Currently, only Tensor and Scalar are supported as input types to CustomOp; arrays of any type are not supported.
CustomOp Loading¶
Once the CustomOp library is built, it needs to be loaded in the topology in Python. PyTorch provides a utility function to load the library:
import torch
# it is important to load the module before loading custom op libs
torch.ops.load_library(custom_op_lib_path)
# output = torch.ops.<custom_op_schema>(<inputs>)
a_topk_hpu, a_topk_indices_hpu = torch.ops.custom_op.custom_topk(a_hpu, 3, 1, False)
API Usage Examples¶
An example of how to use the API can be found on the PyTorch Model References GitHub page.