Implementing and Integrating New lib
Coding
To add a new lib that contains your kernel, implement the following components:
Kernels
Glue Code
Tests (optional)
For the complete code examples, see Habana Custom Kernel.
Kernels
The kernel is written in the TPC-C language, as described in TPC Programming Language. A kernel is a main function whose signature contains a list of parameters; tensors and scalars can be passed as parameters, with some restrictions.
The following is an example of a simple kernel:
void main(tensor inputA, tensor outputB)
{
    const int dim0 = 0;
    const int dim1 = 1;
    const int dim2 = 2;
    const int dim3 = 3;

    // Index space coordinates assigned to this TPC invocation
    const int5 idx_s = get_index_space_offset();
    const int5 idx_e = get_index_space_size() + idx_s;

    int5 ifmCoords = {0, 0, 0, 0, 0};
    float64 in, out;

    // dim0 advances by 64 because each float64 vector holds 64 elements
    for (int idx0 = idx_s[dim0] * 64; idx0 < idx_e[dim0] * 64; idx0 += 64)
    {
        ifmCoords[dim0] = idx0;
        for (int idx3 = idx_s[dim3]; idx3 < idx_e[dim3]; idx3 += 1)
        {
            ifmCoords[dim3] = idx3;
            for (int idx2 = idx_s[dim2]; idx2 < idx_e[dim2]; idx2 += 1)
            {
                ifmCoords[dim2] = idx2;
                for (int idx1 = idx_s[dim1]; idx1 < idx_e[dim1]; idx1 += 1)
                {
                    ifmCoords[dim1] = idx1;
                    // Load 64 floats, take their absolute value and store the result
                    in  = v_f32_ld_tnsr_i(ifmCoords, inputA);
                    out = v_f32_abs_v(in);
                    f32_st_tnsr_i_v(ifmCoords, outputB, out);
                }
            }
        }
    }
}
Glue Code
The program and its associated definition set are passed to the Graph Compiler and incorporated into the DNN topology through a host-side interface called Glue Code. The outer component (the Graph Compiler) interacts with the new lib through two connectivity points:
GetKernelNames
HabanaKernel
An example of these two methods is found under entry_points.cpp.
GetKernelNames
This method returns the list of exported kernel names. A kernel name must not exceed 64 bytes in length. A minimal implementation sketch follows the parameter list below.
names : [out] List of strings to be filled with kernel names.

kernelCount : [in/out]
[in] The maximum number of strings in the 'names' argument.
[out] If the number of kernels is less than or equal to the maximum list length, the kernel names are copied into 'names' and the kernel count is updated; otherwise, only the required list length is updated.

deviceId : [in] The type of device (an enum defined in gc_interface.h). Possible values:
gcapi::DEVICE_ID_GOYA
gcapi::DEVICE_ID_GAUDI
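For illustration, the following is a minimal sketch of how GetKernelNames could behave for a lib that exports a single Gaudi kernel. The exact function signature, the use of MAX_NODE_NAME for both array dimensions, the GLUE_SUCCESS return code, and the "abs_f32" kernel name are assumptions modelled on the Habana Custom Kernel examples, not a verbatim copy of gc_interface.h:

#include <cstring>
#include "gc_interface.h"   // assumed header providing the gcapi types and _OUT_ macro

extern "C" gcapi::GlueCodeReturn_t GetKernelNames(
    _OUT_ char names[MAX_NODE_NAME][MAX_NODE_NAME],   // assumed dimensioning
    unsigned* kernelCount,
    gcapi::DeviceId_t deviceId)
{
    // Kernels exported by this hypothetical lib.
    static const char* gaudiKernels[] = { "abs_f32" };
    const unsigned required = sizeof(gaudiKernels) / sizeof(gaudiKernels[0]);

    if (deviceId != gcapi::DEVICE_ID_GAUDI)
    {
        // This sketch exports no kernels for other devices.
        *kernelCount = 0;
        return gcapi::GLUE_SUCCESS;                    // assumed return code
    }

    if (*kernelCount < required)
    {
        // List too short: only report the required list length.
        *kernelCount = required;
        return gcapi::GLUE_SUCCESS;
    }

    // Copy the kernel names (each must stay under 64 bytes) and update the count.
    for (unsigned i = 0; i < required; i++)
    {
        std::strncpy(names[i], gaudiKernels[i], MAX_NODE_NAME - 1);
        names[i][MAX_NODE_NAME - 1] = '\0';
    }
    *kernelCount = required;
    return gcapi::GLUE_SUCCESS;
}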
HabanaKernel
typedef struct _HabanaKernelParams_t
{
    _IN_    int           apiVersion;
    _IN_    char          nodeName[MAX_NODE_NAME];
    _IN_    UserParams_t  NodeParams;                   /* user specific. */
    _IN_    DeviceId_t    deviceId;                     /* asic ID */
    _IN_    KernelType_t  kernelType;                   /* deprecated */
    _INOUT_ Tensor_t      inputTensors[MAX_TENSOR_NR];  /* array of the input tensor handles. */
    _INOUT_ unsigned      inputTensorNr;                /* the number of input tensors */
    _INOUT_ Tensor_t      outputTensors[MAX_TENSOR_NR]; /* array of the output tensor handles. */
    _INOUT_ unsigned      outputTensorNr;               /* the number of output tensors. */
    _IN_    unsigned      debugFlags;                   /* for internal use - used to debug/profile programs. */
    _IN_    unsigned      NodeParamsSize;               /* size of the struct pointed to by NodeParams */
    _IN_    unsigned      maxAvailableTpc;              /* The kernel writer should expect any number between 1 and
                                                         * maxAvailableTpc. Kernels that rely on the number of TPCs in
                                                         * the index space should expose the index space size with
                                                         * maxAvailableTpc. Examples: sparse segment sum, embedding
                                                         * bag kernels, etc. */
            unsigned      reserved[28];
} HabanaKernelParams_t;
The HabanaKernel method is the main entry point of the new kernel lib.
params : [in] The kernel properties:
Requested kernel name and data type (e.g. maxpool_2d_i8 / averagepool_2d_f32).
Number of input/output tensors for the kernel.
For each input/output tensor, the Graph Compiler supplies:
Data type
Size in each dimension
Quantization parameters (scale / zero point)
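As an illustration of how glue code can consume these properties, below is a hedged validation sketch for the abs_f32 kernel shown earlier. The field names inside Tensor_t (geometry.dataType) and the gcapi return and data-type enums other than GLUE_SUCCESS are assumptions based on the Habana Custom Kernel examples:

#include <cstring>
#include "gc_interface.h"   // assumed header providing the gcapi types

// Checks the requested kernel name and the tensor properties supplied by the
// Graph Compiler before the kernel instance is built.
static gcapi::GlueCodeReturn_t ValidateAbsF32(const gcapi::HabanaKernelParams_t* params)
{
    // The requested kernel name arrives in nodeName; "abs_f32" is a hypothetical name.
    if (std::strcmp(params->nodeName, "abs_f32") != 0)
        return gcapi::GLUE_NODE_NOT_FOUND;               // assumed return code

    // abs_f32 expects exactly one input and one output tensor.
    if (params->inputTensorNr != 1 || params->outputTensorNr != 1)
        return gcapi::GLUE_INCOMPATIBLE_INPUT_COUNT;     // assumed return code

    // Both tensors must be 32-bit float.
    if (params->inputTensors[0].geometry.dataType  != gcapi::DATA_F32 ||  // assumed field/enum names
        params->outputTensors[0].geometry.dataType != gcapi::DATA_F32)
        return gcapi::GLUE_INCOMPATIBLE_DATA_TYPE;       // assumed return code

    return gcapi::GLUE_SUCCESS;
}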
typedef struct _HabanaKernelInstantiation_t
{
    _OUT_   TensorGeometry_t      indexSpaceGeometry;
    _OUT_   TensorAccessPattern_t inputTensorAccessPattern[MAX_TENSOR_NR];
    _OUT_   PadValue              inputPadValues[MAX_TENSOR_NR];
    _OUT_   TensorAccessPattern_t outputTensorAccessPattern[MAX_TENSOR_NR];
    _INOUT_ AuxTensor_t           auxiliaryTensors[MAX_TENSOR_NR];
    _OUT_   unsigned              auxiliaryTensorCount;
    _INOUT_ DeviceKernel_t        kernel;
    _OUT_   ProgramFlags          flags;
    _INOUT_ void*                 kernelElf;
    _INOUT_ unsigned              elfSize;
    _OUT_   PadValue              outputMemsetValues[MAX_TENSOR_NR];
    _OUT_   unsigned              auxNotRequiringInit;  /* bit mask defining which aux tensors should be
                                                         * regarded as SRAM scratch-pad aux tensors */
            unsigned              reserved[16];
} HabanaKernelInstantiation_t;
instance : [out] The returned final kernel properties:
Program binary.
Size of the index space, as described in Index Space.
Index space mapping, as described in Index Space Mapping.
Values of the scalar parameters given to the TPC-C 'main' function (up to 32 dwords).
Optionally, the pad value of the input tensors.
Glue code should perform the following (a minimal sketch follows this list):
Verify that the input/output tensor properties fit the kernel definition:
Input/output tensor count matches the kernel definition.
Input/output tensor dimensions match the kernel definition.
Input/output tensor data types match the kernel definition.
Return the program binary.
Return the size of the index space, as described in Index Space.
Return the index space mapping, as described in Index Space Mapping.
Return the values of the scalar parameters given to the TPC-C 'main' function (up to 32 dwords).
Optionally, decide the pad value of the input tensors.
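Putting it together, the following is a minimal sketch of a HabanaKernel entry point for the abs_f32 kernel, continuing the translation unit of the validation sketch above. The field names inside TensorGeometry_t, TensorAccessPattern_t and DeviceKernel_t (dims, sizes, dim/start_a/start_b/end_a/end_b, kernelBinary, binarySize), as well as the linker symbols holding the compiled ISA, are assumptions modelled on the Habana Custom Kernel examples:

// Compiled TPC-C binary, assumed to be embedded by the build as linker symbols.
extern unsigned char _binary_abs_f32_o_start;
extern unsigned char _binary_abs_f32_o_end;

extern "C" gcapi::GlueCodeReturn_t HabanaKernel(
    _IN_  gcapi::HabanaKernelParams_t*        params,
    _OUT_ gcapi::HabanaKernelInstantiation_t* instance)
{
    // 1. Verify the tensor properties fit the kernel definition.
    gcapi::GlueCodeReturn_t ret = ValidateAbsF32(params);   // helper from the sketch above
    if (ret != gcapi::GLUE_SUCCESS)
        return ret;

    // 2. Index space size: one member per 64 floats on dim 0 (one float64 vector)
    //    and one member per element on the remaining dims, matching the kernel loops.
    const unsigned* sizes = params->inputTensors[0].geometry.sizes;   // assumed field name
    instance->indexSpaceGeometry.dims     = 4;                        // assumed field names
    instance->indexSpaceGeometry.sizes[0] = (sizes[0] + 63) / 64;
    instance->indexSpaceGeometry.sizes[1] = sizes[1];
    instance->indexSpaceGeometry.sizes[2] = sizes[2];
    instance->indexSpaceGeometry.sizes[3] = sizes[3];

    // 3. Index space mapping: member i covers elements [64*i, 64*i + 63] on dim 0
    //    and exactly one element on dims 1-3.
    for (unsigned d = 0; d < 4; d++)
    {
        instance->inputTensorAccessPattern[0].dim[d].dim     = d;                 // assumed field names
        instance->inputTensorAccessPattern[0].dim[d].start_a = (d == 0) ? 64 : 1;
        instance->inputTensorAccessPattern[0].dim[d].start_b = 0;
        instance->inputTensorAccessPattern[0].dim[d].end_a   = (d == 0) ? 64 : 1;
        instance->inputTensorAccessPattern[0].dim[d].end_b   = (d == 0) ? 63 : 0;
        instance->outputTensorAccessPattern[0].dim[d] = instance->inputTensorAccessPattern[0].dim[d];
    }

    // 4. Return the program binary.
    instance->kernel.kernelBinary = &_binary_abs_f32_o_start;                     // assumed field names
    instance->kernel.binarySize   = (unsigned)(&_binary_abs_f32_o_end - &_binary_abs_f32_o_start);

    return gcapi::GLUE_SUCCESS;
}

The access pattern is what allows the Graph Compiler to split the work across TPC engines, because it declares exactly which tensor elements each index-space member reads and writes.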
Build lib Project
Building the lib project requires the habanatools package. See the installation instructions provided in the TPC Tools Installation Guide.
Upon successful compilation, the new lib is generated at:
<build path>/builds/<Debug or Release>/src/lib<name>_kernels.so
This is the plugin shared object loaded by SynapseAI in production.
Print
'printf' is a built-in utility function exposed by the TPC compiler to TPC kernel writers. It provides entry-level debugging capabilities on the TPC processor. printf is implemented by establishing an ABI between the compiler and the Habana runtime.
Syntax
Printf syntax is identical to the C runtime library syntax, with the following restriction: printf accepts at most one variable in addition to the message string.
To enable printf support, define the #pragma tpc_printf(enable) pragma, as shown in the example below.
Scalar printing - similar to the C library function, for example printf("depth=%d ", depth);
Vector printing - use a loop to print the whole vector or part of it. For example, for a vector of floats (64 elements per vector): for (int i = 0; i < 64; i++) { printf("%f, ", vec[i]); }
The code below demonstrates the printing format:
#pragma tpc_printf(enable)

void printTest(void)
{
    char char_val             = 0xff;
    unsigned char uchar_val   = 0xff;
    short short_val           = 0xb221;
    unsigned short ushort_val = 0xb221;     // 45,601
    int int_val               = 0x8455CDD1;
    unsigned int uint_val     = 0x8455CDD1; // 2,220,215,761
    bf16 bf16_val             = 46.25;
    float float_val           = 15.23423;
    /* V_LANE_ID_32 vector, values 0-63 */
    uint64 vec_lane_id = V_LANE_ID_32;

    printf("Test string!\n");
    printf("char value is %hhd\n", char_val);
    printf("unsigned char value is %hhu\n", uchar_val);
    printf("short value is %hd\n", short_val);
    printf("unsigned short value is %hu\n", ushort_val);
    printf("int value is %d\n", int_val);
    printf("unsigned int value is %u\n", uint_val);
    printf("bfloat value is %bf\n", bf16_val);
    //printf("half float value is %hf\n", f16_val);
    printf("float value is %f\n", float_val);

    printf("Vector Print:\n");
    printf("=============\n");
    for (int i = 0; i < 64; i++)
    {
        printf("%u, ", vec_lane_id[i]);
    }
}
The output lists each scalar value, followed by the 64 lane IDs (0 through 63) of the vector.