Built-in Functions
Program Management Special Functions
The following program management special functions are available:
int5 get_index_space_offset()
Parameter | Description |
---|---|
return value | Returns the index space offset for the current program invocation. |
int5 get_index_space_size()
Parameter | Description |
---|---|
return value | Returns the index space size for the current program invocation. |
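The offset and size returned by the two functions above are typically combined into a per-dimension range that the program iterates over. The following is a minimal host-side sketch in plain C, not TPC-C: `int5_model`, `index_space_end` and `index_space_member_count` are hypothetical names introduced for illustration, and the assumption that the range is `[offset, offset + size)` in each of the five dimensions is not stated by this page.

```c
#include <stddef.h>

#define INDEX_SPACE_DIMS 5

/* Hypothetical host-side stand-in for the TPC-C int5 type. */
typedef struct {
    int v[INDEX_SPACE_DIMS];
} int5_model;

/* Assumption: the invocation covers [offset, offset + size) per dimension,
 * so the exclusive end coordinate is offset[d] + size[d]. */
static int5_model index_space_end(int5_model offset, int5_model size)
{
    int5_model end;
    for (size_t d = 0; d < INDEX_SPACE_DIMS; ++d) {
        end.v[d] = offset.v[d] + size.v[d];
    }
    return end;
}

/* Number of index-space members covered by a given size vector. */
static long index_space_member_count(int5_model size)
{
    long count = 1;
    for (size_t d = 0; d < INDEX_SPACE_DIMS; ++d) {
        count *= size.v[d];
    }
    return count;
}
```

In a real kernel the two `int5_model` arguments would come from `get_index_space_offset()` and `get_index_space_size()`.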
unsigned int get_dim_size(tensor a, unsigned int dim)
Parameter | Description |
---|---|
a | [in] Tensor handle. |
dim | [in] Tensor dimension index to be queried. |
return value | Tensor dimension size, in elements. |
unsigned int get_dim_stride(tensor a, unsigned int dim)
Parameter | Description |
---|---|
a | [in] Tensor handle. |
dim | [in] Tensor dimension index to be queried. |
return value | Tensor dimension stride, in elements. |
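Because sizes and strides are both reported in elements, they can be combined into a linear element offset. The sketch below is a plain C model, not TPC-C: `linear_element_offset` and `dense_strides` are hypothetical helpers, and the dense layout with dimension 0 changing fastest is an assumption made only for this illustration.

```c
#include <stddef.h>

/* Linear element offset of a coordinate: sum over d of coord[d] * stride[d].
 * Strides are in elements, matching the unit documented for get_dim_stride. */
static size_t linear_element_offset(const unsigned int *coord,
                                    const unsigned int *stride,
                                    size_t rank)
{
    size_t offset = 0;
    for (size_t d = 0; d < rank; ++d) {
        offset += (size_t)coord[d] * stride[d];
    }
    return offset;
}

/* Assumption: a dense tensor whose dimension 0 changes fastest, so
 * stride[0] = 1 and stride[d] = stride[d - 1] * size[d - 1]. */
static void dense_strides(const unsigned int *size,
                          unsigned int *stride,
                          size_t rank)
{
    unsigned int running = 1;
    for (size_t d = 0; d < rank; ++d) {
        stride[d] = running;
        running *= size[d];
    }
}
```

For a dense tensor of size 4 x 3 x 2 this model gives strides 1, 4 and 12, so coordinate (1, 2, 1) maps to element 21.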
unsigned int get_pad_value_<tensor data type>(tensor a)
Parameter | Description |
---|---|
a | [in] Tensor handle. |
return value | Tensor's pad value. |
This function is supported for the following data types:
uint
int
float
bf16
short
ushort
char
uchar
void set_pad_value_<tensor data type>(tensor a, <tensor data type> val)
Parameter | Description |
---|---|
a | [in] Tensor handle. |
val | [in] New pad value to set. |
This function supports the following data types:
uint
int
float
bf16
short
ushort
char
uchar
Built-in Special Functions
Table 2 describes the available built-in special functions.
To use special functions in your TPC-C code, set the following flag in the glue code:
specialFunctionsUsed = 1
Table 2: Built-in Special Functions
Function | Maximum error in ULPs (single-precision floating point) |
---|---|
float64 v_reciprocal_f32(float64 x) | 2 |
float64 v_sqrt_f32(float64 x) | 2 |
float64 v_exp_f32(float64 x) | 2 |
float64 v_exp_cephes_f32(float64 x) | 1 |
float64 v_log_f32(float64 x) | 3 |
float64 v_log2_f32(float64 x) | 3 |
float64 v_tanh_f32(float64 x) | 3 |
float64 v_pow_f32(float64 x, float64 y) | 20 |
float64 v_pow2_f32(float64 x) | 2 |
float64 v_rsqrt_f32(float64 x) | 3 |
float64 v_div_f32(float64 x, float64 y) | 2 |
float64 v_sin_f32(float64 x) | 2 |
float64 v_cos_f32(float64 x) | 2 |
float64 v_tan_f32(float64 x) | 3 |
float64 v_sigmoid_f32(float64 input) | 16 |
float64 v_asin_cephes_f32(float64 input) | 2 |
float64 v_acos_cephes_f32(float64 input) | 2 |
float64 v_atan_cephes_f32(float64 input) | 3 |
float64 v_asinh_f32(float64 input) | 6 |
float64 v_acosh_f32(float64 input) | 10 |
float64 v_atanh_f32(float64 input) | 3 |
float64 v_sinh_cephes_f32(float64 input) | 3 |
float64 v_cosh_cephes_f32(float64 input) | 3 |
float64 v_mod_f32(float64 input) | 70 |
float64 v_expm1_f32(float64 input) | 10 |
INT8/INT16 Built-in Special Functions
The following INT8/INT16 built-in special functions are available:
int8 tanh(int8 a);
int16 tanh(int16 a);
int8 sigmoid(int8 a);
int16 sigmoid(int16 a);
int8 exp(int8 a); // for a < 0
int16 exp(int16 a); // for a < 0
Reciprocal (1/x), for x in [0.5, 1)
Intrinsics
Every TPC instruction is wrapped with an intrinsic for every supported data type and scalar/vector argument combination.
The intrinsic function name is usually derived from the instruction name, instruction data type, return data type width, scalar/vector properties of its arguments and predicate values.
The intrinsic naming convention adheres to the following pattern:
<return type width>_<instruction datatype>_<instruction name>_<arg1 width>_<arg2 width>_<b|bv>( arguments… );
The return type width can be:
<return type width> | Description |
---|---|
V | Vector type |
AV | Augmented vector (4096-bit or 8192-bit vectors) |
S | Scalar type |
B | Boolean data type |
BV | Boolean vector data type |
The instruction type can be:
<instruction datatype> | Description |
---|---|
F32 | Single-precision floating point |
I32 | 32-bit signed integer |
U32 | 32-bit unsigned integer |
BF16 | Brain floating point |
I16 | 16-bit signed integer |
U16 | 16-bit unsigned integer |
I8 | 8-bit signed integer |
U8 | 8-bit unsigned integer |
I | INT5 data type |
The argument width can be:
<arg width> | Description |
---|---|
S | Scalar data type |
V | Vector data type |
Predicate arguments can be:
Predicate Argument | Description |
---|---|
B | Scalar Boolean |
BV | Vector Boolean |
Example intrinsic declarations:
bool256 bv_u16_cmp_leq_v_v_b(ushort128 a, ushort128 b, bool predicate, bool predicatePolarity);
bool256 bv_f32_cmp_leq_v_s_vb(float64 a, float b, bool256 predicate, bool predicatePolarity);
float64 v_f32_mul_v_v_b(float64 a, float64 b, bool predicate, bool predicatePolarity);
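To make the naming pattern concrete, the following plain C sketch assembles an intrinsic name from its components. `build_intrinsic_name` is a hypothetical helper written for this illustration only; it is not part of the TPC tool chain. The components are passed in lowercase, as they appear in the declarations above, and the predicate suffix is passed through unchanged.

```c
#include <stdio.h>

/* Joins the name components with underscores, following the pattern
 * <return type width>_<instruction datatype>_<instruction name>_
 * <arg1 width>_<arg2 width>_<predicate>.
 * Returns the length of the full name, as snprintf does. */
static int build_intrinsic_name(char *out, size_t out_size,
                                const char *ret_width,
                                const char *datatype,
                                const char *instruction,
                                const char *arg1_width,
                                const char *arg2_width,
                                const char *predicate)
{
    return snprintf(out, out_size, "%s_%s_%s_%s_%s_%s",
                    ret_width, datatype, instruction,
                    arg1_width, arg2_width, predicate);
}
```

For example, the components `v`, `f32`, `mul`, `v`, `v`, `b` produce `v_f32_mul_v_v_b`: a vector result, an F32 multiply, two vector arguments and a scalar Boolean predicate.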
Built-in Vector Reduction Intrinsics
Vector reduction intrinsics provide an easy way to compute the sum, product, minimum, maximum, argmin and argmax of a vector. The vector values are reduced to a single value, which is then broadcast to all lanes of the result vector.
Table 3 describes the available built-in reduction intrinsics for different datatypes.
Table 3: Built-in Reduction Intrinsics
Reduction Intrinsics | Description |
---|---|
float64 v_f32_reduce_add(float64 x) | Summation of all elements of the F32 vector |
float64 v_f32_reduce_mul(float64 x) | Product of all elements of the F32 vector |
float64 v_f32_reduce_min(float64 x) | Minimum value of all elements of the F32 vector |
float64 v_f32_reduce_max(float64 x) | Maximum value of all elements of the F32 vector |
uint64_float64_pair_t v_f32_reduce_argmin(float64 x) | Index of the minimum value of all elements of the F32 vector |
uint64_float64_pair_t v_f32_reduce_argmax(float64 x) | Index of the maximum value of all elements of the F32 vector |
int64 v_i32_reduce_add(int64 x) | Summation of all elements of the I32 vector |
int64 v_i32_reduce_max(int64 x) | Maximum value of all elements of the I32 vector |
uint64_int64_pair_t v_i32_reduce_argmin(int64 x) | Index of the minimum value of all elements of the I32 vector |
uint64_int64_pair_t v_i32_reduce_argmax(int64 x) | Index of the maximum value of all elements of the I32 vector |
bfloat128 v_bf16_reduce_add(bfloat128 x) | Summation of all elements of the BF16 vector |
bfloat128 v_bf16_reduce_min(bfloat128 x) | Minimum value of all elements of the BF16 vector |
bfloat128 v_bf16_reduce_max(bfloat128 x) | Maximum value of all elements of the BF16 vector |
short128 v_i16_reduce_min(short128 x) | Minimum value of all elements of the I16 vector |
short128 v_i16_reduce_max(short128 x) | Maximum value of all elements of the I16 vector |
char256 v_i8_reduce_min(char256 x) | Minimum value of all elements of the I8 vector |
char256 v_i8_reduce_max(char256 x) | Maximum value of all elements of the I8 vector |
uchar256 v_u8_reduce_min(uchar256 x) | Minimum value of all elements of the U8 vector |
uchar256 v_u8_reduce_max(uchar256 x) | Maximum value of all elements of the U8 vector |
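The reduce-then-broadcast behavior described above can be modeled in plain C as follows. This is a host-side sketch, not TPC-C: `float64_model`, `reduce_add_model` and `reduce_max_model` are hypothetical names, the 64-lane width is inferred from the `float64` type name, and the sequential summation order is an assumption, so floating-point results on the device may differ in the last bits.

```c
#include <stddef.h>

#define F32_LANES 64

/* Hypothetical host-side stand-in for the TPC-C float64 vector type
 * (assumed here to hold 64 single-precision lanes). */
typedef struct {
    float lane[F32_LANES];
} float64_model;

/* Models a sum reduction: all lanes are added, then the single result is
 * broadcast to every lane of the returned vector. */
static float64_model reduce_add_model(float64_model x)
{
    float sum = 0.0f;
    for (size_t i = 0; i < F32_LANES; ++i) {
        sum += x.lane[i];
    }
    float64_model result;
    for (size_t i = 0; i < F32_LANES; ++i) {
        result.lane[i] = sum;
    }
    return result;
}

/* Models a max reduction with the same broadcast step. */
static float64_model reduce_max_model(float64_model x)
{
    float best = x.lane[0];
    for (size_t i = 1; i < F32_LANES; ++i) {
        if (x.lane[i] > best) {
            best = x.lane[i];
        }
    }
    float64_model result;
    for (size_t i = 0; i < F32_LANES; ++i) {
        result.lane[i] = best;
    }
    return result;
}
```

Because every lane of the result holds the reduced value, the result can be used directly in further vector arithmetic without a separate broadcast.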
Exceptions to the C99 Standard
Initialization of bool256 Variable
The compiler treats bool256 as an array of 32 chars. Use the following syntax to initialize all bits of the array to one:
bool256 a = {0xff};
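For contrast, the sketch below shows what the same brace initializer does to a 32-byte array in standard C99, next to the all-ones result that this section documents for TPC-C. It is plain host-side C; `bool256_model` and the helper functions are hypothetical names used only for this comparison.

```c
#include <stddef.h>
#include <string.h>

#define BOOL256_BYTES 32

/* Hypothetical host-side stand-in for bool256: 32 bytes, 256 bits. */
typedef struct {
    unsigned char byte[BOOL256_BYTES];
} bool256_model;

/* Standard C99 behavior: {0xff} sets only the first element; the
 * remaining 31 bytes are zero-initialized. */
static bool256_model c99_brace_init(void)
{
    bool256_model a = {{0xff}};
    return a;
}

/* Models the result documented for TPC-C: every bit set to one. */
static bool256_model all_ones(void)
{
    bool256_model a;
    memset(a.byte, 0xff, sizeof a.byte);
    return a;
}

/* Counts the bits that are set across all 32 bytes. */
static size_t count_set_bits(bool256_model a)
{
    size_t count = 0;
    for (size_t i = 0; i < BOOL256_BYTES; ++i) {
        for (unsigned bit = 0; bit < 8; ++bit) {
            count += (a.byte[i] >> bit) & 1u;
        }
    }
    return count;
}
```

In this model the standard C99 initializer leaves 8 bits set, while the TPC-C behavior corresponds to all 256 bits set, which is why the syntax is listed as an exception.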
Initialization of Local Memory
According to C99: “If an object that has static or thread storage duration is not initialized explicitly and if it has arithmetic type, it is initialized to (positive or unsigned) zero;”
For performance reasons, local memory is left uninitialized at the beginning of a program, even though it has static storage duration.
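A practical consequence is that code which relies on zeroed local memory should clear it explicitly before the first read. The following is a minimal plain C sketch of that pattern; `clear_buffer` is a hypothetical helper, and the buffer stands in for a local-memory array rather than using any TPC-C specific declaration.

```c
#include <stddef.h>

/* Explicitly zeroes a buffer instead of relying on implicit
 * zero-initialization of objects with static storage duration. */
static void clear_buffer(float *buffer, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        buffer[i] = 0.0f;
    }
}
```

Calling such a routine once at program start restores the zero contents that standard C99 would otherwise guarantee.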