Built-in Functions

Program Management Special Functions

The following program management special functions are available:

int5 get_index_space_offset()

Parameter     Description
return value  Returns the index space offset for the current program invocation

int5 get_index_space_size()

Parameter     Description
return value  Returns the index space size for the current program invocation
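The two queries above are typically combined to find the range of index-space members that one invocation should process. The following is a minimal host-side sketch in plain C, not TPC-C: `int5_model`, `index_space_end`, and `index_space_members` are hypothetical names, and the sketch assumes that the range along each dimension runs from the offset (inclusive) to offset plus size (exclusive).

```c
#define INDEX_SPACE_DIMS 5

/* Hypothetical stand-in for the TPC-C int5 type (assumed: five int lanes). */
typedef struct { int v[INDEX_SPACE_DIMS]; } int5_model;

/* Exclusive end of the range along each dimension, assuming end = offset + size. */
int5_model index_space_end(int5_model offset, int5_model size)
{
    int5_model end;
    for (int d = 0; d < INDEX_SPACE_DIMS; ++d)
        end.v[d] = offset.v[d] + size.v[d];
    return end;
}

/* Number of index-space members covered by one invocation. */
long index_space_members(int5_model size)
{
    long n = 1;
    for (int d = 0; d < INDEX_SPACE_DIMS; ++d)
        n *= size.v[d];
    return n;
}
```

In a kernel, the values passed as `offset` and `size` would come from get_index_space_offset() and get_index_space_size(), and the resulting bounds would drive the per-dimension loops.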

unsigned int get_dim_size(tensor a, unsigned int dim)

Parameter     Description
a             [in] Tensor handle.
dim           [in] Tensor dimension index to be queried.
return value  Tensor dimension size, in elements.

unsigned int get_dim_stride(tensor a, unsigned int dim)

Parameter     Description
a             [in] Tensor handle.
dim           [in] Tensor dimension index to be queried.
return value  Tensor dimension stride, in elements.
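To illustrate how per-dimension sizes and strides expressed in elements are commonly used, the sketch below computes a linear element offset and a bounds check in plain C. It is a host-side model rather than TPC-C: `tensor_model`, `coord_in_bounds`, and `element_offset` are hypothetical names, and the five-dimension limit and the "sum of coordinate times stride" layout are assumptions, not statements about the hardware.

```c
#define MAX_TENSOR_DIMS 5

/* Hypothetical host-side description of a tensor: one size and one stride
 * per dimension, both in elements (the values the two queries would return). */
typedef struct {
    unsigned int size[MAX_TENSOR_DIMS];
    unsigned int stride[MAX_TENSOR_DIMS];
} tensor_model;

/* Returns 1 when every coordinate lies inside the tensor, 0 otherwise. */
int coord_in_bounds(const tensor_model *t, const int coord[MAX_TENSOR_DIMS])
{
    for (int d = 0; d < MAX_TENSOR_DIMS; ++d)
        if (coord[d] < 0 || (unsigned int)coord[d] >= t->size[d])
            return 0;
    return 1;
}

/* Linear element offset, assumed to be the sum of coord[d] * stride[d]. */
unsigned long element_offset(const tensor_model *t, const int coord[MAX_TENSOR_DIMS])
{
    unsigned long off = 0;
    for (int d = 0; d < MAX_TENSOR_DIMS; ++d)
        off += (unsigned long)coord[d] * t->stride[d];
    return off;
}
```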

unsigned int get_pad_value_<tensor data type>(tensor a)

Parameter     Description
a             [in] Tensor handle.
return value  Tensor’s pad value.

This function is supported for the following data types:

  • uint

  • int

  • float

  • bf16

  • short

  • ushort

  • char

  • uchar

void set_pad_value_<tensor data type>(tensor a, <tensor data type> val)

Parameter     Description
a             [in] Tensor handle.
val           [in] New pad value to set.

This function supports the following data types:

  • uint

  • int

  • float

  • bf16

  • short

  • ushort

  • char

  • uchar

Built-in Special Functions

Table 2 describes the available built-in special functions.

To use special functions in your TPC-C code, set the following flag in the glue code:

specialFunctionsUsed = 1

Table 2: Built-in Special Functions

Function                                  Single-precision Floating Point – max ULPs
float64 v_reciprocal_f32(float64 x)       2
float64 v_sqrt_f32(float64 x)             2
float64 v_exp_f32(float64 x)              2
float64 v_exp_cephes_f32(float64 x)       1
float64 v_log_f32(float64 x)              3
float64 v_log2_f32(float64 x)             3
float64 v_tanh_f32(float64 x)             3
float64 v_pow_f32(float64 x, float64 y)   20
float64 v_pow2_f32(float64 x)             2
float64 v_rsqrt_f32(float64 x)            3
float64 v_div_f32(float64 x, float64 y)   2
float64 v_sin_f32(float64 x)              2
float64 v_cos_f32(float64 x)              2
float64 v_tan_f32(float64 x)              3
float64 v_sigmoid_f32(float64 input)      16
float64 v_asin_cephes_f32(float64 input)  2
float64 v_acos_cephes_f32(float64 input)  2
float64 v_atan_cephes_f32(float64 input)  3
float64 v_asinh_f32(float64 input)        6
float64 v_acosh_f32(float64 input)        10
float64 v_atanh_f32(float64 input)        3
float64 v_sinh_cephes_f32(float64 input)  3
float64 v_cosh_cephes_f32(float64 input)  3
float64 v_mod_f32(float64 input)          70
float64 v_expm1_f32(float64 input)        10
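The error bounds in Table 2 are expressed in ULPs (units in the last place). As a rough illustration of what such a bound means, the following plain C sketch measures the ULP distance between a computed single-precision result and a reference value. The helper names (`float_to_ordered`, `ulp_distance`, `within_ulps`) are hypothetical and not part of the TPC-C library, the sketch assumes IEEE 754 binary32 floats, and NaN inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an integer that grows monotonically with the
 * float's value, so adjacent floats map to adjacent integers. */
static int64_t float_to_ordered(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    /* Negative floats are sign-magnitude: convert to minus the magnitude. */
    return bits < 0 ? (int64_t)INT32_MIN - bits : (int64_t)bits;
}

/* Number of representable floats between a and b (0 when they are equal). */
int64_t ulp_distance(float a, float b)
{
    int64_t d = float_to_ordered(a) - float_to_ordered(b);
    return d < 0 ? -d : d;
}

/* Returns 1 when result is within max_ulps of reference, 0 otherwise. */
int within_ulps(float result, float reference, int64_t max_ulps)
{
    return ulp_distance(result, reference) <= max_ulps;
}
```

For example, a result from v_sqrt_f32 could be compared against a higher-precision reference rounded to float, with `max_ulps` set to the value listed in the table.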

INT8/INT16 Built-in Special Functions

The following INT8/INT16 built-in special functions are available:

  • int8 tanh(int8 a);

  • int16 tanh(int16 a);

  • int8 sigmoid(int8 a);

  • int16 sigmoid(int16 a);

  • int8 exp(int8 a); // for a < 0

  • int16 exp(int16 a); // for a < 0

  • 1/x for x in [0.5, 1)

Intrinsics

Every TPC instruction is wrapped with an intrinsic for every supported data type and scalar/vector argument combination.

The intrinsic function name is usually derived from the instruction name, instruction data type, return data type width, scalar/vector properties of its arguments and predicate values.

The intrinsic naming convention adheres to the following pattern:

<return type width>_<instruction datatype>_<instruction name>_<arg1 width>_<arg2 width>_<b|bv>(arguments…);
  • The return type width can be:

<return type width>  Description
V                    Vector type
AV                   Augmented vector (4096-bit or 8192-bit vectors)
S                    Scalar type
B                    Boolean data type
BV                   Boolean vector data type

  • The instruction type can be:

<instruction datatype>  Description
F32                     Single-precision floating point
I32                     32-bit signed integer
U32                     32-bit unsigned integer
BF16                    Brain floating point
I16                     16-bit signed integer
U16                     16-bit unsigned integer
I8                      8-bit signed integer
U8                      8-bit unsigned integer
I                       INT5 data type

  • The argument width can be:

<arg width>  Description
S            Scalar data type
V            Vector data type

  • Predicate arguments can be:

Predicate Argument  Description
B                   Scalar Boolean
BV                  Vector Boolean

Intrinsic usage example:

bool256 bv_u16_cmp_leq_v_v_b(ushort128 a, ushort128 b, bool predicate, bool predicatePolarity);

bool256 bv_f32_cmp_leq_v_s_vb(float64 a, float b, bool256 predicate, bool predicatePolarity);

float64 v_f32_mul_v_v_b(float64 a, float64 b, bool predicate, bool predicatePolarity);
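To make the naming pattern concrete, the sketch below assembles an intrinsic name from its six components in plain C. The helper `intrinsic_name` is hypothetical and exists only to illustrate the pattern. It assumes every component is present and written in lowercase, as in the declarations above, and it does not check that the resulting name corresponds to a real intrinsic.

```c
#include <stddef.h>
#include <stdio.h>

/* Join the six name components with underscores, following the pattern
 * <return width>_<datatype>_<instruction>_<arg1 width>_<arg2 width>_<predicate>.
 * Returns the length of the full name, as reported by snprintf. */
int intrinsic_name(char *out, size_t out_size,
                   const char *ret_width, const char *data_type,
                   const char *instr, const char *arg1_width,
                   const char *arg2_width, const char *pred)
{
    return snprintf(out, out_size, "%s_%s_%s_%s_%s_%s",
                    ret_width, data_type, instr,
                    arg1_width, arg2_width, pred);
}
```

Calling it with the components "v", "f32", "mul", "v", "v", "b" yields the name of the multiply intrinsic declared above, v_f32_mul_v_v_b.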

Built-in Vector Reduction Intrinsics

Vector reduction intrinsics provide an easy way to compute the sum, product, minimum, maximum, argmin, and argmax of a vector. The vector elements are reduced to a single value, which is then broadcast to all lanes of the result vector.

Table 3 describes the available built-in reduction intrinsics for the different data types.

Table 3: Built-in Reduction Intrinsics

Reduction Intrinsics                                  Description
float64 v_f32_reduce_add(float64 x)                   Summation of all elements of the F32 vector
float64 v_f32_reduce_mul(float64 x)                   Product of all elements of the F32 vector
float64 v_f32_reduce_min(float64 x)                   Minimum value of all elements of the F32 vector
float64 v_f32_reduce_max(float64 x)                   Maximum value of all elements of the F32 vector
uint64_float64_pair_t v_f32_reduce_argmin(float64 x)  Index of the minimum value of all elements of the F32 vector
uint64_float64_pair_t v_f32_reduce_argmax(float64 x)  Index of the maximum value of all elements of the F32 vector
int64 v_i32_reduce_add(int64 x)                       Summation of all elements of the I32 vector
int64 v_i32_reduce_max(int64 x)                       Maximum value of all elements of the I32 vector
uint64_int64_pair_t v_i32_reduce_argmin(int64 x)      Index of the minimum value of all elements of the I32 vector
uint64_int64_pair_t v_i32_reduce_argmax(int64 x)      Index of the maximum value of all elements of the I32 vector
bfloat128 v_bf16_reduce_add(bfloat128 x)              Summation of all elements of the BF16 vector
bfloat128 v_bf16_reduce_min(bfloat128 x)              Minimum value of all elements of the BF16 vector
bfloat128 v_bf16_reduce_max(bfloat128 x)              Maximum value of all elements of the BF16 vector
short128 v_i16_reduce_min(short128 x)                 Minimum value of all elements of the I16 vector
short128 v_i16_reduce_max(short128 x)                 Maximum value of all elements of the I16 vector
char256 v_i8_reduce_min(char256 x)                    Minimum value of all elements of the I8 vector
char256 v_i8_reduce_max(char256 x)                    Maximum value of all elements of the I8 vector
uchar256 v_u8_reduce_min(uchar256 x)                  Minimum value of all elements of the U8 vector
uchar256 v_u8_reduce_max(uchar256 x)                  Maximum value of all elements of the U8 vector
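The reduce-then-broadcast behavior of these intrinsics can be modeled in scalar C as follows. This is a host-side sketch only: `float64_model`, `reduce_add_model`, and `reduce_max_model` are hypothetical names, the 64-lane width is inferred from the float64 type name, and the hardware may accumulate in a different order, so floating-point sums can differ slightly from this sequential loop.

```c
#define F32_LANES 64

/* Hypothetical stand-in for the TPC-C float64 vector (assumed: 64 float lanes). */
typedef struct { float lane[F32_LANES]; } float64_model;

/* Copy one scalar into every lane of a vector. */
static float64_model broadcast(float value)
{
    float64_model r;
    for (int i = 0; i < F32_LANES; ++i)
        r.lane[i] = value;
    return r;
}

/* Sum all lanes sequentially, then broadcast the sum to every lane. */
float64_model reduce_add_model(float64_model x)
{
    float sum = 0.0f;
    for (int i = 0; i < F32_LANES; ++i)
        sum += x.lane[i];
    return broadcast(sum);
}

/* Find the largest lane, then broadcast it to every lane. */
float64_model reduce_max_model(float64_model x)
{
    float best = x.lane[0];
    for (int i = 1; i < F32_LANES; ++i)
        if (x.lane[i] > best)
            best = x.lane[i];
    return broadcast(best);
}
```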

Exceptions to the C99 Standard

Initialization of Bool256 Variable

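The compiler treats bool256 as an array of 32 chars. In standard C99, an initializer list with a single value sets only the first element and zero-initializes the rest. In TPC-C, the following syntax instead initializes all bits of the array to one:

bool256 a = {0xff};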
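For comparison, the sketch below shows what the same initializer does to a 32-byte array in standard C99, next to an explicit fill that sets every bit. It is plain host-side C, not TPC-C, and `bool256_model`, `c99_brace_init`, and `all_ones` are hypothetical names used only for this illustration.

```c
#include <string.h>

#define BOOL256_BYTES 32

/* Hypothetical stand-in for bool256: 32 bytes, 256 bits. */
typedef struct { unsigned char byte[BOOL256_BYTES]; } bool256_model;

/* Standard C99 semantics: only byte[0] becomes 0xff, the rest are zeroed. */
bool256_model c99_brace_init(void)
{
    bool256_model a = {{0xff}};
    return a;
}

/* Explicit fill: every bit is set, which is the result the TPC-C
 * initializer is documented to produce. */
bool256_model all_ones(void)
{
    bool256_model a;
    memset(a.byte, 0xff, sizeof a.byte);
    return a;
}
```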

Initialization of Local Memory

According to C99: “If an object that has static or thread storage duration is not initialized explicitly and if it has arithmetic type, it is initialized to (positive or unsigned) zero;”

For performance reasons, local memory is left uninitialized at the start of a program, even though it has static storage duration, so a program should not rely on it being zero.