Preprocessor Directives and Macros

The preprocessing directives defined by the C99 specification are supported. Other supported directives are described below.

Pragmas

  • #pragma unroll(UnrollCount) - Enables the standard LLVM unrolling optimization for the loop.

  • #pragma loop_unroll(UnrollCount) - Enables the TPC backend unrolling optimization. By default, when the ‘unaligned_trip_count’ option is not specified, the number of loop iterations must be a multiple of UnrollCount. The ‘unaligned_trip_count’ option forces the compiler to generate an epilogue loop, so omit it whenever the iteration count is guaranteed to be a multiple of UnrollCount; the resulting code is faster and smaller. To pipeline the loop, add the ‘pipelined’ option; in this case the number of loop iterations must also be at least twice UnrollCount. The ‘taken’ option tells the compiler that the loop always executes at least once.

  • #pragma loop_taken - Tells the compiler that the loop always executes at least once.
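
To illustrate what an epilogue loop is, the following host-side C sketch unrolls a summation loop by 4 by hand and handles the leftover iterations separately. It only shows the general shape of such a transformation; the function name sum_unrolled_by_4 is hypothetical, and the code the TPC backend actually generates for ‘unaligned_trip_count’ is not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written illustration of unrolling by 4 with an epilogue loop.
 * Assumption: this mirrors only the general idea behind
 *     #pragma loop_unroll(4) unaligned_trip_count
 * and is not the actual output of the TPC backend. */
int sum_unrolled_by_4(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    int sum = 0;
    size_t i = 0;
    size_t main_end = n - (n % 4); /* largest multiple of 4 not exceeding n */

    /* Main loop: four original iterations per pass. */
    for (; i < main_end; i += 4) {
        sum += data[i];
        sum += data[i + 1];
        sum += data[i + 2];
        sum += data[i + 3];
    }

    /* Epilogue loop: the remaining n % 4 iterations. When n is known to be
     * a multiple of 4, this loop never runs and could be dropped. */
    for (; i < n; ++i) {
        sum += data[i];
    }

    return sum;
}
```

When the trip count is guaranteed to be a multiple of the unroll factor, the second loop is dead code, which is why omitting ‘unaligned_trip_count’ yields smaller code.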

Examples:

  • #pragma unroll(2) - Unroll 2 iterations using the high-level LLVM algorithm.

  • #pragma loop_unroll(4) - Unroll 4 iterations using the TPC backend algorithm. The iteration count is assumed to be a multiple of 4.

  • #pragma loop_unroll(3) pipelined taken - Pipeline 3 iterations; the loop is assumed to be always entered and to have at least 6 iterations.

  • #pragma loop_unroll(4) unaligned_trip_count - Unroll 4 iterations using the TPC backend algorithm. No assumption is made about the iteration count, and an extra epilogue loop is generated.
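
The trip-count assumptions behind these examples can be written down as a small host-side check. This is a minimal sketch: the struct loop_unroll_opts and the function trip_count_is_valid are hypothetical names, not part of the TPC toolchain, and the sketch assumes that each option contributes its requirement independently of the others.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of the options attached to a loop_unroll pragma. */
struct loop_unroll_opts {
    unsigned unroll_count;      /* UnrollCount, must be greater than 0 */
    bool unaligned_trip_count;  /* an epilogue loop is generated */
    bool pipelined;             /* the loop is pipelined */
    bool taken;                 /* the loop is entered at least once */
};

/* Returns true if trip_count satisfies the documented requirements.
 * Assumption: the requirements of the individual options simply add up. */
bool trip_count_is_valid(unsigned trip_count, struct loop_unroll_opts o)
{
    assert(o.unroll_count > 0);

    /* Without an epilogue loop, the count must be a multiple of UnrollCount. */
    if (!o.unaligned_trip_count && trip_count % o.unroll_count != 0) {
        return false;
    }
    /* Pipelining needs at least twice UnrollCount iterations. */
    if (o.pipelined && trip_count < 2u * o.unroll_count) {
        return false;
    }
    /* 'taken' promises at least one iteration. */
    if (o.taken && trip_count < 1u) {
        return false;
    }
    return true;
}
```

For instance, with UnrollCount 3 and both ‘pipelined’ and ‘taken’ set, a trip count of 6 passes the check while a trip count of 3 does not.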

Macros

TPC macros are predefined and can be used in any TPC-C source file:

  • __TPC__ - common architecture marker

  • __gaudi__ - architecture generation 2

  • __greco__ - architecture generation 3

  • MAX_TENSORS=<n> - reflects actual tensor limit

  • MAX_SLM=<n> - reflects architecture scalar memory limit (in bytes)

  • MAX_VLM=<n> - reflects architecture vector memory limit (in bytes)

  • __HABANA_TOOL_VERSION - Public software version (e.g., a recent version was 0.10.0)

  • __TPC_DROP_VERSION - Internal drop version (e.g., a recent version was 18.0.1)

  • VERSION2DEC(a,b,c) - Macro that expresses an expected version given as an “a.b.c” triple

Note

To view the full list of predefined macros, compile with -E -dM.

Example on __TPC__:

#ifdef __TPC__
#define FLOAT32
#include "leakyrelu.h"
#undef FLOAT32
#endif
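
The version macros listed above could be used in a similar guard. The sketch below is an assumption about how such a check might look: it presumes that __HABANA_TOOL_VERSION can be compared numerically with VERSION2DEC(a,b,c), and it supplies hypothetical fallback definitions (including an invented encoding for VERSION2DEC) only so that the fragment also builds with an ordinary host compiler. The names TOOL_VERSION, HAS_NEW_FEATURE, and has_new_feature are illustrative.

```c
#include <assert.h>

/* Hypothetical fallback for host builds only. The real encoding used by the
 * TPC compiler is not stated here; this one is an assumption. */
#ifndef VERSION2DEC
#define VERSION2DEC(a, b, c) (((a) * 10000) + ((b) * 100) + (c))
#endif

/* Use the predefined version when compiling as TPC-C; otherwise pretend the
 * tool version is 0.10.0 (an arbitrary value chosen for illustration). */
#ifdef __HABANA_TOOL_VERSION
#define TOOL_VERSION __HABANA_TOOL_VERSION
#else
#define TOOL_VERSION VERSION2DEC(0, 10, 0)
#endif

/* Sanity check on the fallback encoding: 1.0.0 must sort after 0.10.0. */
static_assert(VERSION2DEC(1, 0, 0) > VERSION2DEC(0, 10, 0),
              "version encoding must preserve ordering");

/* Select a code path based on the minimum expected tool version. */
#if TOOL_VERSION >= VERSION2DEC(0, 10, 0)
#define HAS_NEW_FEATURE 1
#else
#define HAS_NEW_FEATURE 0
#endif

int has_new_feature(void)
{
    return HAS_NEW_FEATURE;
}
```

Wrapping the predefined macro in a local name keeps the fallback from redefining a reserved, compiler-owned identifier.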