Runtime Environment Variables

The following table describes runtime flags that are set in the environment to change the behavior of the software and to enable or disable specific features.

Table 4 Runtime Environment Variables

Flag

Default

Description

Consumer

PT_HPU_LAZY_MODE

1

Controls Execution mode:

  • 0x0 - torch.compile and Eager mode

  • 0x1 - Lazy mode

Intel Gaudi PyTorch Bridge

GRAPH_VISUALIZATION

False

Creates graph visualization files. The graph dumps are written to the ./.graph_dumps folder.

Intel Gaudi software

PT_HPU_RECIPE_CACHE_CONFIG

Unset

Holds the configuration of the recipe cache. The configuration is encoded as a comma-separated list in the following format: ‘<RECIPE_CACHE_PATH>,<RECIPE_CACHE_DELETE>,<RECIPE_CACHE_SIZE_MB>’.

  • <RECIPE_CACHE_PATH> - Path (directory), where compiled graph recipes are stored to accelerate a scale up scenario. Only one process compiles the recipe, and other processes read it from disk. If unset (default), compiled graph recipes are not stored on disk (recipe disk caching disabled).

  • <RECIPE_CACHE_DELETE> - Bool flag (true/false). If set to True, the directory provided as <RECIPE_CACHE_PATH> will be cleared when the workload starts.

  • <RECIPE_CACHE_SIZE_MB> - Maximum size of the recipe cache directory in MB. If the size limit is reached, the oldest recipes (by creation time on the file system) are removed by the PyTorch Bridge.

Note: If a recipe cache is shared among several processes (scale up), it must be stored on a local physical disk. Avoid using remote drives (such as NFS) where file locks are not supported, as it may lead to instability and unpredictable behavior.

Intel Gaudi PyTorch Bridge

PT_HPU_MAX_COMPOUND_OP_SIZE

INT64_MAX

Limits the internal graph size to the specified number of ops. Reduces the lazy mode memory overhead. This will be improved in future releases.

Note: This may affect performance.

Intel Gaudi PyTorch Bridge

PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES

False

The dynamic shapes feature is disabled by default. If a model experiences excessive recompilations due to dynamic data or ops, this variable can be set to enable the Intel Gaudi PyTorch bridge and graph compiler to automatically manage dynamic shapes in model scripts. The graphs will be automatically bucketed and padded into ranges to achieve a common size, reducing recompilations and improving performance when working with dynamic workloads. To run with dynamic shapes handling enabled, set PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES=1 in the script or console.

Intel Gaudi PyTorch Bridge

HABANA_PGM_LRU_MAX

30000

If cache evictions cause performance degradation, increasing the cache size will improve performance. The default value is 30000. Note: The performance boost may come with an increase in host memory consumption.

Intel Gaudi PyTorch Bridge

PT_HPU_LAZY_ACC_PAR_MODE

1

This flag turns on host time optimization of lazy ops accumulation. It offloads ops accumulation to a separate thread, thus reducing computation time of the main thread.

Intel Gaudi PyTorch Bridge

PT_HPU_METRICS_FILE

Unset

Path (file), where the collected metrics are stored. Metrics are stored in a file only when PT_HPU_METRICS_FILE flag is set.

Intel Gaudi PyTorch Bridge

PT_HPU_METRICS_DUMP_TRIGGER

process_exit

When the PT_HPU_METRICS_FILE flag is set so that metrics are dumped automatically, this flag specifies precisely when Intel Gaudi PyTorch Bridge stores the metrics into the file.

Supported values:

  • process_exit - stores metrics in a file when the process exits.

  • mark_step - stores metrics in a file during mark_step.

  • metric_change - stores the metrics modified during the execution of the training script.

Multiple triggers can be enabled together by separating them with a comma, for example: PT_HPU_METRICS_DUMP_TRIGGERS=process_exit,metric_change

Intel Gaudi PyTorch Bridge

PT_HPU_METRICS_FILE_FORMAT

json

Metrics file format. Both JSON and TEXT formats are supported:

  • json

  • text

Intel Gaudi PyTorch Bridge

PT_HPU_ENABLE_GENERIC_STREAM

True

Enables generic stream[2], which allows a user to submit different types of operations in the same user stream[1]. For a usage example of this flag, see Wav2Vec2 inference script.
[1] User stream: A queue of device work. The host (user) places the work in this queue and continues immediately. The device schedules the work in this queue when the resources are free.
[2] Generic stream: A user stream where all operations can be pushed to a stream irrespective of the type of operation (copy, compute, or collective operations).

Intel Gaudi PyTorch Bridge

PT_HPU_ENABLE_EAGER_CACHE

False

Temporary performance improvement option for torch.compile mode only.

Because torch.compile is still in an early adoption phase, many operations are executed eagerly outside of graphs. Eager execution is less performant, and this option reduces its negative performance impact.

Intel Gaudi PyTorch Bridge

PT_HPU_EAGER_4_STAGE_PIPELINE_ENABLE

False

Accelerates Eager mode by enabling a multithreaded pipeline for operations processing:

  • False - standard single threaded processing

  • True - experimental 4 threads processing pipeline

Intel Gaudi PyTorch Bridge

PT_ENABLE_INT64_SUPPORT

False

Enables native support for tensors with INT64 datatype:

  • False - INT64 tensors are cast to INT32 on the device and all computations are done in lower precision

  • True - INT64 tensors are not cast to INT32 and computations are done in higher precision

Important: Not all ops support INT64. If an op does not support INT64, implicit casts can be added; otherwise, a CPU fallback might occur (with a performance impact) or a runtime error is thrown.

Limitation: This flag is supported for Gaudi 2 only with PT_HPU_LAZY_MODE=0.

Intel Gaudi PyTorch Bridge
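
The flags in Table 4 are ordinary environment variables, so they can be set from the console or from a script. The sketch below is only an illustration and is not part of the Intel Gaudi software. The helper `recipe_cache_config`, the path `/tmp/recipe_cache` and the 1024 MB limit are hypothetical. The lowercase `true`/`false` spelling of the delete field, and the idea that the variables should be set before the workload loads the Intel Gaudi PyTorch Bridge, are assumptions rather than documented behavior. It encodes PT_HPU_RECIPE_CACHE_CONFIG in the ‘<RECIPE_CACHE_PATH>,<RECIPE_CACHE_DELETE>,<RECIPE_CACHE_SIZE_MB>’ format and enables dynamic shapes handling.

```python
import os


def recipe_cache_config(path: str, delete: bool, size_mb: int) -> str:
    """Encode the three recipe cache fields as one comma-separated string."""
    if "," in path:
        # A comma inside the path would make the encoded list ambiguous.
        raise ValueError("path must not contain a comma")
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    # Assumption: the boolean field is spelled as lowercase true/false.
    return f"{path},{'true' if delete else 'false'},{size_mb}"


# Hypothetical values chosen for illustration only.
flags = {
    "PT_HPU_LAZY_MODE": "1",                     # 1 = Lazy mode (the default)
    "PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES": "1",  # enable dynamic shapes handling
    "PT_HPU_RECIPE_CACHE_CONFIG": recipe_cache_config("/tmp/recipe_cache", False, 1024),
}

# Assumption: the variables are set before the bridge is loaded by the workload.
os.environ.update(flags)

print(os.environ["PT_HPU_RECIPE_CACHE_CONFIG"])
```

Building the string in one helper keeps the field order in a single place, which matters because the three values are positional.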

The following table describes runtime flags that are set in the environment to obtain Intel Gaudi software and Intel Gaudi PyTorch bridge level logs.

Table 5 Runtime Environment Variables for Logging Mechanism

Flag

Default

Description

Consumer

PT_FORCED_TRACING_MASK

0

A bitmask specifying the components inside the Intel Gaudi PyTorch Bridge module that are allowed to use profilers. Note that certain profilers may require additional environment variables to be set.

  • PT_DEVICE - 0x1

  • PT_KERNEL - 0x2

  • PT_BRIDGE - 0x4

  • PT_SYNHELPER - 0x8

  • PT_DISTRIBUTED - 0x10

  • PT_LAZY - 0x20

  • PT_TRACE - 0x40

  • PT_FALLBACK - 0x80

  • PT_STATS - 0x100

  • PT_TEST - 0x200

  • PT_DYNAMIC_SHAPE - 0x400

  • PT_DEVMEM - 0x800

  • PT_HABHELPER - 0x1000

  • PT_IRGRAPH - 0x2000

  • PT_VIEWTABLE - 0x4000

  • PT_REFINEMENT - 0x8000

  • PT_HOSTSTAT - 0x10000

  • PT_LAYOUTS - 0x20000

  • PT_PARALLEL_ACC - 0x40000

  • PT_LAZY_EAGER - 0x80000

  • PT_MEMLOG - 0x100000

  • PT_EXEC_THREAD - 0x200000

  • PT_EAGER - 0x400000

  • PT_RECIPE_STATS - 0x800000

Intel Gaudi PyTorch Bridge

ENABLE_CONSOLE

False

If set to true, enables printing Intel Gaudi software and Intel Gaudi PyTorch Bridge logs to the console.

Intel Gaudi software and Intel Gaudi PyTorch Bridge

LOG_LEVEL_ALL

5

Logging level from Intel Gaudi software, perf_lib and Intel Gaudi PyTorch Bridge.

  • 6 is no logs

  • 0 is verbose

By default, logs are placed either in the console (if ENABLE_CONSOLE=true) or under ~/.habana_logs/.

Intel Gaudi software and Intel Gaudi PyTorch Bridge

LOG_LEVEL_ALL_PT

5

Logging level for Intel Gaudi PyTorch Bridge.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_DEVICE

5

Logging level for PT_DEVICE component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_KERNEL

5

Logging level for PT_KERNEL component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_BRIDGE

5

Logging level for PT_BRIDGE component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_SYNHELPER

5

Logging level for PT_SYNHELPER component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_DISTRIBUTED

5

Logging level for PT_DISTRIBUTED component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_LAZY

5

Logging level for PT_LAZY component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_TRACE

0

Logging level for PT_TRACE component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_FALLBACK

5

Logging level for PT_FALLBACK component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_STATS

5

Logging level for PT_STATS component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_TEST

5

Logging level for PT_TEST component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_DYNAMIC_SHAPE

5

Logging level for PT_DYNAMIC_SHAPE component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_DEVMEM

5

Logging level for PT_DEVMEM component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_HABHELPER

5

Logging level for PT_HABHELPER component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_IRGRAPH

5

Logging level for PT_IRGRAPH component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_VIEWTABLE

5

Logging level for PT_VIEWTABLE component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_REFINEMENT

5

Logging level for PT_REFINEMENT component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_HOSTSTAT

5

Logging level for PT_HOSTSTAT component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_LAYOUTS

5

Logging level for PT_LAYOUTS component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_PARALLEL_ACC

5

Logging level for PT_PARALLEL_ACC component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_LAZY_EAGER

5

Logging level for PT_LAZY_EAGER component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_MEMLOG

5

Logging level for PT_MEMLOG component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_EXEC_THREAD

5

Logging level for PT_EXEC_THREAD component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

LOG_LEVEL_PT_EAGER

5

Logging level for PT_EAGER component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge

PT_RECIPE_STATS

5

Logging level for PT_RECIPE_STATS component of Intel Gaudi PyTorch Bridge. If unset, LOG_LEVEL_ALL_PT will be used.

  • 6 is no logs

  • 0 is verbose

Intel Gaudi PyTorch Bridge
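
Because PT_FORCED_TRACING_MASK is a bitmask, several components are selected by combining their values from Table 5 with a bitwise OR. The following sketch shows that arithmetic for a subset of the components and then sets a few logging flags. The `TracingComponent` enum is a hypothetical helper introduced for illustration. Whether the variable expects decimal or hexadecimal notation is not stated above, so the sketch only computes the mask and prints both forms instead of exporting it.

```python
import os
from enum import IntFlag


class TracingComponent(IntFlag):
    """Subset of the PT_FORCED_TRACING_MASK bits listed in Table 5."""
    PT_DEVICE = 0x1
    PT_KERNEL = 0x2
    PT_BRIDGE = 0x4
    PT_LAZY = 0x20
    PT_DYNAMIC_SHAPE = 0x400
    PT_EAGER = 0x400000


# Combine three components into one mask with bitwise OR.
mask = (
    TracingComponent.PT_BRIDGE
    | TracingComponent.PT_LAZY
    | TracingComponent.PT_DYNAMIC_SHAPE
)
print(hex(int(mask)), int(mask))

# Logging flags from Table 5: 0 is verbose, 6 is no logs.
log_flags = {
    "ENABLE_CONSOLE": "true",    # print logs to the console
    "LOG_LEVEL_ALL_PT": "5",     # level for all bridge components
    "LOG_LEVEL_PT_BRIDGE": "0",  # verbose logs for the PT_BRIDGE component only
}
os.environ.update(log_flags)
```

Setting a per-component level next to LOG_LEVEL_ALL_PT keeps the output small while still getting verbose logs for the one component under investigation.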