Runtime Environment Variables

The following table describes runtime flags that are set in the environment to change runtime behavior and to enable or disable certain features.

Table 5 Runtime Environment Variables
Flag	Default	Description	Consumer
`PT_HPU_LAZY_MODE`	0	Controls the execution mode: 0x0 - `torch.compile` and Eager mode; 0x1 - Lazy mode.	Intel Gaudi PyTorch bridge
`GRAPH_VISUALIZATION`	False	Creates graph visualization files. The graph dumps are written to the ./.graph_dumps folder.	Intel Gaudi software
`PT_HPU_RECIPE_CACHE_CONFIG`	,false,1024,false	Holds the configuration of the recipe cache. The configuration is encoded as a comma-separated list in the following format: `<RECIPE_CACHE_PATH>,<RECIPE_CACHE_DELETE>,<RECIPE_CACHE_SIZE_MB>,<RECIPE_CACHE_ON_NFS>`. `<RECIPE_CACHE_PATH>` - Path (directory) where compiled graph recipes are stored to accelerate a scale-up scenario. Only one process compiles the recipe, and other processes read it from disk. If unset (default), compiled graph recipes are not stored on disk (recipe disk caching is disabled). `<RECIPE_CACHE_DELETE>` - Bool flag (true/false). If set to True, the directory provided as `<RECIPE_CACHE_PATH>` is cleared when the workload starts. `<RECIPE_CACHE_SIZE_MB>` - Maximum size in MB of the recipe cache directory. If the size limit is reached, the oldest recipes (by creation time on the file system) are removed by the PyTorch bridge. `<RECIPE_CACHE_ON_NFS>` - Bool flag (true/false). If set to True, the directory provided as `<RECIPE_CACHE_PATH>` can be on NFS. In that case, the 2nd and 3rd parameters are ignored, as no retention/deletion policy can be safely applied on NFS. Note: Due to the lack of synchronization on NFS, the cache folder can contain redundant files produced by different workers that queried the same cache entries simultaneously. Note: If a recipe cache is shared among several processes (scale-up), it can be stored on either a local or a remote disk. Using remote drives (such as NFS) requires setting `<RECIPE_CACHE_ON_NFS>`.	Intel Gaudi PyTorch bridge
`PT_HPU_RECIPE_CACHE_NFS_TIMEOUT_S`	300	Sets how long a process waits for a cache file produced by another worker. A recipe cache on NFS cannot rely on file locks, so a predefined timeout is needed to avoid waiting indefinitely for a cache file from another process. Placing the recipe cache on NFS is enabled by the 4th parameter of `PT_HPU_RECIPE_CACHE_CONFIG`.	Intel Gaudi PyTorch bridge
`PT_HPU_MAX_COMPOUND_OP_SIZE`	INT64_MAX	Limits internal graph size to specified number of ops. Reduces the Lazy mode memory overhead. This will be improved in future releases. Note: This may affect performance.	Intel Gaudi PyTorch bridge
`PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES`	False	The dynamic shapes feature is disabled by default. If a model experiences excessive recompilations due to dynamic data or ops, this variable can be set to enable the Intel Gaudi PyTorch bridge and graph compiler to automatically manage dynamic shapes in model scripts. The graphs are automatically bucketed and padded into ranges to achieve a common size, reducing recompilations and improving performance when working with dynamic workloads. To run with dynamic shapes handling enabled, set `PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES=1` in the script or console.	Intel Gaudi PyTorch bridge
`HABANA_PGM_LRU_MAX`	30000	If cache evictions cause performance degradation, increasing the cache size can improve performance. The default value is 30000. Note: The performance gain may come at the cost of higher host memory consumption.	Intel Gaudi PyTorch bridge
`PT_HPU_LAZY_ACC_PAR_MODE`	1	This flag turns on host time optimization of lazy ops accumulation. It offloads ops accumulation to a separate thread, thus reducing computation time of the main thread.	Intel Gaudi PyTorch bridge
`PT_HPU_METRICS_FILE`	Unset	Path to the file where the collected metrics are stored. Metrics are written to a file only when the `PT_HPU_METRICS_FILE` flag is set, and they are flushed when the process exits.	Intel Gaudi PyTorch bridge
`PT_HPU_METRICS_FILE_FORMAT`	json	Metrics file format. Both JSON and TEXT formats are supported: `json`, `text`.	Intel Gaudi PyTorch bridge
`PT_HPU_ENABLE_GENERIC_STREAM`	True	Enables the generic stream^[1], which allows a user to submit different types of operations in the same user stream^[2]. For a usage example of this flag, see Wav2Vec2 inference script. [1] Generic stream: A user stream where all operations can be pushed to a stream irrespective of the type of operation (copy, compute, or collective operations). [2] User stream: A queue of device work. The host (user) places the work in this queue and continues immediately. The device schedules the work in this queue when the resources are free.	Intel Gaudi PyTorch bridge
`PT_HPU_EAGER_PIPELINE_ENABLE`	True	Accelerates Eager mode by enabling a multithreaded pipeline for operation processing: False - standard single-threaded processing; True - multithreaded processing pipeline.	Intel Gaudi PyTorch bridge
`PT_COMPILE_ONLY_MODE`	False	Disables launching any computations on the hardware. This can be useful in the warmup phase of a workload, when the user wants to compile all the graphs and have a well-defined separation before starting actual execution on the hardware.	Intel Gaudi PyTorch bridge
`PT_HPU_GPU_MIGRATION`	False	Enables the GPU Migration Toolkit, which simplifies migrating PyTorch models that run on GPU-based architecture to run on the Intel Gaudi AI accelerator: False - GPU Migration disabled; True - GPU Migration enabled.	Intel Gaudi PyTorch bridge
`PT_HPU_ENABLE_LAZY_COLLECTIVES`	False	Captures the collectives in Lazy IR. This is useful when multi-card support for inference is used with HPU Graphs. False - Does not capture the collectives in Lazy IR. True - Captures the collectives in Lazy IR.	Intel Gaudi PyTorch bridge
`PT_HPU_LAZY_COLLECTIVES_HOLD_TENSORS`	False	Applicable only when using PT_HPU_ENABLE_LAZY_COLLECTIVES=1. False - Tensor references are released immediately after computation/collective is completed. Models requiring high memory should use this value. True - Tensor references are held for a longer time until the respective computation and collective are completed. This provides better performance but is memory-intensive. This may cause an OOM error for some models.	Intel Gaudi PyTorch bridge
`PT_HPU_COMPILE_THREAD_POOL_SIZE`	8	Sets the number of threads for Compile thread pool which is used in Eager and Compile mode. If set to 0 or more than available CPUs, the number of threads will be equal to the number of available CPUs.	Intel Gaudi PyTorch bridge
`PT_HPU_ENABLE_SYNAPSE_OUTPUT_PERMUTE`	True	Option to disable internal permutation handling for output tensors. This is a temporary workaround needed to enable Parallel Compilation in Compile mode; see Compile Mode.	Intel Gaudi PyTorch bridge
`PT_HPU_HUGE_PAGES_LIMIT_MB`	0	Sets the huge page limit (in MB) for the current worker. If set to 0, the default minimum of 2MB applies.	Intel Gaudi PyTorch bridge
`PT_TE_CUSTOM_OP`	False	Controls enabling of an experimental custom_op-based solution for efficient operation of Intel Transformer Engine modules in torch.compile mode.	Intel Transformer Engine
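
The comma-separated encoding of `PT_HPU_RECIPE_CACHE_CONFIG` described in Table 5 can be assembled programmatically. The following is a minimal sketch rather than part of the Intel Gaudi API: the helper name `build_recipe_cache_config` and the example path are hypothetical, and the sketch assumes the variable should be set before the PyTorch bridge is loaded so that the bridge sees it.

```python
import os


def build_recipe_cache_config(path="", delete=False, size_mb=1024, on_nfs=False):
    """Encode the four recipe-cache fields as the comma-separated value
    described in Table 5 (hypothetical helper, not a Gaudi API)."""
    if "," in path:
        raise ValueError("cache path must not contain a comma")
    if size_mb < 0:
        raise ValueError("size_mb must be non-negative")
    # Field order: <PATH>,<DELETE>,<SIZE_MB>,<ON_NFS>; booleans are lowercase.
    flags = (str(delete).lower(), str(size_mb), str(on_nfs).lower())
    return ",".join((path,) + flags)


# With no arguments the helper reproduces the documented default value.
default_value = build_recipe_cache_config()

# Example: a local cache directory, cleared at startup, capped at 2048 MB.
# The path is a placeholder chosen for this sketch.
local_value = build_recipe_cache_config("/tmp/recipe_cache", delete=True, size_mb=2048)

# Set the variable for this process and any child processes it spawns.
# Assumption: this runs before the Gaudi PyTorch bridge is imported.
os.environ["PT_HPU_RECIPE_CACHE_CONFIG"] = local_value
```

Building the value from named arguments keeps the field order in one place, which is easy to get wrong when the string is written by hand.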

The following table describes runtime flags that are set in the environment to obtain Intel Gaudi software and Intel Gaudi PyTorch bridge level logs.

Note

For the full list of logging levels, refer to Using Log Levels.
For the full list of Intel Gaudi software component-level logs, refer to Using Component-level Logs.

Table 6 Runtime Environment Variables for Logging Mechanism
Flag	Default	Description	Consumer
`HABANA_LOGS`	`~/.habana_logs/`	Sets log files location.	Intel Gaudi software and Intel Gaudi PyTorch bridge
`ENABLE_CONSOLE`	False	If set to True, enables printing Intel Gaudi software and Intel Gaudi PyTorch bridge logs to the console. If unset, logs are output in the directory specified by `HABANA_LOGS`.	Intel Gaudi software and Intel Gaudi PyTorch bridge
`LOG_LEVEL_ALL`	5	Logging level for Intel Gaudi software components and Intel Gaudi PyTorch bridge.	Intel Gaudi software and Intel Gaudi PyTorch bridge
`LOG_LEVEL_ALL_PT`	5	Logging level for Intel Gaudi PyTorch bridge. If unset, `LOG_LEVEL_ALL` will be used.	Intel Gaudi PyTorch bridge
`LOG_LEVEL_<COMPONENT>`	5	Logging level for `<COMPONENT>` in Intel Gaudi PyTorch bridge. If unset, `LOG_LEVEL_ALL_PT` will be used. Many different components log in the PyTorch bridge. Notable ones are: `PT_EAGER` - C++ layer logs of Eager and Compile modes; `PT_PYTHON` - Python layer logs of Compile mode (`hpu_backend` specific logs); `PT_LAZY` - C++ layer logs of Lazy mode; `LOG_LEVEL_PT_FALLBACK` - C++ layer logs of OP fallbacks to CPU (performance penalty). There are more components, but in most cases end users do not need to work with them.	Intel Gaudi PyTorch bridge
`GPU_MIGRATION_LOG_LEVEL`		Logging level for `PT_HPU_GPU_MIGRATION` component of Intel Gaudi PyTorch bridge: 1 - Logs all modules and prints to the console. 2 - Logs all modules. 3 - Logs all modules excluding torch.	Intel Gaudi PyTorch bridge
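
The fallback order among the logging variables in Table 6 (`LOG_LEVEL_<COMPONENT>`, then `LOG_LEVEL_ALL_PT`, then `LOG_LEVEL_ALL`, then the default of 5) can be modeled in a few lines. This is an illustrative sketch of the behavior as described in the table, not the bridge's actual implementation; the function name `effective_log_level` is hypothetical, and treating an empty string as unset is an assumption of the sketch.

```python
def effective_log_level(component, env, default=5):
    """Return the level that would apply to `component` under the fallback
    order described in Table 6 (illustrative model only)."""
    candidates = (f"LOG_LEVEL_{component}", "LOG_LEVEL_ALL_PT", "LOG_LEVEL_ALL")
    for name in candidates:
        value = env.get(name)
        # Assumption: an empty string is treated the same as an unset variable.
        if value:
            return int(value)
    return default


# A plain dict stands in for os.environ so the example has no side effects.
example_env = {"LOG_LEVEL_ALL": "4", "LOG_LEVEL_PT_LAZY": "1"}

lazy_level = effective_log_level("PT_LAZY", example_env)    # component-specific value
eager_level = effective_log_level("PT_EAGER", example_env)  # falls back to LOG_LEVEL_ALL
unset_level = effective_log_level("PT_EAGER", {})           # documented default
```

Passing `os.environ` instead of `example_env` would apply the same lookup to the current process environment.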

Gaudi Documentation 1.22.1 documentation
