Runtime Environment Variables
The following table describes runtime environment variables that can be set to change runtime behavior and to enable or disable certain features. Among the flags below, TF_NUM_INTEROP_THREADS, TF_CPP_MIN_LOG_LEVEL, and TF_CPP_MIN_VLOG_LEVEL are native TensorFlow flags. All other flags are SynapseAI specific.
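These variables are read from the process environment, so they can be exported in the shell or set in Python before TensorFlow is imported. Below is a minimal sketch that uses only the native TensorFlow flags named above; the values are illustrative, not recommendations.

```python
import os

# Set runtime flags before importing TensorFlow so they take effect
# when the framework (and any device plugin) initializes.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"    # native TF: lower value -> more logs (valid range 0-4)
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "1"   # native TF: higher value -> more verbose logs (valid range 0-10)
os.environ["TF_NUM_INTEROP_THREADS"] = "8"  # non-zero value enforces the TF op-execution thread count

import tensorflow as tf  # imported only after the environment is configured

print(tf.__version__)
```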
| Flag | Default | Description |
|---|---|---|
| | Unset | Accepts a comma-separated list of op types to be placed on the CPU. |
| | 1 | Controls dumping of TensorFlow graphs after different graph transformation phases. |
| | Unset | Sets the path that TensorFlow dumps are saved to. If unset, graphs will not be dumped. A warning message is shown for built-in TF graph dumping. |
| | false | If set to 'true', enables printing SynapseAI logs to the console. |
| | 5 | Logging level of SynapseAI and perf_lib. By default, logs are placed either in the console (when console printing is enabled by the flag above) or in log files. |
| | Unset | Enables the FP32 to BF16 conversion pass for mixed precision training. Several settings are currently supported. |
| | Unset | Allows dumping the current BF16 configuration to a JSON file. If set to an absolute path of an output file, the data is dumped. |
| | True | If set to 'false' or '0', support for compiling clusters with dynamic shape inputs is disabled. Dynamic shapes are inputs to the clusters that change between iterations, forcing recompilations of graph recipes. Dynamic shape support reduces the required number of recompilations by introducing graph recipes with input shape ranges (min, max) based on the patterns of input shapes to a given cluster. Disabling dynamic shape support forces all clusters to be compiled statically, which can increase the overall number of compilations. |
| | Unset | If set to '0', the Pattern Matcher optimization pass is disabled. |
| | Unset | Allows setting the initial allocated memory size for the workspace buffer, in MB. This option is mainly for cases in which dynamic workspace allocation does not work properly. |
| | Unset | By default, the allocation strategy allocates host memory with built-in minimum values. If this flag is set to any value, it instructs the Habana CPU allocator to override the default CPU memory pool size with the given size, in gigabytes. |
| TF_NUM_INTEROP_THREADS | Unset | If set to a non-zero value, this flag enforces the thread count for TensorFlow op execution. Otherwise, TensorFlow selects the count based on the available cores and the MKL/OpenMP configuration. |
| TF_CPP_MIN_LOG_LEVEL | 0 | Logging level of native TensorFlow. A lower value means more logs. The valid value range is [0-4]. |
| TF_CPP_MIN_VLOG_LEVEL | 0 | Another logging level of native TensorFlow. A higher value means more logs. The valid value range is [0-10]. |
| | False | If set to 'True', disables registration of legacy Variables on HPU and allows them to be executed on the CPU. Otherwise, legacy Variables are registered on HPU, which prevents them from being executed at all. |
| | Unset | Path (directory) where compiled graph recipes are stored between different runs of the same model, which accelerates the first iteration. If unset, compiled graph recipes are not stored on disk (recipe disk caching is disabled). In a scale-up scenario, different processes on one platform may share the same directory for the recipe cache: only one process compiles a recipe, and the other processes read it from disk. Note: The recipe cache directory is not cleared automatically and can grow in size over time. Note: If the recipe cache is shared among several processes (scale-up), it must be stored on a local physical disk. Avoid using remote drives (such as NFS) where file locks are not supported, as this may lead to instability and unpredictable behavior. A setup sketch follows this table. |
| | False | If set to 'True' in multi-worker training, an additional barrier is instantiated that effectively aligns multiple worker processes prior to every collective operation. This specific mode of operation addresses scenarios where the iteration time of every worker varies greatly; in such cases, calls to the Habana Communication Library may otherwise time out. This feature does not affect Horovod. |
| | Unset | Limits the number of nodes inserted into a single graph recipe. If unset, graphs have no upper limit on the number of nodes and are sliced only at algorithmic synchronization points (such as sending tensors between devices). If set, big graphs are sliced into smaller ones with the maximum number of nodes determined by the variable value. |
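As noted in the recipe cache row above, the cache directory grows over time and, when shared across scale-up processes, must live on a local physical disk. The sketch below illustrates that setup under stated assumptions: the flag name TF_RECIPE_CACHE_PATH and the path /var/tmp/recipe_cache are placeholders, since the actual flag name is not preserved in this table.

```python
import os
from pathlib import Path

# Hypothetical flag name for the recipe cache directory; check the SynapseAI
# release documentation for the actual name.
RECIPE_CACHE_FLAG = "TF_RECIPE_CACHE_PATH"


def setup_recipe_cache(path: str = "/var/tmp/recipe_cache") -> Path:
    """Create the cache directory on a local physical disk (not NFS) and export the flag."""
    cache_dir = Path(path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ[RECIPE_CACHE_FLAG] = str(cache_dir)
    return cache_dir


def cache_size_mb(cache_dir: Path) -> float:
    """The cache is not cleared automatically, so report its size for manual pruning."""
    total_bytes = sum(f.stat().st_size for f in cache_dir.rglob("*") if f.is_file())
    return total_bytes / (1024 * 1024)


if __name__ == "__main__":
    cache = setup_recipe_cache()
    print(f"{RECIPE_CACHE_FLAG}={cache} ({cache_size_mb(cache):.1f} MB in use)")
```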