Porting a Simple TensorFlow Model to Gaudi¶
To set up the TensorFlow environment, refer to the Installation Guide. The supported TensorFlow versions are listed in the Support Matrix.
Note
Using APIs from different TensorFlow versions can cause compatibility issues. Please refer to the TensorFlow Known Issues and Limitations section for a list of current limitations.
Creating a TensorFlow Example¶
To run the following example, run the Docker image in interactive mode on the Gaudi machine according to the instructions detailed in the Installation Guide.
After entering a Docker shell, create an example.py TensorFlow example using the code snippet available in the TensorFlow Hello World Example.
The example.py script presents a basic TensorFlow code example. The following explains the Habana-specific lines:
Line 2 - Imports the function that enables a single Gaudi device.
Note
Ensure that you have a proper PYTHONPATH set by checking that it includes /root, or more specifically, run export PYTHONPATH=/root/Model-References:/usr/lib/habanalabs:$PYTHONPATH
Line 4 - Calls the function imported earlier to enable Gaudi (registered as an 'HPU' device in TensorFlow), Habana optimization passes, Habana ops, and so on.
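The example script itself is not reproduced here; the following is a minimal sketch of its shape, with the two Habana-specific lines marked. The habana_frameworks import is guarded and synthetic data replaces MNIST so the sketch also runs on a machine without the Habana stack (both are assumptions for illustration, not the exact Hello World code):

```python
import numpy as np
import tensorflow as tf

# Line 2 of example.py imports the loader; guarded here so the sketch
# also runs on machines without the Habana software stack installed.
try:
    from habana_frameworks.tensorflow import load_habana_module
    load_habana_module()  # line 4: registers the 'HPU' device with TensorFlow
except ImportError:
    pass  # no Habana stack present; TensorFlow falls back to CPU

# Stand-in for the MNIST model in the Hello World example
# (synthetic data keeps the sketch self-contained).
x = np.random.rand(256, 28, 28).astype("float32")
y = np.random.randint(0, 10, size=(256,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="sgd",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
history = model.fit(x, y, epochs=1, batch_size=128, verbose=0)
```

When run on Gaudi, the model's ops are placed on the HPU automatically; no other change to the Keras code is needed.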
Additional Migration Examples¶
In addition to the above, two migration examples are provided:
- example_tf_func.py with tf.function
- example_tf_session.py with tf.Session (originally from TF 1.15, found in the TensorFlow Hello World Example)
For example_tf_func.py, the migration instructions are similar to the Hello_world.py example detailed above.
For example_tf_session.py, you must disable eager mode by adding tf.compat.v1.disable_eager_execution() to enable graph mode.
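A minimal illustration of the TF1-style pattern (plain TensorFlow 2, no Habana-specific calls shown): once eager execution is disabled, ops build a graph that must be run inside a tf.compat.v1.Session, as example_tf_session.py does.

```python
import tensorflow as tf

# Migrating a TF1-style script: switch off eager execution so that
# subsequent ops are added to a graph instead of executing immediately.
tf.compat.v1.disable_eager_execution()

a = tf.constant(2.0)
b = tf.constant(3.0)
c = a + b  # symbolic graph tensor, not a concrete value yet

# The graph only produces values when run inside a session.
with tf.compat.v1.Session() as sess:
    result = sess.run(c)

print(result)  # 5.0
```

Note that disable_eager_execution() affects global state, so it must be called before any ops are created.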
The table below summarizes the conditions under which tf.compat.v1.disable_eager_execution() should be added to model scripts to enable graph mode:

| TF version and API | Recommendations for disable_eager_execution | Code Examples in GitHub |
|---|---|---|
| TF1 scripts running in TF2 compatible mode | tf.compat.v1.disable_eager_execution() is required to enable graph mode. | TF1 model running in TF2 compatible mode: example_tf_session.py |
| TF2 scripts running with a Keras model (graph mode by default) | tf.compat.v1.disable_eager_execution() is NOT required to enable graph mode. | TF2 model running with Keras on a single Gaudi: example.py; TF2 model running with Keras on Horovod-based multi-Gaudi: example_hvd.py |
| TF2 scripts running with tf.function (graph mode) | tf.compat.v1.disable_eager_execution() is NOT required to enable graph mode. | TF2 model running with tf.function on a single Gaudi: example_tf_func.py; TF2 model running with tf.function on Horovod-based multi-Gaudi: example_tf_func_hvd.py |
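The third row of the table can be demonstrated with plain TensorFlow 2: tf.function traces the decorated Python function into a graph on its own, so eager execution stays enabled globally and no disable call is needed.

```python
import tensorflow as tf

# TF2 style: tf.function traces this function into a graph the first
# time it is called; no tf.compat.v1.disable_eager_execution() needed.
@tf.function
def scaled_sum(x, y):
    return tf.reduce_sum(x * y)

out = scaled_sum(tf.ones([4]), tf.fill([4], 2.0))
print(float(out))  # 8.0

# Eager mode remains on outside the traced function.
assert tf.executing_eagerly()
```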
Executing the Example¶
After creating example.py, execute it by running the script with Python (for example, python3 example.py).
You can also run the above example with BF16 support enabled by setting the TF_BF16_CONVERSION=1 environment variable. For a full list of available runtime environment variables, see Runtime Environment Variables.
The following lines should appear as part of the output:
Epoch 1/5
469/469 [==============================] - 1s 3ms/step - loss: 1.2647 - accuracy: 0.7208
Epoch 2/5
469/469 [==============================] - 1s 2ms/step - loss: 0.7113 - accuracy: 0.8433
Epoch 3/5
469/469 [==============================] - 1s 2ms/step - loss: 0.5845 - accuracy: 0.8606
Epoch 4/5
469/469 [==============================] - 1s 2ms/step - loss: 0.5237 - accuracy: 0.8688
Epoch 5/5
469/469 [==============================] - 1s 2ms/step - loss: 0.4865 - accuracy: 0.8749
313/313 [==============================] - 1s 2ms/step - loss: 0.4482 - accuracy: 0.8869
Since the first iteration includes graph compilation time, it takes longer to run than later iterations. The software stack compiles the graph and saves the recipe to cache; unless the graph changes or a new graph comes in, no recompilation is needed during training. Graph compilation typically happens at the beginning of training and at the beginning of evaluation.
Viewing Loss and Accuracy in TensorFlow¶
You can find loss and accuracy in the demo scripts' output. Loss and accuracy metrics can be visualized using different Profiler tools. For further details about the Profiler tools you can use, see the Analysis section.
Loading the Habana Module¶
To load the Habana module for TensorFlow, call load_habana_module(), located under library_loader.py. This function loads the Habana libraries needed to use the Gaudi HPU at the TensorFlow level. Once loaded, the Gaudi HPU is registered in TensorFlow and prioritized over the CPU: when a given op is available for both the CPU and the Gaudi HPU, the op is assigned to the Gaudi HPU.

Habana op support and custom TensorFlow ops are defined in the habana_ops object, also available in habana-tensorflow. It can be imported with from habana_frameworks.tensorflow import habana_ops, but should only be used after load_habana_module() is called. The custom ops are used for pattern matching against vanilla TensorFlow ops.

load_habana_module() accepts an optional parameter allow_op_override, where load_habana_module(allow_op_override=True) is the default. It allows replacing a default TensorFlow op implementation with a custom one to improve performance; only tf.keras.layers.LayerNormalization is currently supported. A known issue in TensorFlow may require you to disable this behavior by calling load_habana_module(allow_op_override=False).
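A sketch of the loading pattern described above, with the override disabled as the known-issue workaround. The habana_frameworks package exists only inside Habana's containers or on a Gaudi machine, so the import is guarded here purely for illustration:

```python
# Load the Habana module before building any model, disabling the
# LayerNormalization op override to work around the known TensorFlow issue.
try:
    from habana_frameworks.tensorflow import load_habana_module
    load_habana_module(allow_op_override=False)
    # habana_ops may only be imported after load_habana_module() is called.
    from habana_frameworks.tensorflow import habana_ops  # noqa: F401
    on_gaudi = True
except ImportError:
    on_gaudi = False  # running on a machine without the Habana stack

print("HPU enabled:", on_gaudi)
```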
Enabling a Single Gaudi Device¶
To enable a single Gaudi device, add the code below to the main function:
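The snippet itself was not captured in this page; the following is a minimal sketch of the assumed shape, calling load_habana_module() at the start of the main function, with the import guarded so the sketch also runs without the Habana stack:

```python
import tensorflow as tf

def main():
    # Load the Habana module first, before any model code, so the
    # 'HPU' device is registered before ops are placed. The import is
    # guarded here because habana_frameworks exists only on Gaudi setups.
    try:
        from habana_frameworks.tensorflow import load_habana_module
        load_habana_module()  # registers the 'HPU' device with TensorFlow
    except ImportError:
        pass  # no Habana stack; TensorFlow falls back to CPU

    # On Gaudi, the device list now includes the HPU.
    return len(tf.config.list_logical_devices())

num_devices = main()
```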
To enable Horovod for multi-Gaudi runs, add distributed functions to the main function.
To enable multi-worker training with tf.distribute, use the HPUStrategy class.
For more details on porting multi-node models, see Distributed Training with TensorFlow.
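As a sketch of the tf.distribute path, HPUStrategy drops in where a stock TensorFlow strategy would normally go. The import path below follows habana-tensorflow's distribute module and is an assumption here; the guard lets the sketch fall back to TensorFlow's default strategy off-Gaudi:

```python
import tensorflow as tf

# Multi-worker sketch: use HPUStrategy on Gaudi, otherwise fall back
# to the default (no-op) strategy so the code still runs anywhere.
try:
    from habana_frameworks.tensorflow import load_habana_module
    from habana_frameworks.tensorflow.distribute import HPUStrategy
    load_habana_module()
    strategy = HPUStrategy()
except ImportError:
    strategy = tf.distribute.get_strategy()  # default strategy off-Gaudi

# Variables created inside the scope are managed by the strategy.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(2),
    ])
```

Launching one process per Gaudi card and configuring the cluster is covered in Distributed Training with TensorFlow.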