Debugging Slow Convergence

This section suggests actions to take if your TensorFlow model converges slowly.

TensorBoard Usage

Visualization

The Intel® Gaudi® software can generate data representing HPU clusters for visualization in TensorBoard. When TensorBoard visualization is enabled, the Intel Gaudi software adds a tag, post_optimization_graph, that visualizes the clustered TF graph. In addition, if the environment variable GRAPH_VISUALIZATION is set to 1, additional tags are created for each Intel Gaudi Op cluster, visualizing the cluster's graph before and after compilation by the Graph Compiler.
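As a minimal sketch, the per-cluster tags could be enabled from a shell before training. The LOGDIR variable and its default of ./logs are assumptions made for illustration; substitute the directory your training script actually writes TensorBoard summaries to.

```shell
# Enable the additional per-cluster tags described above.
export GRAPH_VISUALIZATION=1

# Hypothetical log directory; replace with the one your training script uses.
LOGDIR="${LOGDIR:-./logs}"

# Train your model as usual, then point TensorBoard at the log directory:
#   tensorboard --logdir "$LOGDIR"
```

The tensorboard command is left as a comment so that the snippet only prepares the environment; run it manually once training has produced summaries.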

Trace Viewer

Profiling with TensorBoard is supported through standard callbacks injected into the model. See Collect performance data. By default, the Intel Gaudi Profiling subsystem dumps events from the Gaudi device, which are then displayed in the trace generated by the TensorBoard Profiler. See Tensorboard profiling keras. However, only a few iterations can be profiled in a single session because the trace buffer size is limited to 128 MB. This will be addressed in subsequent releases.

Traces from the HPU might be displayed incorrectly in TensorBoard. For a better experience, see the instructions in the Analysis section.

Model Graph

Set the following environment variables to generate a dump of the TensorFlow training graph:

$ export LOG_LEVEL_GRAPH_DATA=0 GRAPH_VISUALIZATION=1 HBN_TF_GRAPH_DUMP=2
$ # Train your model as usual

TensorFlow graphs will be written to the current directory.
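
Because the graphs are written to the current directory, one possible approach is to launch training from a dedicated directory so the dumps do not mix with other files. The sketch below assumes a hypothetical directory name (graph_dumps) and a hypothetical training script path; neither comes from the Intel Gaudi documentation.

```shell
# Hypothetical directory used to collect the graph dumps.
DUMP_DIR="${DUMP_DIR:-graph_dumps}"
mkdir -p "$DUMP_DIR"
cd "$DUMP_DIR"

# Same variables as above, set for this shell session.
export LOG_LEVEL_GRAPH_DATA=0 GRAPH_VISUALIZATION=1 HBN_TF_GRAPH_DUMP=2

# Train your model as usual from this directory, for example:
#   python3 /path/to/train.py   # hypothetical script path

# List whatever the run wrote to this directory.
ls -l
```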