Troubleshooting your Model

This section provides troubleshooting instructions for common issues encountered when training TensorFlow models.

Runtime Errors

The following table outlines possible runtime errors:

Error: model/tf.nn.elu/Elu/elu_fwd_f32_n398: TPC kernel with guid “elu_fwd_f32” doesn’t support DS
Description: The implementation of one of the operators (Elu in this example) does not support dynamic shapes.
Workaround: Set TF_ENABLE_DYNAMIC_SHAPES=False

Error: INFO:tensorflow:Error reported to Coordinator: <class ‘tensorflow.python.framework.errors_impl.InternalError’>, Graph execution error: Old node tower_0/v/gpu_cached_inputs is not mapped
Description: The model uses legacy (TensorFlow 1.x) variables, which are not supported by the Intel Gaudi stack. Upgrading the script to use TensorFlow 2 resource variables is recommended.
Workaround: Place legacy variables on the CPU by setting TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU=true. Note that this significantly reduces performance.
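
The workarounds above are environment variables, so they can be exported in the shell or set from the training script itself. The following is a minimal sketch of the second approach. The names WORKAROUND_FLAGS and apply_workaround are hypothetical helpers introduced for illustration, and it is an assumption here that the variables must be set before TensorFlow and the Habana module are imported in order to take effect.

```python
import os

# Variable names and values are the ones listed in the workarounds above.
# The dictionary keys are hypothetical labels chosen for this sketch.
WORKAROUND_FLAGS = {
    "dynamic_shapes_unsupported": ("TF_ENABLE_DYNAMIC_SHAPES", "False"),
    "legacy_variables": ("TF_HABANA_ALLOW_LEGACY_VARIABLES_ON_CPU", "true"),
}


def apply_workaround(name, environ=os.environ):
    """Set the environment variable for the named workaround.

    Returns the (variable, value) pair that was written.
    """
    var, value = WORKAROUND_FLAGS[name]
    environ[var] = value
    return var, value


# Assumption: call this at the very top of the training script,
# before TensorFlow or the Habana module is imported.
apply_workaround("dynamic_shapes_unsupported")
```

Only the workaround that matches the observed error needs to be applied. Placing legacy variables on the CPU in particular carries a significant performance cost, so it is best treated as a stopgap until the script is upgraded.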

Performance Issues

For details on how to get the best performance on HPU, refer to the Model Performance Optimization Guide for TensorFlow.