2. Release Notes

2.1. Support Matrix

The table below details the supported configurations and versions:

Gaudi Firmware

Gaudi SPI Firmware

Operating Systems and Kernel Versions:

  • Amazon Linux 2: kernel 5.4.0 and above
  • Ubuntu 18.04: kernel 4.15 and above
  • Ubuntu 20.04: kernel 5.4.0 and above

TensorFlow: 2.5.0 and 2.4.1

2.2. New Features and Enhancements - 0.15.4

This release includes minor improvements and bug fixes for the Habana® Gaudi® HPU and the SynapseAI® software platform. The following documentation and packages correspond to software release version 0.15.4-75. It is recommended to use the corresponding 0.15.4 Docker images and models from the Habana Model-References repository with this build.

The Model-References for this release have been updated to allow setting the appropriate Python version according to the table below. When run, the models in Model-References use only these specific versions. For further details, refer to the Setup and Install GitHub page.

Supported Python Versions per OS:

  • Ubuntu 18.04: Python 3.7
  • Ubuntu 20.04: Python 3.8
  • Amazon Linux 2: Python 3.7
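As a quick sanity check against the table above, the running interpreter's version can be compared with the supported one for a given OS. This is a stdlib-only sketch; the `SUPPORTED_PYTHON` mapping simply mirrors the table and the function name is illustrative:

```python
# Stdlib-only sanity check against the supported-Python table above.
import sys

# Mirrors the "Supported Python Versions per OS" table.
SUPPORTED_PYTHON = {
    "Ubuntu 18.04": (3, 7),
    "Ubuntu 20.04": (3, 8),
    "Amazon Linux 2": (3, 7),
}

def python_matches(os_name, version_info=sys.version_info):
    """Return True if the interpreter's major.minor version matches the
    version the table lists for os_name."""
    expected = SUPPORTED_PYTHON[os_name]
    return (version_info[0], version_info[1]) == expected
```

For example, `python_matches("Ubuntu 20.04")` returns True only when run under a Python 3.8 interpreter.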

2.3. New Features and Enhancements - 0.15.3

2.3.1. HCL Configuration File

  • Updated HCL.json file usage by removing the need to specify the server type.

  • Updated scripts located in the Model References GitHub page by removing the need to specify the server type.

2.4. New Features and Enhancements - 0.15.2

2.4.1. General Features

During driver loading, the driver initializes all existing devices in parallel.

2.4.2. PyTorch

Removed SpaCy from PyTorch requirements files for both Python3.7 and Python3.8 to avoid pip compatibility errors.

2.5. New Features and Enhancements - 0.15.1

2.5.1. General Features

Updated copyright/license files.

2.5.2. PyTorch

Recovered the performance drop in BERT, ResNet, and ResNext observed in the v0.15 release.

2.6. New Features and Enhancements - 0.15

2.6.1. General Features

  • Python3.7 and Python3.8 are supported for both TensorFlow and PyTorch. Python3.6 is no longer supported.

  • Added additional Trace Analyzer capabilities in Habana Labs Trace Viewer. See Profiler User Guide for further details.

2.6.2. TensorFlow

  • Added support for TensorFlow 2.4.1 and 2.5.0. TensorFlow 2.2.2 is no longer supported. In general, we plan to upgrade support to the latest two minor versions of the framework with each release.

  • Added graph visualization support in TensorBoard. See the Debugging Guide for usage, and review Known Issues and Limitations.

  • Added a new feature that attempts to delegate computations to CPU in case of failures during runtime. See the Delegating Computations to CPU section for more information.

  • [Beta] Added support for multi-worker distributed training using tf.distribute with HPUStrategy class. See Distributed Training with TensorFlow for details.

  • Enabled the option to provide user-specified configuration files for mixed precision training. See TensorFlow Mixed Precision Training on Gaudi section for details.

2.6.3. PyTorch

  • Currently in beta.

  • Added support for PyTorch v1.7.1. PyTorch v1.5 is no longer supported.

  • Only Eager mode and Lazy evaluation mode will be supported going forward. TorchScript graph mode will be deprecated in the next release.

  • Enabled lazy mode support for ResNet50, BERT and DLRM reference models. Enabled support for ResNext101 topology in the reference models.

  • Enabled support for mixed precision training with Habana Mixed Precision (HMP) package. See PyTorch Mixed Precision Training on Gaudi for details.

2.6.4. Kubernetes

Added support for Kubernetes with Gaudi device plugin, MPI Operator and Helm chart for ease of deployment. See Kubernetes User Guide for more information.

2.6.5. Hl_Qual

Added HBM Stress Plugin for stress testing based on memory transfers using DMA. See Qualification Library Guide for details.

2.7. Resolved Issues - 0.15.3

Fixed a Linux driver issue that required a driver reload on certain servers after an interrupt (^C).

2.8. Known Issues and Limitations - 0.15.4

2.9. Known Issues and Limitations - 0.15.1

2.9.1. Hl_Qual

Running the ResNet-50 training stress test plugin is currently not supported on Amazon Linux 2. This will be fixed in a subsequent release.

2.10. Known Issues and Limitations - 0.15

2.10.1. TensorFlow

  • Users need to convert models to TensorFlow 2 if they are currently based on TensorFlow 1.x. The TF XLA compiler option is currently not supported.

  • Control flow ops such as tf.cond and tf.while_loop are currently not supported on Gaudi and will fall back on CPU for execution.

  • The Eager mode feature in TensorFlow 2 is not supported and must be disabled to run TensorFlow models on Gaudi.

  • Dynamic shapes support is available with limited performance. It will be addressed in future releases.

  • Distributed Training with TensorFlow: Only HPUStrategy is supported for Gaudi. Other TensorFlow built-in distribution strategies, such as MirroredStrategy, MultiWorkerMirroredStrategy, CentralStorageStrategy, and ParameterServerStrategy, are not supported. HPUStrategy is not supported on TensorFlow 2.5.

  • TensorBoard graph visualization is not supported for models trained with Keras or Estimator. Only TFv2 native APIs (trace_on/trace_export) are supported.
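The eager-mode limitation above can be addressed with TensorFlow's standard compat API; a minimal sketch, assuming TensorFlow 2.x is installed:

```python
# Disable TF2 eager execution, as required to run TensorFlow models on
# Gaudi (per the limitation above). Must be called before any graph is built.
import tensorflow as tf

tf.compat.v1.disable_eager_execution()
print(tf.executing_eagerly())  # -> False
```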

2.10.2. PyTorch

  • PyTorch support is under active development and available in beta.

  • PyTorch dataloader may consume a significant portion of the training time, impacting overall model performance.

  • Convolution weight ordering for vision models: Users need to handle this manually by following the guidelines in the Gaudi Migration Guide (see Convolution Weight Ordering in PyTorch Habana Vision Topologies). This will be improved in subsequent releases.

  • Dynamic shapes are not supported and will be enabled in future releases.

  • Installing PyTorch packages on a bare metal machine or a virtual machine is not supported.

2.10.3. Habana Communication Library

  • Single Process Multiple Device Support in HCCL: Because multiple processes are required for multi-node (cross-chassis) scaling, HCCL supports only one device per process; this way, users do not need to differentiate between the inter-node and intra-node use cases.

  • COMM group support in HCCL: Each worker can be assigned to at most one comm group.
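The one-device-per-process model above can be illustrated with a stdlib-only launcher sketch. No Habana APIs are called; `worker` and `launch` are hypothetical names, and the spot where a real program would initialize HCCL and its single Gaudi device is marked in a comment:

```python
# Stdlib illustration of HCCL's one-device-per-process model: the launcher
# spawns one Python process per device ordinal, so intra-node and inter-node
# scaling share the same code path.
import multiprocessing as mp

def worker(device_ordinal):
    # A real HCCL program would open its single Gaudi device here and join
    # the communicator (hypothetical step; no Habana APIs are used).
    return "device %d: ready" % device_ordinal

def launch(num_devices):
    # One process per device; each process never touches more than one device.
    with mp.Pool(processes=num_devices) as pool:
        return pool.map(worker, range(num_devices))

if __name__ == "__main__":
    print(launch(4))
```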