2. Release Notes

2.1. Support Matrix

The table below details the supported configurations and versions:

    1.0.0 / 1.0.1

    Gaudi Firmware

    Gaudi SPI Firmware

    Operating Systems: Amazon Linux 2

    5.4.0 and above
    4.15 and above
    5.4.0 and above

    PyTorch: 1.8.1 (for 1.0.0), 1.8.2 (for 1.0.1)

    TensorFlow: 2.5.0 (for 1.0.0), 2.5.1 and 2.6.0 (for 1.0.1)
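The framework versions per release (also stated in Sections 2.2 and 2.3) could be encoded as a small lookup for a pre-flight check. This is only a sketch: the names SUPPORTED_FRAMEWORKS and is_supported are hypothetical and are not part of any Habana package.

```python
# Hypothetical pre-flight check; these names are illustrative, not a Habana API.
# Versions are taken from these release notes.
SUPPORTED_FRAMEWORKS = {
    "1.0.0": {"pytorch": {"1.8.1"}, "tensorflow": {"2.5.0"}},
    "1.0.1": {"pytorch": {"1.8.2"}, "tensorflow": {"2.5.1", "2.6.0"}},
}


def is_supported(release: str, framework: str, version: str) -> bool:
    """Return True if the framework version is listed for the given release."""
    return version in SUPPORTED_FRAMEWORKS.get(release, {}).get(framework, set())
```

For example, is_supported("1.0.1", "tensorflow", "2.6.0") is true, while the same query for release "1.0.0" is false.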

2.2. New Features and Enhancements - 1.0.1

2.2.1. TensorFlow

  • Added support for TensorFlow 2.6.0.

  • Upgraded TensorFlow 2.5.0 to 2.5.1.

  • Added support for multi-node training over host NICs when using the HCCL API via MPI.

2.2.2. PyTorch

  • Upgraded PyTorch 1.8.1 to 1.8.2.

  • Improved PyTorch dataloader performance.

2.3. New Features and Enhancements - 1.0

2.3.1. General Features

  • In addition to the habanalabs driver, a separate Ethernet driver - habanalabs_en - is now provided in the habanalabs-dkms_all package.

  • The Model References for this release now include an option to set the Python version according to the table below. When run, the models in the Model References use only these specific versions. For further details, refer to the Setup and Install GitHub page.

    Supported Python Versions per OS

    OS                Python Version
    Ubuntu 18.04      Python 3.7
    Ubuntu 20.04      Python 3.8
    Amazon Linux 2    Python 3.7
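A setup script could check the running interpreter against the OS-to-Python pairing listed above. The following is a minimal sketch under that assumption. The dictionary keys are plain illustrative strings and python_matches is a hypothetical helper, not something the Model References provide.

```python
import sys

# Values copied from the "Supported Python Versions per OS" table.
SUPPORTED_PYTHON = {
    "Ubuntu 18.04": (3, 7),
    "Ubuntu 20.04": (3, 8),
    "Amazon Linux 2": (3, 7),
}


def python_matches(os_name: str, version_info=None) -> bool:
    """Return True if the interpreter's major.minor matches the entry for os_name."""
    if version_info is None:
        version_info = sys.version_info
    expected = SUPPORTED_PYTHON.get(os_name)
    # Unknown OS names are treated as unsupported.
    return expected is not None and tuple(version_info[:2]) == expected
```

Passing version_info explicitly makes the check easy to exercise without switching interpreters. Called with only an OS name, it inspects the interpreter that is currently running.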

2.3.2. TensorFlow

  • Only TensorFlow 2.5.0 is supported in this release.

  • Moved the Python code responsible for initializing TensorFlow on Gaudi to the habana-tensorflow Python package. TensorFlow models in the Model References repository have been updated accordingly.

  • Extended support for TensorFlow Operators (including Keras Operators).

  • Enabled reference topologies on TensorFlow 2.5: Unet3D, CycleGan, T5-base, Densenet, Transformer, EfficientDet, RetinaNet and SegNet. The models are available in the Model References repository.

  • TensorBoard graph visualization is now supported for models trained with Keras or Estimator.

  • Enabled support for tf.keras.mixed_precision.

2.3.3. PyTorch

  • Added support for PyTorch v1.8.1. PyTorch v1.7.1 and v1.5 are no longer supported.

  • Only Eager mode and Lazy evaluation mode are supported. TorchScript graph mode is no longer supported.

  • Upgraded Hugging Face Transformers to v4.8.2.

2.3.4. Graph Compiler

  • Improved device memory consumption and device performance.

  • Improved compilation time on host.

2.3.5. Habana Communication Library

  • Improved bandwidth for message sizes larger than 8 MB.

  • Added HCCL support for TCP communication between hosts.

2.4. Known Issues and Limitations - 1.0.1

2.4.1. TensorFlow

TensorFlow Model Garden topologies are supported on TensorFlow 2.6.0, except for CycleGan. This will be addressed in a subsequent release. The models are available in the Model References repository.

2.5. Known Issues and Limitations - 1.0

2.5.1. TensorFlow

  • HCCL collective operations hcclSend and hcclRecv are no longer capable of communicating across machines using Host NICs via OpenMPI. This feature will be restored in future releases.

  • Models based on TensorFlow 1 must be converted to TensorFlow 2. The TensorFlow XLA compiler option is currently not supported.

  • Control flow ops such as tf.cond and tf.while_loop are currently not supported on Gaudi and fall back to the CPU for execution.

  • Eager mode in TensorFlow 2 is not supported and must be disabled to run TensorFlow models on Gaudi.

  • Dynamic shapes are supported, but with limited performance. This will be addressed in future releases.

  • When running workloads with TensorFlow distributed HPUStrategy, one of the ranks may sporadically exit with a non-zero code while the stack is rolled back after a successful training run. This issue will be addressed in future releases.

  • Multi-node training over host NICs using the HCCL API with MPI is currently not supported. As a workaround, either use the HCL API by setting the environment variable HABANA_NCCL_COMM_API=0, or use TCP communication as detailed in Scale-Out via Host-NIC over TCP. This issue will be addressed in a future release.

  • Distributed training with tf.distribute is enabled only with HPUStrategy. Other TensorFlow built-in distribution strategies, such as MirroredStrategy, MultiWorkerMirroredStrategy, CentralStorageStrategy and ParameterServerStrategy, are not supported.

  • tf.distribute HPUStrategy does not support multi-node training over host NICs.
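The HCL API workaround in the list above is selected through the HABANA_NCCL_COMM_API environment variable. It presumably has to be in place before the training process starts. A minimal sketch of a launcher that sets it for a child process follows. The function names build_hcl_env and launch are hypothetical, and the command to run is left to the caller.

```python
import os
import subprocess


def build_hcl_env(base_env=None):
    """Return a copy of the environment with HABANA_NCCL_COMM_API=0 set."""
    env = dict(os.environ if base_env is None else base_env)
    # Variable name and value are taken from the known-issue note above.
    env["HABANA_NCCL_COMM_API"] = "0"
    return env


def launch(command):
    """Run the given command (a list of arguments) with the HCL API selected."""
    return subprocess.run(command, env=build_hcl_env(), check=True)
```

Building a copy of the environment, rather than mutating os.environ, keeps the setting scoped to the launched training process.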

2.5.2. PyTorch

  • PyTorch dataloader may consume a significant portion of the training time, impacting overall model performance.

  • Convolution weight ordering for vision models: Users will need to manually handle this by following the guidelines in the Gaudi Migration Guide (see Convolution Weight Ordering in PyTorch Habana Vision Topologies). This will be improved in subsequent releases.

  • Dynamic shapes are not yet supported. Support will be added in future releases.

  • Installing PyTorch packages on a bare metal machine or a virtual machine is not supported.

  • BERT-L Finetuning: Graph compilation time has an overall impact on time to train. This will be fixed in a subsequent release.

2.5.3. Habana Communication Library

  • Single Process Multiple Device support in HCCL: Because multi-node (cross-chassis) scaling requires multiple processes, HCCL supports only the one-device-per-process mode. As a result, users do not need to differentiate between inter-node and intra-node use cases.

  • COMM group support in HCCL: Each worker can be assigned, at most, to a single comm group.
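The two HCCL limitations above can be expressed as a simple check on a planned job layout. This is only an illustrative sketch: validate_layout and its argument shapes are hypothetical and do not correspond to any HCCL call.

```python
from collections import Counter


def validate_layout(process_devices, worker_groups):
    """Check a planned layout against the two limitations above.

    process_devices: dict mapping a process id to the list of device ids it drives.
    worker_groups: iterable of (worker id, comm group name) pairs.
    Returns a list of human-readable violations (empty if there are none).
    """
    problems = []
    # Limitation 1: one device per process.
    for proc, devices in process_devices.items():
        if len(devices) != 1:
            problems.append(
                f"process {proc} drives {len(devices)} devices; expected exactly 1"
            )
    # Limitation 2: each worker belongs to at most one comm group.
    # Duplicate (worker, group) pairs are collapsed before counting.
    groups_per_worker = Counter(worker for worker, _ in set(worker_groups))
    for worker, count in groups_per_worker.items():
        if count > 1:
            problems.append(
                f"worker {worker} is assigned to {count} comm groups; at most 1 allowed"
            )
    return problems
```

A layout with one device per process and one group per worker yields an empty list. Each violation adds one message.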