2. Release Notes

2.1. Support Matrix

The table below details the configurations and versions supported:

  • Gaudi Firmware: 1.1.0

  • Gaudi SPI Firmware: 1.1.0

  • Operating Systems: Amazon Linux 2, RHEL 8.3, CentOS 8.3 (supported kernels: 4.15 and above or 5.4.0 and above, depending on the distribution)

  • TensorFlow: 2.5.1 and 2.6.0

  • PyTorch: 1.9.1

  • Habana Horovod: forked from v0.22.1 of the official Horovod

2.2. New Features and Enhancements - 1.1.0

The following documentation and packages correspond to the latest software release version from Habana: 1.1.0-614. We recommend using the latest release where possible to stay aligned with performance improvements and updated model coverage. Please refer to the Setup and Install Guide for details.

2.2.1. TensorFlow

  • Added support for 26 additional operators. See TensorFlow Operators for the full list of operators.

  • Implemented a memory defragmentation mechanism that is triggered on out-of-memory (OOM) conditions on the HPU.

  • Added graph splitting for cases where operators fall back to the CPU.

  • Updated Habana Horovod - forked from v0.22.1.

  • TensorFlow Distributed with HPUStrategy now uses the HCCL API by default and therefore supports TCP/IP-based scale-out over host NICs. See Scale-Out via Host-NIC over TCP for further details.
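The defragmentation mechanism mentioned above is internal to the Habana software stack, but the general idea (compact live allocations when an allocation fails, then retry) can be sketched with a toy pool allocator. Everything below is illustrative only; it is not the HPU allocator's actual code.

```python
# Toy illustration of defragmentation on allocation failure (OOM).
# Not Habana's implementation -- a minimal sketch of the general idea.
class Pool:
    def __init__(self, size):
        self.size = size
        self.blocks = []             # list of (offset, length, tag)

    def _find_gap(self, length):
        # First-fit search over the gaps between allocated blocks.
        offset = 0
        for off, ln, _ in sorted(self.blocks):
            if off - offset >= length:
                return offset
            offset = off + ln
        return offset if self.size - offset >= length else None

    def _defragment(self):
        # Slide all live blocks to the start of the pool, removing holes.
        offset, packed = 0, []
        for off, ln, tag in sorted(self.blocks):
            packed.append((offset, ln, tag))
            offset += ln
        self.blocks = packed

    def alloc(self, length, tag):
        gap = self._find_gap(length)
        if gap is None:              # OOM: compact, then retry once
            self._defragment()
            gap = self._find_gap(length)
            if gap is None:
                raise MemoryError(tag)
        self.blocks.append((gap, length, tag))
        return gap

    def free(self, tag):
        self.blocks = [b for b in self.blocks if b[2] != tag]
```

For example, in a pool of 10 units, allocating 4 + 4, freeing the first block, and then requesting 6 units succeeds only after compaction merges the two 4- and 2-unit holes.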

2.2.2. PyTorch

  • Upgraded PyTorch from version 1.8.2 to 1.9.1.

  • Improved ResNet50, ResNet152, ResNext101 performance with Habana dataloader. For more information, please refer to the Habana Data Loader section.

  • Added new models: Unet2D, Transformer, ResNet152, DistilBERT, RoBERTa Base and Large.

  • Enabled BERT-L on 32 Gaudi.

  • Enabled ResNet50 on 16 Gaudi with HCCL over TCP.

  • Added PyTorch Lightning v1.4.8 support. An example of PyTorch Lightning support is available in the Unet2D model in the PyTorch models repository.

  • Installing PyTorch packages on a bare metal machine or a virtual machine is now supported. For more information, refer to the PyTorch Installation section.

2.2.3. Driver

The habanalabs driver now automatically brings up the internal ports. Using the manage_network_ifs.sh script to bring the internal ports up is deprecated.

2.2.4. Firmware

This release includes v1.1.0 FW for both the host side and SPI. The new FW includes minor improvements.

This release is both forward and backward compatible. That is, 1.1.0 host-side FW is compatible with 0.14.10 SPI firmware, and 0.14.10 host-side FW is compatible with 1.1.0 SPI firmware.


Upgrading the SPI firmware is required for Supermicro X12 Server. Other platforms do not require SPI upgrade.
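The compatibility rule stated above can be captured as a small lookup. The helper name and the set-based representation below are illustrative only, not a Habana tool; the version pairs come from the compatibility statement in this section.

```python
# Sketch of the stated FW compatibility rule: 1.1.0 and 0.14.10
# host-side and SPI firmware interoperate in both directions.
COMPATIBLE_PAIRS = {
    ("1.1.0", "1.1.0"),
    ("1.1.0", "0.14.10"),    # new host-side FW, old SPI FW
    ("0.14.10", "1.1.0"),    # old host-side FW, new SPI FW
    ("0.14.10", "0.14.10"),
}

def fw_compatible(host_fw, spi_fw):
    """Return True if the (host-side, SPI) FW pair is a supported mix."""
    return (host_fw, spi_fw) in COMPATIBLE_PAIRS
```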

2.2.5. New Operating Systems

Added support for the following Operating Systems:

  • RHEL 8.3

  • CentOS 8.3

2.3. Known Issues and Limitations - 1.1.0

2.3.1. TensorFlow

  • When using the TensorFlow dataset cache feature with a large dataset, configuring hugepages for host memory may be required. Refer to the SSD_ResNet34 Model Reference for instructions on setting hugepages.

  • RetinaNet model eval phase is limited to CPU. This will be addressed in a subsequent release.

  • Models currently based on TensorFlow V1 must be converted to TensorFlow 2. The TF XLA compiler option is currently not supported.

  • Control flow ops such as tf.cond and tf.while_loop are currently not supported on Gaudi and will fall back on CPU for execution.

  • The TensorFlow 2 Eager Mode feature is not supported and must be disabled to run TensorFlow models on Gaudi.

  • Performance is limited in workloads where the TopK op receives inputs with dynamic shapes. This will be addressed in future releases.

  • Distributed training with tf.distribute is enabled only with HPUStrategy. Other TensorFlow built-in distribution strategies such as MirroredStrategy, MultiWorkerMirroredStrategy, CentralStorageStrategy, ParameterServerStrategy are not supported.

  • During BERT fine-tuning (FT) training on 8 cards, sporadic graph re-compilations were observed, which may cause fluctuations in per-iteration throughput. Overall time-to-train (TTT) is not impacted. The issue will be fixed in a subsequent release.

  • Performance on T5 Base is ~20-25% lower on TF v2.6.0 compared to TF v2.5.1.

  • Transformer training reaches the expected performance and accuracy. Inference performance is sub-optimal. This issue will be fixed in a subsequent release.

2.3.2. PyTorch

  • Convolution weight ordering for vision models: Users must reorder convolution weights manually by following the guidelines in the Gaudi Migration Guide (see Convolution Weight Ordering in PyTorch Habana Vision Topologies). This will be improved in subsequent releases.

  • Dynamic shapes are not supported and will be enabled in future releases.

  • Distributed Communication with PyTorch: HCCL is supported with ResNet50, ResNext101, and ResNet152 only.

  • Distributed Communication with PyTorch: Host NIC based scale-out with HCCL is supported with ResNet50. However, performance with host NICs is lower than with Gaudi NICs. This will be addressed in subsequent releases.

  • The Transformer model on 8 Gaudis has a known issue when using a dataset from the seq2seq GitHub project. This will be fixed in subsequent releases.
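The weight-ordering item above amounts to permuting the axes of the convolution weight tensor. The sketch below shows the mechanics with NumPy; the target layout (R, S, C, K) is an assumption for illustration, so consult the Gaudi Migration Guide for the actual ordering Gaudi requires.

```python
# Hypothetical sketch of reordering a conv weight tensor between layouts.
# PyTorch stores conv weights as (K, C, R, S) = (out_ch, in_ch, kH, kW).
# The (R, S, C, K) target shown here is an assumption, not necessarily
# the layout mandated by the Gaudi Migration Guide.
import numpy as np

def kcrs_to_rsck(w):
    # Move kernel-height/width axes first, then in-channels, out-channels.
    return np.transpose(w, (2, 3, 1, 0))

w = np.zeros((64, 3, 7, 7))          # e.g. a ResNet50 first-layer conv
assert kcrs_to_rsck(w).shape == (7, 7, 3, 64)
```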

2.3.3. Habana Communication Library

  • Single Process Multiple Device Support in HCCL: HCCL supports only one device per process. Since multiple processes are required for multi-node (cross-chassis) scaling anyway, this restriction means users do not need to differentiate between inter-node and intra-node usage.

  • COMM group support in HCCL: Each worker can be assigned to at most one communication group.
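The comm-group restriction above can be modeled with a small registry that rejects a second group assignment for the same worker. This is a toy model of the stated rule, not HCCL's API; the class and method names are illustrative.

```python
# Toy model of the HCCL restriction that each worker may belong to at
# most one communication group. Illustrative only -- not an HCCL API.
class CommGroupRegistry:
    def __init__(self):
        self._group_of = {}          # worker id -> group name

    def assign(self, worker, group):
        # Re-assigning the same group is a no-op; a different group
        # violates the one-group-per-worker rule.
        if worker in self._group_of and self._group_of[worker] != group:
            raise ValueError(
                f"worker {worker} already belongs to group "
                f"{self._group_of[worker]}")
        self._group_of[worker] = group

    def group_of(self, worker):
        return self._group_of.get(worker)
```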


For previous versions of the Release Notes, please refer to: