Release Notes v1.5.0
New Features and Enhancements - 1.5.0
The following documentation and packages correspond to the latest software release version from Habana: 1.5.0-610. We recommend using the latest release where possible to stay aligned with performance improvements and updated model coverage. Please refer to the Installation Guide for further details.
General Features
There is some degradation in training throughput due to software fixes/features in this release.
CentOS 8 is no longer supported.
PYHLML GitHub repository is now obsolete. You can install the PYHLML wheel package either from the gaudi-python library located in the Habana Vault or from the pyhlml package on PyPI. See Habana Labs Python Management Library (PYHLML) API Reference for more details.
Added support for Kubernetes version 1.22.
TensorFlow
Added support for TensorFlow 2.9.1.
Upgraded TensorFlow 2.8.0 to 2.8.2. Going forward, this minor version of TensorFlow will continue to be supported for a period of one year. After that, support for this version will be deprecated and a newer minor version will be selected to replace it.
Removed support for TensorFlow 2.7.1.
Added saved checkpoints for pre-trained TensorFlow ResNet50 and BERT Large. For more details, see the model READMEs provided in the Model References GitHub page. Checkpoints can be found in the Habana Catalog page.
Enabled the following for Gaudi2. See Model References GitHub page:
- ResNet50 on 8 cards
- BERT-L (FT, PT) on 8 cards
- ResNext101 on 1 and 8 cards
- MaskRCNN on 1 and 8 cards
PyTorch
Upgraded PyTorch to v1.11.0.
Habana accelerator support has been upstreamed to PyTorch Lightning v1.6 and integrated with grid.ai. PyTorch Lightning is no longer included with the Habana PyTorch package. This release of SynapseAI was validated with PyTorch Lightning v1.6.4. Further documentation can be found at PyTorch Lightning docs.
Incorporated updates to PyTorch models for ease of use. See Porting a Simple PyTorch Model to Gaudi for more information. If you have existing migrated models from previous SynapseAI releases (v1.4.1 and below), see PyTorch for the required updates.
mark_step() must be added right after loss.backward() and optimizer.step().
Weight permutation in vision models is no longer required.
Importing load_habana_module is no longer required.
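As a minimal sketch of the mark_step() placement described above (the habana_frameworks import is assumed to be available on a Gaudi host; a no-op fallback and "cpu" device are used here only so the sketch runs off-device):

```python
import torch

try:
    # Available on a Gaudi host with the Habana PyTorch package installed
    import habana_frameworks.torch.core as htcore
    mark_step = htcore.mark_step
    device = "hpu"
except ImportError:
    def mark_step():  # no-op fallback for illustration off-device
        pass
    device = "cpu"

model = torch.nn.Linear(8, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(4, 8, device=device)
labels = torch.randint(0, 2, (4,), device=device)

loss = torch.nn.functional.cross_entropy(model(inputs), labels)
loss.backward()
mark_step()       # required right after loss.backward()
optimizer.step()
mark_step()       # and right after optimizer.step()
```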
Added support for DeepSpeed on Gaudi. See DeepSpeed User Guide.
Added support for ZeroRedundancyOptimizer option of parameters_as_bucket_view. See Advanced Usage: ZeroRedundancyOptimizer with DDP.
Made the following changes to the habana_frameworks.torch package. See Python Package (habana_frameworks.torch) for more details:
- Renamed import torch_hpu to import habana_frameworks.torch.hpu as hthpu. See hpu APIs.
- Added Memory Stats APIs to allow users to see memory usage on Gaudi. See Memory Stats APIs.
- Added Random Number Generator APIs. See Random Number Generator APIs.
For PyTorch models using DDP, import habana_frameworks.torch.core.hccl has been renamed to import habana_frameworks.torch.distributed.hccl. Habana recommends updating your existing models.
Enabled the following for first-gen Gaudi. See Model References GitHub page:
- Yolov5 on 1 and 8 cards
- SSD + SSD with Habana Dataloader on 1 and 8 cards
- DINO on 1 and 8 cards
- DeepSpeed BERT-L and BERT-1.5B scaling up to 128 cards
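The renamed DDP import can be sketched as follows (the habana_frameworks module and the "hccl" backend are assumed to exist only on a Gaudi host; the sketch falls back to the "gloo" backend and a single-process group so it also runs off-device):

```python
import os
import torch.distributed as dist

try:
    # New import path as of SynapseAI 1.5
    # (previously: import habana_frameworks.torch.core.hccl)
    import habana_frameworks.torch.distributed.hccl  # noqa: F401
    backend = "hccl"
except ImportError:
    backend = "gloo"  # CPU fallback for illustration off-device

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend=backend, rank=0, world_size=1)
world = dist.get_world_size()
dist.destroy_process_group()
print(world)  # 1
```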
Enabled the following for Gaudi2. See Model References GitHub page:
- ResNet50 on 8 cards
- BERT-L (FT, PT) on 8 cards
- ResNext101 on 1 and 8 cards
Habana Qualification Library
Serdes Loopback test is no longer supported. You can use Serdes Base test as it supports the same functionality.
Known Issues and Limitations - 1.5.0
TensorFlow
When using the TF dataset cache feature with a large dataset, setting hugepages for host memory may be required. Refer to the SSD_ResNet34 Model Reference for instructions on setting hugepages.
Models currently based on TensorFlow 1 need to be converted to TensorFlow 2. The TF XLA compiler option is currently not supported.
Control flow ops such as tf.cond and tf.while_loop are currently not supported on Gaudi and will fall back on CPU for execution.
DenseNet: Sporadic issues with training on 8 Gaudis may occur.
Eager Mode feature in TensorFlow2 is not supported and must be disabled to run TensorFlow models on Gaudi. To disable Eager mode, see Creating a TensorFlow Example.
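A one-line sketch of disabling eager mode with the standard TensorFlow 2 compatibility API (a stock TF install is assumed; this must run before any graph or model is built):

```python
import tensorflow as tf

# Eager execution must be turned off before building the model/graph
tf.compat.v1.disable_eager_execution()
print(tf.executing_eagerly())  # False once eager mode is disabled
```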
Distributed training with tf.distribute is enabled only with HPUStrategy. Other TensorFlow built-in distribution strategies such as MirroredStrategy, MultiWorkerMirroredStrategy, CentralStorageStrategy, ParameterServerStrategy are not supported.
EFA installation on Habana’s containers includes OpenMPI 4.1.2, which does not recognize the CPU cores and threads properly in a KVM virtualized environment. To enable identifying the CPU/threads configuration, replace mpirun with mpirun --bind-to hwthread --map-by hwthread:PE=3. This limitation is not applicable to AWS DL1 instances.
ResNeXt101 Media Loading HW Acceleration has the following limitations:
- Accuracy is 0.5-1% lower due to a different crop algorithm
- Evaluation can be done only once, at the end of the whole training
PyTorch
Incorporated updates to PyTorch models for ease of use. If you have existing migrated models from previous SynapseAI releases (v1.4.1 and below), follow the below steps. See Porting a Simple PyTorch Model to Gaudi for more information.
You must remove permute_params and permute_momentum from your training models, as weight permutation in vision models is no longer required.
You must add mark_step() right after loss.backward() and optimizer.step(). The mark_step() call at the end of the ResNet50 training iteration was removed in distributed training to prevent performance impact when running with Host NICs. This will be fixed in a future release.
Habana recommends removing load_habana_module for both single card and distributed training, since importing load_habana_module is no longer required.
Significant performance degradation with Unet2D and Unet3D models. This will be fixed in the next release.
channels_last mode is not supported and will be enabled in a future release.
Cross_entropy loss function (torch.nn.functional.cross_entropy or torch.nn.CrossEntropyLoss) has an accuracy issue in BF16 that will be fixed in a future release. The FP32 version should be used.
Dynamic shapes are not supported and will be enabled in future releases.
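A minimal sketch of the FP32 workaround in plain PyTorch (the tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10).to(torch.bfloat16)  # model output in BF16
labels = torch.randint(0, 10, (4,))

# Workaround: cast to FP32 before the loss to avoid the BF16 accuracy issue
loss = F.cross_entropy(logits.float(), labels)
print(loss.dtype)  # torch.float32
```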
For Transformer models, time to train is high due to evaluation phase.
Graphs displayed in TensorBoard have some minor limitations, e.g. an operator’s assigned device is displayed as “unknown device” when it is scheduled to HPU.
EFA installation on Habana’s containers includes OpenMPI 4.1.2, which does not recognize the CPU cores and threads properly in a KVM virtualized environment. To enable identifying the CPU/threads configuration, replace mpirun with mpirun --bind-to hwthread --map-by hwthread:PE=3. This limitation is not applicable to AWS DL1 instances.
HPU tensor strides might not match those of the CPU, as tensor storage is managed differently. References to tensor storage (such as torch.as_strided) should take the input tensor strides into account explicitly. It is recommended to use other view functions instead of torch.as_strided. For more details, see https://pytorch.org/docs/stable/tensor_view.html and https://pytorch.org/docs/stable/generated/torch.as_strided.html#torch.as_strided.
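For illustration, the same views expressed with explicit view functions rather than torch.as_strided (plain PyTorch; the tensors are illustrative):

```python
import torch

x = torch.arange(12).reshape(3, 4)

# Instead of torch.as_strided(x, (3, 2), (4, 1), 1), prefer an explicit slice:
cols = x[:, 1:3]
# Instead of hand-computed strides for flattening, prefer reshape/flatten:
flat = x.flatten()

print(cols.shape, flat.shape)  # torch.Size([3, 2]) torch.Size([12])
```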
When module weights are shared among two or more layers, using PyTorch with Gaudi requires these weights to be shared after moving the model to the HPU device. For more details, refer to Weight Sharing.
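A minimal sketch of this ordering requirement (plain PyTorch; the module is hypothetical, and "cpu" stands in for "hpu" so the sketch runs off-device):

```python
import torch.nn as nn

class TinyLM(nn.Module):  # hypothetical model with tied weights
    def __init__(self, vocab=100, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab, bias=False)

device = "cpu"  # use "hpu" on a Gaudi host
model = TinyLM().to(device)             # move to the device FIRST
model.head.weight = model.embed.weight  # THEN tie the shared weights
print(model.head.weight.data_ptr() == model.embed.weight.data_ptr())  # True
```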
Habana Communication Library
Single Process Multiple Device Support in HCCL: since multiple processes are required for multi-node (cross-chassis) scaling, HCCL supports only one device per process, so users do not need to differentiate between inter-node and intra-node usage cases.