Release Notes v1.4.0

New Features and Enhancements - 1.4.0

The following documentation and packages correspond to the latest software release version from Habana: 1.4.0-442. We recommend using the latest release where possible to stay aligned with performance improvements and updated model coverage. Please refer to the Installation Guide for further details.

General Features

  • Added EFA support:

    • All AMIs except CentOS 8 are now pre-installed with EFA-related packages.

    • TensorFlow and PyTorch container images are now pre-installed with EFA-related packages, including libfabric and the latest Open MPI version, 4.1.2.

  • Added VMware Tanzu support to enable orchestrating deep learning workloads at scale. For more details, see VMware Tanzu Guide.

  • Dropped support for Kubernetes 1.18.

  • Habana now uses the standard MPI Operator from Kubeflow, which enables running allreduce-style MPI workloads in Kubernetes.

TensorFlow

  • Fixed description formatting of Habana-TensorFlow PyPI package.

  • Added overlay function habana_imagenet_dataset(...) to the image processing utilities in the Model References repository. It now falls back automatically to the default dataset provider and will be used for new features in upcoming releases.

  • load_habana_module() is no longer required to be called before importing habana-horovod Python package. There was an implicit order of initialization enforced on habana-horovod which required the user to call load_habana_module() before importing habana-horovod in Python scripts. This requirement has been removed.

  • Open MPI version has been upgraded to 4.1.2.

  • Added flag TF_BF16_DUMP_PATH to dump BF16 config file.

  • References to custom demo scripts were replaced by community entry points in Model-References’ READMEs. For more details, see Model References page.
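
To illustrate the TF_BF16_DUMP_PATH flag listed above, here is a minimal sketch. It assumes the flag is read as an environment variable by the training process, which these notes do not state explicitly. The helper name env_with_bf16_dump, the dump path, and the script name train.py are hypothetical.

```python
import os


def env_with_bf16_dump(dump_path, base_env=None):
    """Return a copy of the environment with TF_BF16_DUMP_PATH set.

    Assumption: the flag is consumed as an environment variable.
    `base_env` defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TF_BF16_DUMP_PATH"] = dump_path
    return env


# Hypothetical usage: launch a training script with the flag set.
# import subprocess
# subprocess.run(["python3", "train.py"],
#                env=env_with_bf16_dump("/tmp/bf16_config.json"),
#                check=True)
```

Building a copy of the environment rather than mutating os.environ keeps the setting scoped to the launched process.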

PyTorch

  • Upgraded PyTorch to v1.10.2 and PyTorch Lightning to v1.5.10.

  • Open MPI version has been upgraded to 4.1.2.

  • Added troubleshooting instructions for runtime errors. For more details, see Troubleshooting your Model.

  • Default execution mode has been changed to Lazy mode instead of Eager mode.

  • Added Python utilities to torch_hpu package for HPU tensor types. For more details, see Python Package (torch_hpu).

  • Habana Data Loader now supports the COCO dataset for SSD, with performance 50% higher on x8 devices than the native PyTorch data loader. For more details, see PyTorch SSD.

  • Enabled the following:

    • Wiki and Book corpus Packed dataset for BERT Pretraining

    • Visual Transformer (ViT) on x8 cards

    • GPT2 model on x8 cards

Known Issues and Limitations - 1.4.0

TensorFlow

  • When using the TensorFlow dataset cache feature with a large dataset, setting hugepages for host memory may be required. Refer to the SSD_ResNet34 Model Reference for instructions on setting hugepages.

  • Models currently based on TensorFlow V1 must be converted to TensorFlow2. The TF XLA compiler option is currently not supported.

  • Control flow ops such as tf.cond and tf.while_loop are currently not supported on Gaudi and will fall back to the CPU for execution.

  • DenseNet: Sporadic issues with training on 8 Gaudis may occur.

  • Eager Mode feature in TensorFlow2 is not supported and must be disabled to run TensorFlow models on Gaudi. To disable Eager mode, see Creating TensorFlow Example.

  • Distributed training with tf.distribute is enabled only with HPUStrategy. Other TensorFlow built-in distribution strategies such as MirroredStrategy, MultiWorkerMirroredStrategy, CentralStorageStrategy, and ParameterServerStrategy are not supported.

  • EFA installation on Habana’s containers includes Open MPI 4.1.2, which does not recognize the CPU cores and threads properly in a KVM virtualized environment. To enable identification of the CPU/threads configuration, replace mpirun with mpirun --bind-to hwthread --map-by hwthread:PE=3. This limitation does not apply to AWS DL1 instances.
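
For launch scripts that build the mpirun command line programmatically, the substitution described above can be sketched as follows. This is a minimal illustration rather than a Habana-provided utility. The helper name with_hwthread_binding is hypothetical, and only the binding options quoted in the note are used.

```python
import shlex

# Binding options quoted in the note above.
BINDING_ARGS = ["--bind-to", "hwthread", "--map-by", "hwthread:PE=3"]


def with_hwthread_binding(command):
    """Insert the binding options right after a leading `mpirun`.

    `command` is a shell-style string. Commands that do not start with
    mpirun, or that already pass --bind-to, get no extra options.
    """
    args = shlex.split(command)
    if args and args[0] == "mpirun" and "--bind-to" not in args:
        args = [args[0]] + BINDING_ARGS + args[1:]
    return shlex.join(args)


print(with_hwthread_binding("mpirun -np 8 python3 train.py"))
# prints: mpirun --bind-to hwthread --map-by hwthread:PE=3 -np 8 python3 train.py
```

The script name train.py and the -np 8 argument are placeholders. As noted above, the workaround is not needed on AWS DL1 instances.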

PyTorch

  • Vision models require convolution weight reordering because Gaudi HW performs convolution operations with weights stored in filters-last (RSCK) format, which differs from the filters-first (KCRS) format used by PyTorch. Users need to handle this manually by following the guidelines in Convolution Weight Ordering in PyTorch Habana Vision Topologies. This will be improved in subsequent releases.

  • Dynamic shapes are not supported and will be enabled in future releases.

  • For Transformer models, time to train is high due to the evaluation phase.

  • The current version of PyTorch Lightning included in this release is based on PyTorch Lightning version 1.5.10, but does contain the patch (Pull Request 11099) for the vulnerability reported in the National Vulnerability Database (CVE-2020-1747). Habana will make a decision in the future to update the version of PyTorch Lightning when the fix is available in the official upstream release.

  • Graphs displayed in TensorBoard have some minor limitations, e.g., an operator’s assigned device is displayed as “unknown device” when it is scheduled to HPU.

  • The parameters_as_bucket_view option of ZeroRedundancyOptimizer is currently not supported and should be disabled for proper functionality.

  • EFA installation on Habana’s containers includes Open MPI 4.1.2, which does not recognize the CPU cores and threads properly in a KVM virtualized environment. To enable identification of the CPU/threads configuration, replace mpirun with mpirun --bind-to hwthread --map-by hwthread:PE=3. This limitation does not apply to AWS DL1 instances.

  • Habana data loader has an accuracy issue that prevents convergence to SOTA with the COCO dataset in SSD. This will be resolved in a future release.

  • Unet2D and Unet3D scripts in the Model-References repository are not supported with PyTorch Lightning v1.6.0.
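
The convolution weight ordering item above concerns a change of axis order from filters-first (KCRS) to filters-last (RSCK). The sketch below shows only that index mapping on a plain nested list, so that it runs without PyTorch. The helper name kcrs_to_rsck is hypothetical and is not part of any Habana package. On a 4D PyTorch tensor the same reordering corresponds to permute(2, 3, 1, 0). The referenced guide remains the authoritative procedure.

```python
def kcrs_to_rsck(weight):
    """Reorder a 4D nested list from [K][C][R][S] to [R][S][C][K].

    K = number of filters, C = input channels, R x S = kernel size.
    """
    K = len(weight)
    C = len(weight[0])
    R = len(weight[0][0])
    S = len(weight[0][0][0])
    # result[r][s][c][k] == weight[k][c][r][s]
    return [[[[weight[k][c][r][s] for k in range(K)]
              for c in range(C)]
             for s in range(S)]
            for r in range(R)]


# Toy weight with K=4, C=2, R=3, S=2; each value encodes its own index.
w = [[[[1000 * k + 100 * c + 10 * r + s for s in range(2)]
       for r in range(3)]
      for c in range(2)]
     for k in range(4)]
out = kcrs_to_rsck(w)
```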

Habana Communication Library

  • Single Process Multiple Device support in HCCL: Because multi-node (cross-chassis) scaling requires multiple processes, HCCL supports only the one-device-per-process mode, so users do not need to differentiate between inter-node and intra-node use cases.

  • COMM group support in HCCL: Each worker can be assigned, at most, to a single comm group.
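
A planned group layout can be checked against the single-comm-group constraint before launch. The following is a minimal sketch that uses no HCCL API. The helper name workers_in_multiple_groups and the dictionary layout (group name mapped to a list of worker ranks) are hypothetical.

```python
from collections import Counter


def workers_in_multiple_groups(groups):
    """Return the sorted worker ranks that appear in more than one group.

    `groups` maps a group name to an iterable of worker ranks.
    An empty result means the layout respects the constraint.
    """
    counts = Counter(
        rank for members in groups.values() for rank in set(members)
    )
    return sorted(rank for rank, n in counts.items() if n > 1)


valid = {"group_a": [0, 1, 2, 3], "group_b": [4, 5, 6, 7]}
invalid = {"group_a": [0, 1], "group_b": [1, 2]}
```

Here workers_in_multiple_groups(valid) returns an empty list, while the second layout reports rank 1 as assigned to two groups.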

Habana Qualification Library

  • The Serdes Loopback test is not supported due to a known internal SW limitation. It is recommended to run the Serdes Base test, which supports the same functionality.