Release Notes v1.15
New Features and Enhancements - 1.15.1
The following documentation and packages correspond to the latest software release version from Intel® Gaudi®: 1.15.1-15. We recommend using the latest release where possible to stay aligned with performance improvements and updated model coverage. Please refer to the Installation Guide for further details.
This release includes various bug fixes.
New Features and Enhancements - 1.15.0
The following documentation and packages correspond to the latest software release version from Intel® Gaudi®: 1.15.0-479. We recommend using the latest release where possible to stay aligned with performance improvements and updated model coverage. Please refer to the Installation Guide for further details.
Important Note: With the acquisition of Habana® Labs by Intel, the SynapseAI® brand has changed to Intel Gaudi software. To make this transition as easy as possible for developers and customers, many of the brand references in developer literature remain unchanged. Specifically, no code has changed. You will see continued reference to Synapse in the code and code examples.
General
Added support for Kubernetes versions 1.27, 1.28 and 1.29.
Upgraded OpenShift version to 4.14.
Added RHEL 9.2 support for Gaudi 2 only. Dropped RHEL 8 support for Gaudi 2 and first-gen Gaudi.
Added CRI-O support to register the habana runtime. See Installation Guide.
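For illustration, CRI-O runtimes are typically registered through a drop-in configuration file. The sketch below is a hedged example: the file path and runtime binary location are assumptions, not taken from these notes; consult the Installation Guide for the exact values.

```toml
# /etc/crio/crio.conf.d/99-habana.conf (hypothetical path)
# Registers an OCI runtime named "habana"; runtime_path is an assumed location.
[crio.runtime.runtimes.habana]
runtime_path = "/usr/bin/habana-container-runtime"
runtime_type = "oci"
```

Restart CRI-O after adding the drop-in so the new runtime is picked up.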
PyTorch
Improved performance for the following Gaudi 2 models. See Models Performance page:
LLaMA 2 70B BF16 pre-training
LLaMA 2 7B/70B BF16/FP8 for inference
Mixtral 7B for inference
Added further improvements to Text Generation Inference (TGI) support for Gaudi 2. For more details, see https://github.com/huggingface/tgi-gaudi.
Added PyTorch Fully Sharded Data Parallel (FSDP) support. FSDP runs distributed training on large-scale models while reducing memory footprint. See Using FSDP with Intel Gaudi. FSDP is enabled on LLaMA 2 70B.
Moved Megatron-DeepSpeed LLaMA 13B and LLaMA 2 70B models from Model References to the Intel Gaudi Megatron-DeepSpeed repository.
Upgraded to PyTorch version 2.2.0. See PyTorch Support Matrix.
Validated the Intel Gaudi 1.15.0 software release on PyTorch Lightning version 2.2.0. See https://lightning.ai/docs/pytorch/stable/integrations/hpu/.
The 1.15.0 release has been validated with the Hugging Face Optimum-Habana library and model version 1.10.4. Future releases of the Optimum-Habana library may also be validated with this release. Please check the Support Matrix for the full list of supported versions.
Removed the following models from Model References GitHub repository:
ResNet50 PyTorch Lightning
Megatron-DeepSpeed BLOOM 13B
For models using torch.compile, aot_hpu_inference_backend is now deprecated and will be removed in a future release. Replace aot_hpu_inference_backend with hpu_backend.
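Because the change is a drop-in string replacement, existing call sites can be migrated mechanically. A minimal sketch (the file layout is illustrative; review the resulting diff before committing):

```shell
# Replace the deprecated torch.compile backend name across a Python source tree.
# This is a plain textual substitution, so inspect the changes afterwards.
grep -rl --include='*.py' 'aot_hpu_inference_backend' . \
  | xargs -r sed -i 's/aot_hpu_inference_backend/hpu_backend/g'
```

After migration, call sites read torch.compile(model, backend="hpu_backend").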
Firmware
Added an argument to the fw-loader CLI to enforce specifying device types. Devices not matching the type on PCI/I2C will be ignored.
TensorFlow
TensorFlow is no longer supported.
Removed all TensorFlow models from Model References GitHub repository.
Known Issues and Limitations - 1.15.0
PyTorch
High host RAM utilization may be encountered when using the Quantization Toolkit (HQT) to convert models to FP8. This will be fixed in a future release.
To bypass a performance issue in Linux kernel version >= 5.9 (e.g. Ubuntu 22.04), the intel_idle driver must be disabled by adding intel_idle.max_cstate=0 to the kernel command line.
Support for torch.compile is in an early stage. Models may not work (due to missing op implementations) or performance may be affected.
Support for Eager mode is in an early stage. Models may not work (due to missing op implementations) or performance may be affected. The functionality of Eager mode as a subset of Lazy mode can be emulated by using the PT_HPU_MAX_COMPOUND_OP_SIZE environment variable and limiting cluster sizes to 1. See Eager Mode.
Model checkpointing for ResNet50 in torch.compile mode is broken. This will be fixed in the next release.
Events created with enable_timing=True may not provide accurate timing information.
Dynamic shape handling can be initiated by setting the PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES flag. The flag is disabled by default but enabled selectively for several models. For best performance, follow the guidance in the Handling Dynamic Shapes document.
Graphs displayed in TensorBoard have some minor limitations, e.g. an operator's assigned device is displayed as "unknown device" when it is scheduled to HPU.
HPU tensor strides might not match those of CPU tensors, as tensor storage is managed differently. References to tensor storage (such as torch.as_strided) should take the input tensor strides into account explicitly. It is recommended to use other view functions instead of torch.as_strided. For further details, see Tensor Views and TORCH.AS_STRIDED.
Weights sharing:
Weights can be shared among two or more layers using PyTorch with Gaudi only if they are created inside the module. For more details, refer to Weight Sharing.
Weights are not shared with operators outside of the PyTorch library (i.e. PyBind11 functions).
User-defined attributes in an HPU torch.nn.Parameter are not preserved after the torch.nn.Parameter is assigned with a CPU tensor.
The Python API habana_frameworks.torch.hpu.current_device() returns 0 regardless of the actual device being used.
For a torch.nn.Parameter which is not created inside a torch.nn.Module:
When two torch.nn.Parameter objects are on CPU storage and reference the same parameter, the connection is lost if one of them is moved to HPU.
Assigning a CPU tensor to an HPU torch.nn.Parameter is not supported.
Saving metrics to a file configured using Runtime Environment Variables is not supported for workloads spawned via torch.multiprocessing.
Using torch.device("hpu:x") - for example, in model.to - where x is a rank > 0 may lead to memory leaks. Instead, always use torch.device("hpu") to access the current rank.
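For the intel_idle workaround above, the kernel parameter is typically added through the bootloader configuration. A hedged sketch for a GRUB-based system follows; the file path, existing option values, and regeneration commands are illustrative and vary by distribution.

```
# /etc/default/grub - append intel_idle.max_cstate=0 to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_idle.max_cstate=0"

# Then regenerate the GRUB configuration and reboot, e.g.:
#   sudo update-grub                                # Debian/Ubuntu
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg     # RHEL-family
```

Verify after reboot with: cat /proc/cmdline.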