Importing PyTorch Models Manually
import habana_frameworks.torch.core as htcore
Target the Gaudi HPU device:
device = torch.device("hpu")
mark_step(). In Lazy mode, mark_step() must be added in all training scripts right after
If the model has dependencies on GPU libraries, refer to GPU Migration Toolkit.
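The import, device, and mark_step() changes described in this section can be combined into a short training loop. The following is a minimal sketch rather than a definitive procedure. The toy model, the random data, and the names HAS_HPU, DEVICE_NAME, and train_steps are hypothetical. The CPU fallback exists only so the sketch runs without the Gaudi software stack, and placing mark_step() after loss.backward() and after optimizer.step() is an assumption based on common Lazy mode usage.

```python
import importlib.util

# Assumption: the Gaudi software stack is detected by the presence of the
# habana_frameworks package; without it, this sketch falls back to CPU.
HAS_HPU = importlib.util.find_spec("habana_frameworks") is not None
DEVICE_NAME = "hpu" if HAS_HPU else "cpu"


def train_steps(num_steps: int = 3) -> float:
    """Train a toy linear model for a few steps and return the last loss."""
    import torch

    if HAS_HPU:
        # Importing htcore loads the HPU backend into PyTorch.
        import habana_frameworks.torch.core as htcore

    device = torch.device(DEVICE_NAME)
    model = torch.nn.Linear(8, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    last_loss = float("nan")
    for _ in range(num_steps):
        # Hypothetical random batch, created on CPU and moved to the device.
        inputs = torch.randn(16, 8).to(device)
        targets = torch.randn(16, 1).to(device)

        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        if HAS_HPU:
            htcore.mark_step()  # assumed placement: after the backward pass
        optimizer.step()
        if HAS_HPU:
            htcore.mark_step()  # assumed placement: after the optimizer step
        last_loss = loss.item()
    return last_loss
```

Calling train_steps() runs the loop on HPU when habana_frameworks is installed and on CPU otherwise.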
Enabling Mixed Precision
To run mixed precision training on HPU without extensive modifications to existing FP32 model scripts, Intel Gaudi provides native PyTorch autocast support.
Autocast is a native PyTorch module that enables mixed precision training.
It executes operations registered to autocast using a lower-precision floating-point data type.
The module is provided using the
To use autocast on HPU, wrap the forward pass (model and loss computation) of the training in an autocast context:
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    output = model(input)
    loss = loss_fn(output, target)
For further information, such as the full default list of registered ops or instructions on creating a custom ops list, see Mixed Precision Training with PyTorch Autocast.
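To show where the autocast context sits within a full training step, here is a minimal sketch. It wraps only the forward pass and loss computation, and runs the backward pass and optimizer step outside the context, as PyTorch recommends for autocast. The toy model, the random data, and the names HAS_HPU, DEVICE_TYPE, and autocast_step are hypothetical, and the CPU fallback is an assumption added so the sketch runs without the Gaudi software stack.

```python
import importlib.util

# Assumption: fall back to CPU autocast when habana_frameworks is absent.
HAS_HPU = importlib.util.find_spec("habana_frameworks") is not None
DEVICE_TYPE = "hpu" if HAS_HPU else "cpu"


def autocast_step() -> str:
    """Run one training step under autocast; return the output dtype name."""
    import torch

    if HAS_HPU:
        import habana_frameworks.torch.core as htcore  # loads the HPU backend

    device = torch.device(DEVICE_TYPE)
    model = torch.nn.Linear(8, 4).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    # Hypothetical random batch, created on CPU and moved to the device.
    inputs = torch.randn(16, 8).to(device)
    targets = torch.randn(16, 4).to(device)

    optimizer.zero_grad()
    # Only the forward pass and the loss computation run under autocast.
    with torch.autocast(device_type=DEVICE_TYPE, dtype=torch.bfloat16):
        output = model(inputs)
        loss = loss_fn(output, targets)

    # The backward pass and optimizer step stay outside the autocast context.
    # (Lazy mode mark_step() calls are omitted to keep the focus on autocast.)
    loss.backward()
    optimizer.step()
    return str(output.dtype)
```

Printing the value returned by autocast_step() shows which data type autocast selected for the model output on the current device.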
Setting Up Distributed Training
Intel Gaudi support for distributed communication is enabled through the HCCL (Habana Collective Communication Library) backend. Support for the HCCL communication backend is loaded, and the process group communication backend is initialized as “hccl”, using the following script changes:
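A minimal sketch of such script changes follows. It assumes that importing habana_frameworks.torch.distributed.hccl is what loads HCCL support and that the default env:// initialization method is used. The names HAS_HPU, BACKEND, REQUIRED_ENV, missing_env, and init_distributed are hypothetical, and the "gloo" fallback is an assumption added so the sketch can be tried without the Gaudi software stack.

```python
import importlib.util
import os

# Assumption: use "gloo" when the Gaudi software stack is not installed.
HAS_HPU = importlib.util.find_spec("habana_frameworks") is not None
BACKEND = "hccl" if HAS_HPU else "gloo"

# Variables read by PyTorch's default env:// initialization method.
REQUIRED_ENV = ("MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE")


def missing_env() -> list:
    """Return the required environment variables that are not set."""
    return [name for name in REQUIRED_ENV if name not in os.environ]


def init_distributed() -> int:
    """Initialize the default process group and return this process's rank."""
    import torch.distributed as dist

    if HAS_HPU:
        # Assumed to load HCCL support so that "hccl" is a valid backend name.
        import habana_frameworks.torch.distributed.hccl  # noqa: F401

    unset = missing_env()
    if unset:
        raise RuntimeError(f"launcher did not set: {', '.join(unset)}")

    dist.init_process_group(backend=BACKEND)
    return dist.get_rank()
```

Checking the environment before calling init_process_group turns a missing launcher variable into an explicit error message.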
In the example above, it is assumed that either torchrun or mpirun was used to start training
and that all necessary environment variables are set before
For further details on distributed training, see Distributed Training with PyTorch.