Importing PyTorch Models Manually
Importing habana_frameworks.torch.core
1. Import habana_frameworks.torch.core:
   import habana_frameworks.torch.core as htcore
2. Target the Gaudi device:
   device = torch.device("hpu")
3. Add mark_step(). In Lazy mode, mark_step() must be added in all training scripts right after loss.backward() and optimizer.step(), as shown in the sketch below:
   htcore.mark_step()
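Putting the three changes together, a minimal sketch of a Lazy mode training loop could look as follows; model, dataloader, optimizer, and loss_fn are placeholders, not part of the original example:
import torch
import habana_frameworks.torch.core as htcore

device = torch.device("hpu")
model = model.to(device)                    # placeholder model, moved to the Gaudi device

for input, target in dataloader:            # placeholder dataloader
    input, target = input.to(device), target.to(device)
    optimizer.zero_grad()
    output = model(input)
    loss = loss_fn(output, target)
    loss.backward()
    htcore.mark_step()                      # trigger graph execution after loss.backward()
    optimizer.step()
    htcore.mark_step()                      # trigger graph execution after optimizer.step()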
Note
If the model has dependencies on GPU libraries, refer to GPU Migration Toolkit.
Enabling Mixed Precision¶
To run mixed precision training on HPU without extensive modifications to existing FP32 model scripts, Intel Gaudi provides native PyTorch autocast support.
Autocast is a native PyTorch module that allows running mixed precision training.
It executes operations registered to autocast using a lower-precision floating point data type. The module is provided through the torch.amp package.
To use autocast on HPU, wrap the forward pass (model and loss) of the training in torch.autocast:
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    output = model(input)
    loss = loss_fn(output, target)
loss.backward()
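To show how autocast composes with the Lazy mode changes from the previous section, here is a hedged sketch of one full training step; model, optimizer, loss_fn, input, and target are placeholders:
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    output = model(input)            # forward pass and loss computation run under autocast
    loss = loss_fn(output, target)
loss.backward()                      # backward pass runs outside the autocast region
htcore.mark_step()
optimizer.step()
htcore.mark_step()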
For further information such as the full default list of registered ops or instructions on creating a custom ops list, see Mixed Precision Training with PyTorch Autocast.
Setting Up Distributed Training¶
Intel Gaudi support for distributed communication can be enabled using the Habana Collective Communication Library (HCCL) backend.
Support for the HCCL communication backend is loaded and the process group communication backend is initialized as hccl using the following script changes:
import habana_frameworks.torch.distributed.hccl
torch.distributed.init_process_group(backend='hccl')
The example above assumes that either torchrun or mpirun was used to start training and that all necessary environment variables are set before the habana_frameworks.torch.distributed.hccl import.
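As an illustrative sketch (the model setup and training loop are placeholders, not part of the original example), a script launched with torchrun could initialize the HCCL backend and wrap the model in DistributedDataParallel as follows:
import torch
import torch.distributed as dist
import habana_frameworks.torch.core as htcore
import habana_frameworks.torch.distributed.hccl  # loads HCCL backend support

# torchrun sets RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT in the environment.
dist.init_process_group(backend="hccl")

device = torch.device("hpu")
model = model.to(device)                                   # placeholder model
model = torch.nn.parallel.DistributedDataParallel(model)

# ... regular training loop with loss.backward(), htcore.mark_step(),
#     optimizer.step(), htcore.mark_step() ...

dist.destroy_process_group()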
For further details on distributed training, see Distributed Training with PyTorch.