GPU Migration Toolkit

The GPU Migration Toolkit simplifies migrating PyTorch models that run on GPU-based architecture to run on Intel® Gaudi® AI accelerator. Rather than manually replacing Python API calls that have dependencies on GPU libraries with Gaudi-specific API calls, the toolkit automates this process so you can run your model with fewer modifications.

The GPU Migration toolkit maps specific API calls from the Python libraries and modules listed below to the appropriate equivalents in the Intel Gaudi software:

  • torch.cuda

  • Torch APIs with GPU-related parameters. For example, torch.randn(device="cuda").

  • Apex. If a specific model requires Apex, see the Limitations section for further instructions.

  • pynvml
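
As an illustration, the following sketch shows the kind of GPU-style calls the toolkit intercepts once habana_frameworks.torch.gpu_migration is imported. The cuda branch is guarded only so the sketch also runs on a machine without a GPU or HPU; the guard itself is not part of the migration workflow:

```python
import torch

# Device-neutral call: unaffected by migration.
x = torch.randn(4, 4)

# GPU-style calls: with habana_frameworks.torch.gpu_migration imported
# on a Gaudi machine, "cuda" is redirected to HPU. Guarded here so the
# sketch still runs on a plain CPU host.
if torch.cuda.is_available():
    y = torch.randn(4, 4, device="cuda")  # torch API with device="cuda"
    n = torch.cuda.device_count()         # torch.cuda call
```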

The toolkit does not optimize the performance of the model, so further modifications may be required. For more details, refer to the Model Performance Optimization Guide. This document describes a subset of supported Python APIs and provides instructions on how to automatically migrate PyTorch models using these calls.

Using GPU Migration Toolkit

The GPU Migration toolkit is pre-installed as a Python package in the Intel Gaudi software. To migrate your model from GPU to HPU, perform the following steps:

  1. Prepare the environment for initial setup by following the steps in the Installation Guide and On-Premise System Update.

    Note: It is recommended to use the Intel Gaudi PyTorch Docker images. Ensure that the packages listed in the model's requirements.txt do not override the Intel Gaudi PyTorch module torch or PyTorch Lightning, as these packages contain Gaudi-specific enhancements. Additionally, torchaudio and torchvision are validated on Gaudi and included in the Docker image. Other PyTorch libraries have not been formally validated.

  2. Import the GPU Migration package and habana_frameworks.torch.core at the beginning of the primary Python script:

import habana_frameworks.torch.gpu_migration
import habana_frameworks.torch.core as htcore
  3. Add mark_step(). In Lazy mode, mark_step() must be added in all training scripts right after loss.backward() and optimizer.step().

  4. Make sure that any device selection argument passed to the script is configured as if the script were running on a GPU. For example, add --cuda or --device gpu to the runtime command of your model. This ensures that the GPU Migration toolkit accurately detects and migrates instructions.

You are now prepared to begin your model training on HPU.
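
The mark_step() placement described in step 3 can be sketched as follows; the try/except fallback is only so the snippet also runs on a machine without the Intel Gaudi software installed, and the device and model are illustrative:

```python
import torch

try:
    import habana_frameworks.torch.core as htcore
except ImportError:
    class htcore:  # no-op stand-in for non-Gaudi machines
        @staticmethod
        def mark_step():
            pass

device = "cpu"  # "hpu" on a Gaudi machine
model = torch.nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(8, 4), torch.randn(8, 2)

for _ in range(2):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    htcore.mark_step()  # Lazy mode: right after loss.backward()
    optimizer.step()
    htcore.mark_step()  # and right after optimizer.step()
```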

It is highly recommended to review the GPU Migration examples in the Intel Gaudi Model References GitHub repository. These show the GPU Migration toolkit working on publicly available models and include the log files and additional information needed for performance tuning.

Additional Model Considerations

For other libraries and model packages, please consider the following:

  • For DeepSpeed models, continue to use the Intel Gaudi fork of the DeepSpeed library, and set the dist_backend in deepspeed.init_distributed() to HCCL: deepspeed.init_distributed(dist_backend='hccl', init_method=<init_method>)

  • For Hugging Face models, it is recommended to use the existing Optimum-Habana interface for training and inference. You can refer to the Gaudi-specific examples or use additional models from the Hugging Face library. In some cases, the GPU Migration toolkit may help identify structures that need to be modified.

  • For PyTorch Lightning, follow the existing methods for migrating models to work with PyTorch Lightning.

  • For Fairseq models, start with the Intel Gaudi Fairseq fork.

Enabling GPU Migration Logging

You can enable the Logging feature, included in the GPU Migration Toolkit, by setting the GPU_MIGRATION_LOG_LEVEL environment variable as described in the table below. This generates log files that provide insight into the automation enabled by the GPU Migration Toolkit while running the model.

  GPU_MIGRATION_LOG_LEVEL   Description

  1                         Logs all modules and prints to the console.

  2                         Logs all modules.

  3                         Logs all modules excluding torch.

Using the MNIST Example, you can add the Logging feature as follows:
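
A hedged sketch of such a run; the script name mnist.py and its flags are illustrative placeholders, not taken from this document:

```shell
# Set the log level for a single run of the MNIST example.
# mnist.py and its arguments are placeholders for your own script.
GPU_MIGRATION_LOG_LEVEL=3 python3 mnist.py --batch-size 64 --epochs 1 --cuda
```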


The log files are stored under $HABANA_LOGS/gpu_migration_logs/. Sample log files for the example models listed above can be found in their corresponding directory.

The Logging feature allows you to identify GPU calls that do not match HPU functionality and may not have been implemented. If you encounter such a scenario, implement the necessary changes to the model script manually, based on the information provided in the log file:

  • If you modify only a few unimplemented calls, you can then directly rerun the training process.

  • If you modify all the unimplemented calls, remove the import habana_frameworks.torch.gpu_migration line and restart the training process without the GPU Migration toolkit.

GPU Migration APIs Support

The Python APIs supported by the GPU Migration toolkit are classified as hpu_match, hpu_modified, and hpu_mismatch according to their HPU implementation status. For the full list of these calls and their compatibility with Gaudi, refer to the Intel Gaudi GPU Migration APIs guide.

Support Matrix

The matrix below lists the versions of libraries verified with the GPU Migration toolkit:

  Library   Verified Version   Additional Notes

  Apex      —                  See the Limitations section for details on installation.

  • All the libraries and modules are preinstalled in Gaudi containers except for the Apex library. If a specific model requires Apex, run the following command to install it:

git clone && cd apex
# Required for installation without GPU dependencies and build isolation
git fetch origin pull/1610/head:bug_fix && git cherry-pick b559175
git fetch origin pull/1680/head:dependency_fix && git cherry-pick --allow-empty --no-commit 2944255 --strategy-option=theirs
pip install -v --disable-pip-version-check --no-cache-dir ./
  • GPU Migration does not migrate calls to third-party programs such as nvcc or nvidia-smi. Those calls must be migrated manually (for example, replacing nvidia-smi usage with the Gaudi hl-smi tool).

  • Intel Gaudi recommends using the Bfloat16 data type over Float16 for model training and inference. To enable automatic conversion from Float16 to Bfloat16, set the PT_HPU_CONVERT_FP16_TO_BF16_FOR_MIGRATION=1 flag as shown below. By default (PT_HPU_CONVERT_FP16_TO_BF16_FOR_MIGRATION=0), the declared data type is used:
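
A minimal sketch of enabling the conversion for a single run; the script name train.py and its flag are illustrative placeholders, not taken from this document:

```shell
# Enable automatic Float16 -> Bfloat16 conversion for this run only.
# train.py is a placeholder for your model's entry-point script.
PT_HPU_CONVERT_FP16_TO_BF16_FOR_MIGRATION=1 python3 train.py --cuda
```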