GPU Migration Toolkit

The GPU Migration toolkit simplifies migrating PyTorch models that run on GPU-based architectures to the Intel® Gaudi® AI accelerator. Rather than requiring you to manually replace Python API calls that depend on GPU libraries with Gaudi-specific API calls, the toolkit automates this process so you can run your model with fewer modifications.

The GPU Migration toolkit maps specific API calls from the Python libraries and modules listed below to the appropriate equivalents in the Intel Gaudi software:

  • torch.cuda

  • Torch APIs with GPU-related parameters. For example, torch.randn(device="cuda").

  • Apex. If a specific model requires Apex, see the Limitations section for further instructions.

  • pynvml

The toolkit does not optimize the performance of the model, so further modifications may be required. For more details, refer to the Model Performance Optimization Guide.

Enabling the GPU Migration Toolkit

The GPU Migration toolkit is preinstalled as a Python package in the Intel Gaudi software. To migrate your model from GPU to HPU, perform the following steps:

  1. Prepare the environment for initial setup by following the steps in the Installation Guide.

    Note: It is recommended to use the Intel Gaudi PyTorch Docker images and ensure that the existing packages in the models’ requirements.txt do not override the Intel Gaudi PyTorch module, torch, and PyTorch Lightning as these packages contain Gaudi-specific enhancements. Additionally, torchaudio and torchvision are validated on Gaudi and included in the Docker image. Other PyTorch libraries have not been formally validated.

  2. Import the GPU Migration package and habana_frameworks.torch.core at the beginning of the primary Python script:

    import habana_frameworks.torch.gpu_migration
    import habana_frameworks.torch.core as htcore
  3. Add mark_step(). In Lazy mode, mark_step() must be added in all training scripts right after loss.backward() and optimizer.step().

  4. Make sure that any device selection argument passed to the script is configured as if the script is running on a GPU. For example, add --cuda or --device gpu in the runtime command of your model. This will guarantee that the GPU Migration toolkit accurately detects and migrates instructions.
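Steps 2 and 3 above can be sketched as a minimal Lazy-mode training loop. This is a sketch only: the model, data, and optimizer are placeholders, and the habana_frameworks imports require the Intel Gaudi software stack, so the guard below merely lets the skeleton run elsewhere:

```python
# Sketch of steps 2-3: import order plus mark_step() placement in Lazy mode.
try:
    import habana_frameworks.torch.gpu_migration  # noqa: F401 -- must be imported first
    import habana_frameworks.torch.core as htcore
    ON_HPU = True
except ImportError:          # running without the Intel Gaudi software stack
    ON_HPU = False

import torch

# Placeholder model, optimizer, and data.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
inputs, targets = torch.randn(16, 8), torch.randn(16, 1)

for _ in range(2):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    if ON_HPU:
        htcore.mark_step()   # flush the accumulated graph right after loss.backward()
    optimizer.step()
    if ON_HPU:
        htcore.mark_step()   # and again right after optimizer.step()

print(f"final loss: {loss.item():.4f}")
```

On a Gaudi machine the guard is unnecessary; the two imports simply go at the top of the script, before any other torch usage.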

You are now prepared to begin your model training on HPU.

It is highly recommended to review the GPU Migration examples in the Intel Gaudi Model References GitHub repository. These show the GPU Migration toolkit working on publicly available models, including the log files and additional information needed for performance tuning: MNIST, BERT, ResNet50, and Stable Diffusion.

Additional Model Considerations

For other libraries and model packages, please consider the following:

  • For DeepSpeed models, continue to use the Intel Gaudi fork of the DeepSpeed library and set the distribution backend to HCCL: deepspeed.init_distributed(dist_backend='hccl', init_method=<init_method>).

  • For Hugging Face models, it is recommended to use the existing Optimum Habana interface for training and inference. You can refer to the Gaudi-specific examples or use additional models from the Hugging Face library. In some cases, the GPU Migration toolkit may help identify structures that need to be modified.

  • For PyTorch Lightning, follow the existing methods for migrating models to work with PyTorch Lightning.

  • For Fairseq models, start with Intel Gaudi’s Fairseq fork.

Enabling GPU Migration Logging

You can enable the logging feature included in the GPU Migration toolkit by setting the GPU_MIGRATION_LOG_LEVEL environment variable as described below. This generates log files that provide insight into the automation performed by the GPU Migration toolkit while running the model. The supported levels are:

  • Logs all modules and prints to the console.

  • Logs all modules.

  • Logs all modules, excluding torch.

Using the MNIST example, you can add the logging environment variable to the run command as follows:
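A hedged sketch of such a command (the numeric log level, script name, and flags below are placeholders; substitute the level you need from the descriptions above):

```shell
# The level value is a placeholder -- substitute the one you need.
export GPU_MIGRATION_LOG_LEVEL=1
# Then launch training as usual; the script name and flags are placeholders:
#   python mnist.py --batch-size=64 --epochs=1
echo "GPU_MIGRATION_LOG_LEVEL=$GPU_MIGRATION_LOG_LEVEL"
```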


The log files are stored under $HABANA_LOGS/gpu_migration_logs/. Sample log files for ResNet50 and Stable Diffusion can be found in their corresponding directories.

The logging feature allows you to identify GPU calls that do not match an HPU equivalent and may not have been implemented. If you encounter such a scenario, apply the necessary changes to the model script manually, based on the information provided in the log file:

  • If you modify only some of the unimplemented calls, rerun the training process.

  • If you modify all unimplemented calls, remove the import habana_frameworks.torch.gpu_migration line of code and restart the training process without the GPU Migration toolkit.
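One way to locate such calls is to search the generated log files. This is a sketch under stated assumptions: the fallback path and the hpu_mismatch pattern (from the API classification described under GPU Migration APIs Support) are assumptions to verify against your actual logs:

```shell
# Scan GPU Migration logs for calls the toolkit could not map to HPU.
# The fallback path and the "hpu_mismatch" pattern are assumptions -- verify
# both against your environment and the actual log contents.
LOG_DIR="${HABANA_LOGS:-/tmp/habana_logs}/gpu_migration_logs"
grep -rn "hpu_mismatch" "$LOG_DIR" 2>/dev/null || echo "no unmapped calls found in $LOG_DIR"
```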

GPU Migration APIs Support

The Python APIs supported by the GPU Migration toolkit are classified as hpu_match, hpu_modified, or hpu_mismatch according to their HPU implementation. For the full list of these calls and their compatibility with Gaudi, refer to the Intel Gaudi GPU Migration Toolkit APIs guide.

Support Matrix

The support matrix below lists the versions of libraries verified with the GPU Migration toolkit (columns: Verified version and Additional notes). One library is verified at versions 2.2.0, 2.2.1, and 2.2.2; for the library that requires special installation, see the Limitations section.


  • All the libraries and modules are preinstalled in Gaudi containers except for the Apex library. If a specific model requires Apex, run the following command to install it:

    git clone && cd apex
    # Required for installation without GPU dependencies and build isolation
    git fetch origin pull/1610/head:bug_fix && git cherry-pick b559175
    git fetch origin pull/1680/head:dependency_fix && git cherry-pick --allow-empty --no-commit 2944255 --strategy-option=theirs
    pip install -v --disable-pip-version-check --no-cache-dir ./
  • The GPU Migration toolkit does not migrate calls to third-party programs such as nvcc or nvidia-smi. These calls need to be migrated manually.

  • Intel Gaudi prefers the bfloat16 data type over float16 for model training and inference. To enable automatic conversion from float16 to bfloat16, set the PT_HPU_CONVERT_FP16_TO_BF16_FOR_MIGRATION=1 flag as shown below. The default, PT_HPU_CONVERT_FP16_TO_BF16_FOR_MIGRATION=0, uses the declared data type:
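A hedged sketch of enabling the flag (the training script name and its flags are placeholders):

```shell
# Enable automatic float16 -> bfloat16 conversion for this run.
export PT_HPU_CONVERT_FP16_TO_BF16_FOR_MIGRATION=1
# Then launch training as usual; the script name and flags are placeholders:
#   python train.py --cuda
echo "PT_HPU_CONVERT_FP16_TO_BF16_FOR_MIGRATION=$PT_HPU_CONVERT_FP16_TO_BF16_FOR_MIGRATION"
```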