DeepSpeed User Guide

The purpose of this document is to guide Data Scientists to run PyTorch models on the Habana® Gaudi® infrastructure using a DeepSpeed interface.

DeepSpeed Gaudi Integration

Existing model training scripts can be migrated to use the DeepSpeed library and integrate new optimizations into the training process. Refer to for further details.

  • ZeRO (Zero Redundancy Optimizer)

  • Training in lower-precision data types such as BFloat16

  • Activation checkpointing

  • Model pipeline parallelism

The HabanaAI GitHub hosts a fork of the DeepSpeed library that includes changes adding support for Gaudi. To use DeepSpeed with Gaudi, you must install Habana’s fork of DeepSpeed:

  • By installing directly from the DeepSpeed fork repository located in the HabanaAI GitHub:

pip install git+
  • Or, by cloning the DeepSpeed fork repository and installing from the local directory:

git clone

cd DeepSpeed

pip install .
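After installing with either method, a quick sanity check can confirm the package is visible to Python. The helper below is a hypothetical convenience, not part of DeepSpeed itself:

```python
import importlib.util

def deepspeed_installed() -> bool:
    # Hypothetical helper: returns True if a "deepspeed" package is
    # importable from the current Python environment.
    return importlib.util.find_spec("deepspeed") is not None

if __name__ == "__main__":
    if deepspeed_installed():
        print("DeepSpeed is importable in this environment")
    else:
        print("DeepSpeed not found; install Habana's fork first")
```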

DeepSpeed Validated Configurations

The following DeepSpeed configurations have been validated as fully functional with Gaudi:

  • Distributed Data Parallel (multi-card)

  • ZeRO-1

  • BF16 precision


DeepSpeed’s multi-node training uses pdsh to invoke processes on remote hosts. Make sure pdsh is installed on your machine before launching multi-node training.
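For multi-node runs, the DeepSpeed launcher reads a hostfile that lists each remote host and the number of device slots it offers. A minimal sketch follows; the hostnames and slot counts are placeholders for your cluster:

```
worker-1 slots=8
worker-2 slots=8
```

The launcher then uses pdsh to start the training processes on each listed host.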

Working with DeepSpeed on Gaudi

  • If you have an existing training script that runs on Gaudi, you can convert it to use DeepSpeed by following the instructions in

  • To align both the training script and DeepSpeed to use Gaudi, it is highly recommended to use only the dedicated DeepSpeed flag use_hpu inside the training script.

  • You can provide a DeepSpeed JSON config file to specify a collection of training settings. See
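As a minimal sketch, a training script can expose the use_hpu flag mentioned above through its argument parser. Only the flag name comes from this guide; the parser wiring and help text are illustrative assumptions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser: only the use_hpu flag name is taken from this
    # guide; everything else is an assumption for demonstration.
    parser = argparse.ArgumentParser(description="DeepSpeed training on Gaudi")
    parser.add_argument("--use_hpu", action="store_true",
                        help="Run DeepSpeed on Habana Gaudi (HPU) devices")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args(["--use_hpu"])
    print(args.use_hpu)  # True when --use_hpu is passed
```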

It is highly recommended to review one of the examples located under DeepSpeed Examples.
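A minimal JSON config matching the validated configurations above (ZeRO stage 1 and BF16 precision) could look like the following sketch. The key names follow the upstream DeepSpeed config schema, and the batch sizes are placeholder values:

```json
{
  "train_batch_size": 64,
  "train_micro_batch_size_per_gpu": 8,
  "zero_optimization": {
    "stage": 1
  },
  "bf16": {
    "enabled": true
  }
}
```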