Gaudi-to-process Assignment

For PyTorch distributed to work correctly, you need to configure the mapping between processes and Gaudi devices. To do so, call initialize_distributed_hpu() in your script. For reference, see Habana PyTorch Python API (habana_frameworks.torch).
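A minimal sketch of this call is shown below. The try branch uses the documented initialize_distributed_hpu() entry point; the except branch, which reads the launcher-provided environment variables directly, is only an illustrative fallback for machines without the Habana software stack installed and is not part of the Habana API.

```python
import os

# Configure the process-to-device mapping before using torch.distributed.
try:
    # initialize_distributed_hpu() reports the distributed settings the
    # HCCL backend will use for this process.
    from habana_frameworks.torch.distributed.hccl import initialize_distributed_hpu
    world_size, rank, local_rank = initialize_distributed_hpu()
except ImportError:
    # Illustrative fallback only: read the launcher-provided variables directly.
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    rank = int(os.environ.get("RANK", "0"))
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))

print(f"world_size={world_size} rank={rank} local_rank={local_rank}")
```

In a real launch (mpirun or torchrun), the values reflect the process's position in the job; run standalone, the fallback reports a single-process setup.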

There are two modes of device assignment:

  • Automatic Assignment

  • Manual Assignment

Automatic Assignment

In distributed training scenarios, the PyTorch HCCL backend ensures that the Gaudi device assigned to each process has a Module ID equal to that process's local rank. The local rank is established from the OMPI_COMM_WORLD_LOCAL_RANK or LOCAL_RANK environment variable, which should be set by an MPI or PyTorch compliant runtime environment.

If neither OMPI_COMM_WORLD_LOCAL_RANK nor LOCAL_RANK is set, each process allocates the first device that is not busy.
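The lookup described above can be sketched as follows. The environment variable names are the ones the backend reads; the helper name, the precedence of OMPI_COMM_WORLD_LOCAL_RANK over LOCAL_RANK, and the use of None to stand for "take the first device that is not busy" are illustrative assumptions, not Habana API.

```python
import os

def resolve_local_rank(env=os.environ):
    """Sketch of local-rank resolution: OMPI_COMM_WORLD_LOCAL_RANK is
    assumed to be checked before LOCAL_RANK; None means no local rank is
    set, so the process would fall back to the first non-busy device."""
    for var in ("OMPI_COMM_WORLD_LOCAL_RANK", "LOCAL_RANK"):
        value = env.get(var)
        if value is not None:
            return int(value)
    return None
```

For example, resolve_local_rank({"LOCAL_RANK": "3"}) yields 3, while resolve_local_rank({}) yields None, modeling the fallback case.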

Manual Assignment

You can manually assign a Gaudi with a given Module ID to a particular process by setting the HABANA_VISIBLE_MODULES environment variable.


Value Format

  • A comma-separated list of integers from the range [0, N-1], where N is the number of Gaudi devices available in the server.

  • In scale-out scenarios, the number of elements in the list must equal the number of Gaudis in the server.

  • In scale-up scenarios, the number of elements in the list must equal the number of devices used in training.

  • Each value in the list must be unique.

  • Each process acquires the Module ID at the list position corresponding to its local rank.

  • The variable must have the same value for every process.
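The rules above can be sketched as a validation-and-selection helper. The function name, the num_devices parameter (N), and the raised errors are illustrative assumptions; the actual checks are performed inside the Habana runtime, not by user code.

```python
def select_module_id(visible_modules, local_rank, num_devices):
    """Sketch: parse a HABANA_VISIBLE_MODULES-style value, validate it
    against the rules above, and return the Module ID this process would
    acquire based on its local rank. Illustrative only."""
    ids = [int(x) for x in visible_modules.split(",")]
    if len(set(ids)) != len(ids):
        raise ValueError("each Module ID in the list must be unique")
    if any(i < 0 or i >= num_devices for i in ids):
        raise ValueError(f"Module IDs must lie in [0, {num_devices - 1}]")
    # Each process takes the entry at the position matching its local rank.
    return ids[local_rank]
```

For example, with HABANA_VISIBLE_MODULES="4,5,6,7" on an 8-device server, the process with local rank 0 acquires Module ID 4 and the process with local rank 2 acquires Module ID 6.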