Gaudi-to-process Assignment

For PyTorch distributed to work correctly, you need to configure the mapping between processes and Intel® Gaudi® AI accelerators. To do so, call initialize_distributed_hpu() in your script. For reference, see Intel Gaudi PyTorch Python API (habana_frameworks.torch).
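
For example, a minimal initialization sketch might look as follows. It assumes the habana_frameworks.torch.distributed.hccl module and that initialize_distributed_hpu() returns the world size, rank, and local rank, as in recent releases; adapt it to your launcher and script:

    import torch
    import torch.distributed as dist
    # Importing this module registers the HCCL backend with PyTorch.
    import habana_frameworks.torch.distributed.hccl as hccl

    # Query the distributed configuration and bind this process to a Gaudi.
    # Assumption: the call returns (world_size, rank, local_rank).
    world_size, rank, local_rank = hccl.initialize_distributed_hpu()

    # Create the process group on top of the HCCL backend.
    dist.init_process_group(backend="hccl", world_size=world_size, rank=rank)

    device = torch.device("hpu")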

There are two modes of device assignment:

  • Automatic Assignment

  • Manual Assignment

Automatic Assignment

In distributed training scenarios, the PyTorch HCCL backend ensures that the Gaudi assigned to each process has a Module ID equal to that process's local rank. The local rank is established from the OMPI_COMM_WORLD_LOCAL_RANK or LOCAL_RANK environment variable, which should be set by an MPI- or PyTorch-compliant runtime environment.

If neither OMPI_COMM_WORLD_LOCAL_RANK nor LOCAL_RANK is set, each process allocates the first device that is not busy.
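
For illustration, the sketch below shows how a process can inspect the local rank that the mapping is based on; the launcher (for example, mpirun or a PyTorch-style launcher) is expected to set one of the two variables, and the resolution order shown here is only an assumption:

    import os

    # The local rank is taken from one of the launcher-provided variables:
    # OMPI_COMM_WORLD_LOCAL_RANK (MPI) or LOCAL_RANK (PyTorch launchers).
    local_rank = os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK",
                                os.environ.get("LOCAL_RANK"))

    if local_rank is None:
        # Neither variable is set: the process falls back to acquiring
        # the first Gaudi that is not busy.
        print("No local rank set; the first free Gaudi will be acquired")
    else:
        print(f"Local rank {local_rank}: expecting Gaudi with Module ID {local_rank}")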

Manual Assignment

You can manually assign a Gaudi with a given Module ID to a particular process by setting the HLS_MODULE_ID or HABANA_VISIBLE_MODULES environment variables, described below. A usage sketch follows the descriptions.

HLS_MODULE_ID

  • Value format: A single integer from the range [0, N-1], where N is the number of Gaudi devices available in the server.

  • Description: The process always tries to acquire the Gaudi whose Module ID equals the given value. This variable must have a different value for every process.

HABANA_VISIBLE_MODULES

  • Value format: A comma-separated list of integers from the range [0, N-1], where N is the number of Gaudi devices available in the server. In scale-out scenarios, the number of elements in the list must equal the number of Gaudis in the server; in scale-up scenarios, it must equal the number of devices used in training. Each value in the list must be unique.

  • Description: Each process tries to acquire the Gaudi whose Module ID sits at the list position corresponding to that process's local rank. This variable must have the same value for every process.
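
As an illustration only, the sketch below sets the variables described above from Python. The Module IDs (0-3) and the four-device setup are hypothetical; in practice these variables are typically exported in the launch environment before the script starts, so that they are already visible when the Gaudi runtime initializes:

    import os

    # Same value in every process: each process picks the Module ID at the
    # list index given by its local rank (local rank 0 -> Module ID 0, etc.).
    os.environ.setdefault("HABANA_VISIBLE_MODULES", "0,1,2,3")

    # Alternative, per-process pinning: a different value in every process,
    # e.g. derived from the launcher-provided local rank.
    # local_rank = os.environ.get("LOCAL_RANK", "0")
    # os.environ["HLS_MODULE_ID"] = local_rank

    # Any habana_frameworks / torch HPU initialization must happen after
    # these variables are set.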