For PyTorch distributed to work correctly, you need to configure the mapping between processes and Intel® Gaudi® AI accelerators. To do so, call initialize_distributed_hpu() in your script. For reference, see Intel Gaudi PyTorch Python API (habana_frameworks.torch).
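A minimal sketch of such a script is shown below, assuming the habana_frameworks package is installed and that the launcher provides the usual rendezvous variables (MASTER_ADDR, MASTER_PORT); the exact flow may vary between releases:

```python
import torch
import torch.distributed as dist

# Importing the HCCL package registers the "hccl" backend with
# torch.distributed and exposes initialize_distributed_hpu().
from habana_frameworks.torch.distributed.hccl import initialize_distributed_hpu

# Derive the process-to-device mapping from the runtime environment
# (WORLD_SIZE, RANK, LOCAL_RANK set by the launcher).
world_size, rank, local_rank = initialize_distributed_hpu()

# MASTER_ADDR and MASTER_PORT are assumed to be set by the launcher.
dist.init_process_group(backend="hccl", rank=rank, world_size=world_size)

device = torch.device("hpu")  # the backend picks the Gaudi for this process
```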
There are two modes of device assignment:
Default assignment: In distributed training scenarios, the PyTorch HCCL backend ensures that the Gaudi assigned to each process has a Module ID equal to the local rank of that process. The local rank value is established using the LOCAL_RANK environment variable, which should be set by an MPI or PyTorch compliant runtime environment. If no LOCAL_RANK value is set, every process allocates the first device that is not busy.
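For illustration, a worker in this default mode could be launched with torchrun (which exports LOCAL_RANK, RANK, and WORLD_SIZE); the device selection itself happens inside the HCCL backend. A sketch:

```python
import os
import torch
import torch.distributed as dist
import habana_frameworks.torch.distributed.hccl  # registers the "hccl" backend

# torchrun (or an MPI launcher) sets LOCAL_RANK; the HCCL backend assigns
# the Gaudi whose Module ID equals this value.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))

# env:// rendezvous: RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT are
# expected to come from the launcher.
dist.init_process_group(backend="hccl")

device = torch.device("hpu")
print(f"local rank {local_rank} acquired its Gaudi device")
```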
Manual assignment: You can manually assign a Gaudi with a given Module ID to a particular process by setting the HABANA_VISIBLE_MODULES environment variable, which accepts one of two value formats:
Single integer: a value from the range [0, N-1], where N is the number of Gaudi devices available in the server. The process will always try to acquire the Gaudi with a Module ID equal to the given value. This variable must have a different value for every process.
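The variable is typically exported per process in the launch script; as a Python sketch (assuming it is read when the device is first acquired, so it must be set before that point):

```python
import os

# Pin this process to the Gaudi with Module ID 3; every process in the job
# must use a different value. Set before the Gaudi integration is loaded.
os.environ["HABANA_VISIBLE_MODULES"] = "3"

import torch
import habana_frameworks.torch.core  # loads the Gaudi integration

device = torch.device("hpu")  # acquires the Gaudi with Module ID 3
```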
Comma-separated list of integers: values from the range [0, N-1], where N is the number of Gaudi devices available in the server. In scale-out scenarios, the number of elements in the list must equal the number of Gaudis in the server. In scale-up scenarios, the number of elements in the list must equal the number of devices used in training. Each value in the list should be unique. Each process will try to acquire the Module ID at the list position corresponding to its local rank. This variable must have the same value for every process.
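The following sketch only illustrates the position-to-rank mapping; the actual device acquisition is performed by the framework:

```python
import os

# The same list is exported to every process (e.g. in the launch script);
# a process with local rank k acquires the Module ID at position k.
os.environ.setdefault("HABANA_VISIBLE_MODULES", "0,1,2,3")

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
modules = os.environ["HABANA_VISIBLE_MODULES"].split(",")
print(f"local rank {local_rank} maps to Module ID {modules[local_rank]}")
```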