Gaudi-to-process Assignment
For PyTorch distributed to work correctly, you need to configure the mapping between processes and Intel® Gaudi® AI accelerators. To do so, call initialize_distributed_hpu() in the script, as detailed in Intel Gaudi PyTorch Python API (habana_frameworks.torch).
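For context, below is a minimal sketch of such a call. It assumes the import path habana_frameworks.torch.distributed.hccl and a process group created on the hccl backend; refer to the API documentation linked above for authoritative usage.

import torch.distributed as dist

# Assumed import path; see the habana_frameworks.torch API reference.
from habana_frameworks.torch.distributed.hccl import initialize_distributed_hpu

# Reads world size, global rank, and local rank from the runtime environment
# (e.g. variables set by mpirun or torchrun) and maps this process to a Gaudi.
world_size, rank, local_rank = initialize_distributed_hpu()

# Create the process group on the HCCL backend using the values returned above.
dist.init_process_group(backend="hccl", world_size=world_size, rank=rank)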
There are two modes of device assignment:
Automatic Assignment
Manual Assignment
Automatic Assignment
In distributed training scenarios, the PyTorch HCCL backend ensures that the Gaudi assigned to each process has a Module ID equal to that process's local rank.
The local rank value is established using the OMPI_COMM_WORLD_LOCAL_RANK or LOCAL_RANK environment variables, which should be set by an MPI-compliant or PyTorch-compliant runtime environment. If neither OMPI_COMM_WORLD_LOCAL_RANK nor LOCAL_RANK is set, every process allocates the first device that is not busy.
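As a rough illustration of that lookup order, the hypothetical helper below mirrors how a launcher-agnostic script might resolve the local rank. It is not part of the Gaudi API, and the exact precedence between the two variables is an assumption here.

import os

def resolve_local_rank():
    """Hypothetical helper mirroring the lookup described above."""
    # Open MPI launchers set OMPI_COMM_WORLD_LOCAL_RANK;
    # torchrun and other PyTorch-compliant launchers set LOCAL_RANK.
    for var in ("OMPI_COMM_WORLD_LOCAL_RANK", "LOCAL_RANK"):
        if var in os.environ:
            return int(os.environ[var])
    # Neither variable is set: the backend falls back to acquiring
    # the first Gaudi device that is not busy.
    return None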
Manual Assignment
You can manually assign a Gaudi with a given Module ID to a particular process by setting the HLS_MODULE_ID or HABANA_VISIBLE_MODULES environment variables, described in the following table.
Flags | Value Format | Description
---|---|---
HLS_MODULE_ID | Single integer value from the range [0, N-1], where N is the number of Gaudi devices available in the server. | The process will always try to acquire the Gaudi with a Module ID equal to the given value. This variable must have a different value for every process.
HABANA_VISIBLE_MODULES | Comma-separated list of integers from the range [0, N-1], where N is the number of Gaudi devices available in the server. In scale-out scenarios, the number of elements in the list must equal the number of Gaudis in the server. In scale-up scenarios, the number of elements in the list must equal the number of devices used in training. Each value in the list should be unique. | Each process will try to acquire the Module ID at the list position corresponding to its local rank. This variable must have the same value for every process.
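For illustration, the sketch below shows one way these variables might be set before any Gaudi device is acquired. The specific values (a four-device scale-up job) are an example only; in practice the variables are usually exported by the launcher or job script rather than set from Python.

import os

# Example only: expose Gaudi modules 0-3 to a 4-device scale-up job.
# HABANA_VISIBLE_MODULES must have the same value for every process.
os.environ["HABANA_VISIBLE_MODULES"] = "0,1,2,3"

# Alternatively, pin each process to one specific module; HLS_MODULE_ID must
# differ per process, e.g. derived from the process's local rank.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
os.environ["HLS_MODULE_ID"] = str(local_rank)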