Distributed Backend Initialization¶
PyTorch supports distributed communication through the torch.distributed and torch.nn.parallel.DistributedDataParallel APIs for both data and model parallelism. PyTorch natively supports several communication backends, such as MPI, Gloo, and NCCL. Distributed communication on the Intel® Gaudi® AI accelerator is enabled through the Habana Collective Communication Library (HCCL) backend.
Device Mapping¶
The Intel Gaudi PyTorch bridge resolves device mapping automatically based on the environment variables set by the launcher, either torchrun or mpirun. In more advanced scenarios where manual device-to-process mapping is required, see Gaudi-to-process Assignment.
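For illustration only (assuming a torchrun launch; mpirun launches expose equivalent information through its own variables), each process can inspect the launcher-provided values directly, the same values the Gaudi bridge reads when mapping devices:
import os

# torchrun exports these per process
local_rank = int(os.environ.get('LOCAL_RANK', 0))
rank = int(os.environ.get('RANK', 0))
world_size = int(os.environ.get('WORLD_SIZE', 1))
print(f'rank {rank} (local_rank {local_rank}) of {world_size} processes')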
HCCL Initialization¶
The following script loads the HCCL communication backend and initializes the process group with hccl as the backend:
import torch
import habana_frameworks.torch.distributed.hccl

torch.distributed.init_process_group(backend='hccl')
The above example assumes training is launched using either torchrun or mpirun, and that all necessary environment variables are set before habana_frameworks.torch.distributed.hccl is imported. To add communication hooks, such as those used with DistributedDataParallel (DDP), refer to the PyTorch Model References GitHub page.
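As a minimal sketch (not taken from the Model References; the placeholder model and device placement below are assumptions), wrapping a model in DDP after initializing the hccl process group could look like this:
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
import habana_frameworks.torch.core as htcore  # loads the HPU device backend
import habana_frameworks.torch.distributed.hccl  # registers the hccl backend

torch.distributed.init_process_group(backend='hccl')

device = torch.device('hpu')            # Gaudi device
model = nn.Linear(16, 16).to(device)    # placeholder model (assumption)
ddp_model = DDP(model)                  # gradients are reduced over HCCL during backward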
Setting Environment Variables for Custom Launchers¶
For custom launchers, use initialize_distributed_hpu(world_size, rank, local_rank) to set all environment variables necessary for training:
import torch
import habana_frameworks.torch.distributed.hccl as hccl

# Sets WORLD_SIZE, RANK, and LOCAL_RANK in the environment
hccl.initialize_distributed_hpu(world_size=world_size, rank=rank, local_rank=local_rank)
torch.distributed.init_process_group(backend='hccl')
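For example, a custom single-node launcher built on torch.multiprocessing might call it once per spawned process. This is only an illustrative sketch; the process count, master address, and port below are assumptions:
import os
import torch
import torch.multiprocessing as mp
import habana_frameworks.torch.distributed.hccl as hccl

def run(rank, world_size):
    # Rendezvous settings a launcher would normally provide (assumed values)
    os.environ.setdefault('MASTER_ADDR', 'localhost')
    os.environ.setdefault('MASTER_PORT', '12355')
    hccl.initialize_distributed_hpu(world_size=world_size, rank=rank, local_rank=rank)
    torch.distributed.init_process_group(backend='hccl')
    # ... training code ...
    torch.distributed.destroy_process_group()

if __name__ == '__main__':
    world_size = 8  # e.g. one process per Gaudi card on a single node (assumption)
    mp.spawn(run, args=(world_size,), nprocs=world_size)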
Alternatively, you can manually set the following environment variables before importing habana_frameworks.torch.distributed.hccl:
import os

# Environment variable values must be strings
os.environ['WORLD_SIZE'] = str(world_size)
os.environ['RANK'] = str(rank)
os.environ['LOCAL_RANK'] = str(local_rank)
...
import torch
import habana_frameworks.torch.distributed.hccl
torch.distributed.init_process_group(backend='hccl')
To obtain the variable values set in the environment by the launcher, call initialize_distributed_hpu() without any parameters:
import habana_frameworks.torch.distributed.hccl as hccl
world_size, rank, local_rank = hccl.initialize_distributed_hpu()
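The returned values can then be passed to rank-aware utilities. As an illustrative sketch (the dataset below is an assumption), a DistributedSampler can use them to shard the data across ranks:
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset
import habana_frameworks.torch.distributed.hccl as hccl

world_size, rank, local_rank = hccl.initialize_distributed_hpu()

dataset = TensorDataset(torch.arange(1024, dtype=torch.float32))  # placeholder dataset (assumption)
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)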