Distributed Training with PyTorch

Distributed training has become increasingly important as training data sizes and model complexity grow. Industry and academia have proposed many approaches, such as data parallelism, model parallelism, and hybrid parallelism.

This document describes how to implement a synchronous data parallel distributed training system, in which training is accelerated by processing the distributed dataset concurrently on multiple workers, with each worker assigned to one Intel® Gaudi® AI accelerator.
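
To make the idea concrete, below is a minimal sketch of synchronous data parallel training with PyTorch DistributedDataParallel, where each process drives one accelerator, the DistributedSampler shards the dataset across workers, and gradients are all-reduced during backward(). The Gaudi-specific pieces (the habana_frameworks.torch imports, the "hccl" backend, the "hpu" device, and the mark_step() calls) are assumptions based on the Intel Gaudi PyTorch integration and may differ between releases; this is an illustration, not a reference implementation.

```python
# Minimal synchronous data parallel training sketch (one process per worker).
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Assumption: the Intel Gaudi PyTorch bridge is installed; this import
# registers the "hccl" process group backend used below.
import habana_frameworks.torch.core as htcore
import habana_frameworks.torch.distributed.hccl  # noqa: F401


def main():
    # Rank and world size are provided by the launcher (e.g. torchrun).
    dist.init_process_group(backend="hccl")
    rank = dist.get_rank()
    device = torch.device("hpu")

    # Toy model and dataset, purely for illustration.
    model = nn.Linear(16, 2).to(device)
    model = DDP(model)  # gradients are all-reduced across workers in backward()

    data = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(data)  # each worker gets a disjoint shard
    loader = DataLoader(data, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()     # synchronous gradient all-reduce happens here
            htcore.mark_step()  # assumption: needed in Gaudi lazy execution mode
            optimizer.step()
            htcore.mark_step()
        if rank == 0:
            print(f"epoch {epoch} loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

A launcher such as torchrun (for example, torchrun --nproc_per_node=8 train.py on a node with eight accelerators) spawns one process per device and sets the rank and world size environment variables that init_process_group reads.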