Optimizing Training Using PyTorch Lightning

The HPUParallelStrategy provided by the PyTorch Lightning package supports features such as setting the size of the gradient allreduce buckets, keeping gradients as views into those buckets, and enabling static_graph.

By setting static_graph when instantiating the strategy passed to the Trainer, allreduce on parameters that are unused in the graph can be avoided. This also avoids the overhead of copying them from host to device and back after the allreduce is performed.

Example:

Refer to the Unet2D implementation for a complete example.
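
The following is a minimal sketch (not the Unet2D code) of how these options might be passed. HPUParallelStrategy subclasses Lightning's DDPStrategy, so the standard DistributedDataParallel keyword arguments are forwarded through it; the import path, device count, and the specific values shown here are assumptions and may vary across PyTorch Lightning versions:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import HPUParallelStrategy

# DDP keyword arguments forwarded by the strategy:
#   bucket_cap_mb           - size of the gradient allreduce buckets, in MB
#   gradient_as_bucket_view - gradients are views into the allreduce buckets,
#                             avoiding an extra copy
#   static_graph            - marks the graph as static so allreduce on unused
#                             parameters can be skipped
strategy = HPUParallelStrategy(
    bucket_cap_mb=125,            # illustrative value
    gradient_as_bucket_view=True,
    static_graph=True,
)

trainer = Trainer(
    accelerator="hpu",
    devices=8,                    # illustrative HPU count
    strategy=strategy,
)
```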