Getting Started with DeepSpeed
This guide provides simple steps for preparing a DeepSpeed model to run on Gaudi. Make sure to install the DeepSpeed package provided by Habana. Installing public DeepSpeed packages is not supported.
To set up the environment, refer to the Installation Guide. The supported DeepSpeed versions are listed in the Support Matrix.
Start Training a DeepSpeed Model on Gaudi
Run the Habana Docker image:
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.11.0/ubuntu20.04/habanalabs/pytorch-installer-2.0.1:latest
Install Habana’s DeepSpeed fork:
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.11.0
Clone the Model References repository inside the container that you have just started:
git clone https://github.com/HabanaAI/Model-References.git
Change to the cifar_example subdirectory:
cd Model-References/PyTorch/examples/DeepSpeed/cifar_example/
Note
The model defined in the cifar10_deepspeed.py script is a simple CNN-based model that loads the CIFAR-10 dataset automatically.
Install the associated requirements:
pip install -r requirements.txt
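Update PYTHONPATH to include the Model-References repository and set PYTHON to the Python executable. Use the absolute path of your clone in place of /path/to/Model-References, since a relative entry in PYTHONPATH is resolved against the current directory (cifar_example) and would not point to the repository:
export PYTHONPATH=$PYTHONPATH:/path/to/Model-References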
export PYTHON=/usr/bin/python3.8
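Before launching, you can optionally confirm that the two variables point to real locations. The following is a minimal sketch and is not part of the Model-References repository: the check_env function name is hypothetical, and it assumes you pass it the directory where you cloned Model-References.

```shell
# Hypothetical helper (not shipped with Model-References): sanity-check the
# PYTHONPATH and PYTHON variables exported in the previous step.
check_env() {
  local repo_dir="$1"
  # The repository directory must exist.
  if [ ! -d "$repo_dir" ]; then
    echo "missing directory: $repo_dir" >&2
    return 1
  fi
  # PYTHONPATH must contain the directory as one of its colon-separated entries.
  case ":${PYTHONPATH:-}:" in
    *":$repo_dir:"*) ;;
    *) echo "PYTHONPATH does not include $repo_dir" >&2; return 1 ;;
  esac
  # PYTHON must name an executable file.
  if [ ! -x "${PYTHON:-}" ]; then
    echo "PYTHON is not an executable: ${PYTHON:-<unset>}" >&2
    return 1
  fi
  echo "environment OK"
}

# Example (replace with the path of your clone):
# check_env /path/to/Model-References
```

The check treats PYTHONPATH as a colon-separated list, so it matches whole entries only and will not accept a path that merely contains the repository name as a substring.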
Execute the run_ds_habanax8.sh script. If you are running on a single Gaudi, modify the script to set --num_gpus=1.
deepspeed --num_nodes=1 --num_gpus=8 cifar10_deepspeed.py --deepspeed --deepspeed_config ds_config.json --use_hpu
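For a single Gaudi, one way to apply the --num_gpus change is to edit the script with sed rather than by hand. This is a minimal sketch that assumes run_ds_habanax8.sh contains the flag in the literal form --num_gpus=N, as in the command above; inspect the script first, since its exact contents may differ. The set_num_cards function name is hypothetical.

```shell
# Hypothetical helper: rewrite the card count passed to the deepspeed launcher.
# Assumes the script contains the flag in the literal form "--num_gpus=<N>".
set_num_cards() {
  local script="$1" cards="$2"
  # -i.bak edits the file in place and keeps the original as <script>.bak
  # (this form is accepted by both GNU and BSD sed).
  sed -i.bak "s/--num_gpus=[0-9][0-9]*/--num_gpus=${cards}/g" "$script"
}

# Example: switch the script to a single Gaudi, then execute it as usual.
# set_num_cards run_ds_habanax8.sh 1
```

Keeping a .bak copy makes it easy to restore the original 8-card configuration later.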
The following should appear as part of the output:
[10, 2000] loss: 0.776
[10, 2000] loss: 0.760
[10, 2000] loss: 0.747
[10, 2000] loss: 0.753
[10, 2000] loss: 0.759
[10, 2000] loss: 0.776
[10, 2000] loss: 0.772
[10, 2000] loss: 0.776
Finished Training
GroundTruth: cat ship ship plane
Predicted: cat ship ship plane
Accuracy of the network on the 10000 test images: 59 %
Accuracy of ship : 70 %
Accuracy of truck : 57 %
[2022-10-28 17:17:55,740] [INFO] [launch.py:212:main] Process 815 exits successfully.
[2022-10-28 17:17:55,741] [INFO] [launch.py:212:main] Process 818 exits successfully.
[2022-10-28 17:17:55,741] [INFO] [launch.py:212:main] Process 820 exits successfully.
[2022-10-28 17:17:55,741] [INFO] [launch.py:212:main] Process 814 exits successfully.
[2022-10-28 17:17:55,741] [INFO] [launch.py:212:main] Process 817 exits successfully.
[2022-10-28 17:17:55,741] [INFO] [launch.py:212:main] Process 819 exits successfully.
[2022-10-28 17:17:55,741] [INFO] [launch.py:212:main] Process 816 exits successfully.
[2022-10-28 17:17:56,742] [INFO] [launch.py:212:main] Process 813 exits successfully.
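Because the launcher output interleaves messages from every rank, it can help to capture it to a file and check for the markers shown in the sample output. The following minimal sketch assumes the output was saved to a file (for example by piping the launch command through tee into a file such as train.log, a hypothetical name) and that each rank prints one "exits successfully" line, as in the sample above. The check_ds_log function name is hypothetical.

```shell
# Hypothetical helper: verify a captured DeepSpeed launcher log.
# Usage: check_ds_log <log-file> <expected-number-of-processes>
check_ds_log() {
  local log="$1" expected="$2" exited
  # The training script prints this once training completes.
  if ! grep -q "Finished Training" "$log"; then
    echo "training did not finish" >&2
    return 1
  fi
  # The launcher prints one "exits successfully" line per finished process.
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  exited=$(grep -c "exits successfully" "$log" || true)
  if [ "$exited" -ne "$expected" ]; then
    echo "only $exited of $expected processes exited successfully" >&2
    return 1
  fi
  # Show the overall test accuracy line.
  grep "Accuracy of the network" "$log"
}

# Example for the 8-card run:
# check_ds_log train.log 8
```

For a single-Gaudi run, pass 1 as the expected number of processes.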
To start training your own DeepSpeed models on Gaudi, refer to the DeepSpeed User Guide for Training.