Use Intel Gaudi Containers

You can either pull prebuilt containers, as described below, or build custom Docker images, as detailed in the Setup and Install Repo.

Prebuilt containers are provided in the Intel Gaudi vault. Use the commands below to pull and run Docker images from the vault.

  1. Pull the Docker image for your operating system using the corresponding command:

       docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu24.04/habanalabs/pytorch-installer-2.4.0:latest
    
       docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
    
       docker pull vault.habana.ai/gaudi-docker/1.18.0/amzn2/habanalabs/pytorch-installer-2.4.0:latest
    
       docker pull vault.habana.ai/gaudi-docker/1.18.0/rhel8.6/habanalabs/pytorch-installer-2.4.0:latest
    
       docker pull vault.habana.ai/gaudi-docker/1.18.0/rhel9.2/habanalabs/pytorch-installer-2.4.0:latest
    
       docker pull vault.habana.ai/gaudi-docker/1.18.0/rhel9.4/habanalabs/pytorch-installer-2.4.0:latest
    
       docker pull vault.habana.ai/gaudi-docker/1.18.0/tencentos3.1/habanalabs/pytorch-installer-2.4.0:latest
    
       docker pull vault.habana.ai/gaudi-docker/1.18.0/suse15.5/habanalabs/pytorch-installer-2.4.0:latest
    
  2. Run the Docker image using the command that corresponds to the image you pulled. Make sure to include --ipc=host. It is required for distributed training with the Habana Collective Communication Library (HCCL) and allows reuse of host shared memory for best performance:

       docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu24.04/habanalabs/pytorch-installer-2.4.0:latest
    
       docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
    
       docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/amzn2/habanalabs/pytorch-installer-2.4.0:latest
    
       docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/rhel8.6/habanalabs/pytorch-installer-2.4.0:latest
    
       docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/rhel9.2/habanalabs/pytorch-installer-2.4.0:latest
    
       docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/rhel9.4/habanalabs/pytorch-installer-2.4.0:latest
    
       docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/tencentos3.1/habanalabs/pytorch-installer-2.4.0:latest
    
       docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/suse15.5/habanalabs/pytorch-installer-2.4.0:latest
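
The pull and run commands above differ only in the operating system segment of the image path. As a minimal sketch, the POSIX shell helpers below assemble the image reference and the matching run command from that segment. The function names gaudi_image and gaudi_run_cmd and the variables GAUDI_VERSION and PT_VERSION are hypothetical, introduced here for illustration only; the paths and flags are copied from the commands above.

```shell
# Hypothetical helpers, not part of any Intel Gaudi tooling.
# Defaults mirror the release and PyTorch versions used in this section.
GAUDI_VERSION="${GAUDI_VERSION:-1.18.0}"
PT_VERSION="${PT_VERSION:-2.4.0}"

# Print the image reference for an OS segment of the vault path,
# for example ubuntu22.04, rhel9.4 or suse15.5.
gaudi_image() {
    printf 'vault.habana.ai/gaudi-docker/%s/%s/habanalabs/pytorch-installer-%s:latest\n' \
        "$GAUDI_VERSION" "$1" "$PT_VERSION"
}

# Print (do not execute) the docker run command from step 2 for that OS segment.
gaudi_run_cmd() {
    printf 'docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host %s\n' \
        "$(gaudi_image "$1")"
}
```

With these defined, docker pull "$(gaudi_image ubuntu22.04)" would pull the Ubuntu 22.04 image, and gaudi_run_cmd ubuntu22.04 prints the corresponding run command so it can be reviewed before it is executed.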
    

    Note

    • Starting from the 1.18.0 release, SSH host keys have been removed from the Docker images. To add them, run /usr/bin/ssh-keygen -A inside the Docker container. If you are running on Kubernetes, make sure the SSH host keys are identical across all Docker containers. To achieve this, either build a new Docker image on top of the Intel Gaudi Docker image by adding a new layer, RUN /usr/bin/ssh-keygen -A, or mount the SSH host keys externally.

    • To run the Docker image with only a subset of the supplied Gaudi devices, make sure to set the device-to-module mapping correctly. See Multiple Dockers Each with a Single Workload for further details.

    • You can also use prebuilt containers provided in Amazon ECR Public Library and AWS Available Deep Learning Containers Images.
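
The first note describes adding SSH host keys as a new layer on top of an Intel Gaudi image. A minimal sketch of that approach follows, assuming the Ubuntu 22.04 image from step 1 as the base. The tag gaudi-pytorch-ssh:1.18.0 and the temporary build directory are hypothetical choices, not names from the Intel Gaudi documentation.

```shell
# Base image taken from step 1; swap in the image for your OS.
BASE_IMAGE="vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest"

# Hypothetical scratch directory used as the Docker build context.
BUILD_DIR="$(mktemp -d)"

# Write a two-line Dockerfile that generates the SSH host keys as a new layer.
cat > "$BUILD_DIR/Dockerfile" <<EOF
FROM $BASE_IMAGE
RUN /usr/bin/ssh-keygen -A
EOF

# Print the build command instead of running it, so it can be reviewed first.
# The tag is an illustrative assumption.
echo "docker build -t gaudi-pytorch-ssh:1.18.0 $BUILD_DIR"
```

Because the keys are generated once at build time, every container started from the resulting image carries the same host keys, which is the property the note asks for on Kubernetes.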