Docker Installation
Configure Container Runtime
To register the habana runtime, use the method below that best suits your environment (Docker Engine, containerd, or CRI-O). You might need to merge the new argument with your existing configuration.
Note
As of Kubernetes 1.20, support for Docker has been deprecated.
Docker Engine
Register the habana runtime by adding the following to /etc/docker/daemon.json:
sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "habana": {
            "path": "/usr/bin/habana-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
(Optional) Set the default runtime by adding the following to /etc/docker/daemon.json. Setting the default runtime to habana routes all your workloads through this runtime; any generic workloads are automatically forwarded to a generic runtime. If you prefer not to set the default runtime, you can skip this step and override the runtime for a running container by using the --runtime flag in the docker run command:
"default-runtime": "habana"
Your /etc/docker/daemon.json should look similar to this:
{
    "default-runtime": "habana",
    "runtimes": {
        "habana": {
            "path": "/usr/bin/habana-container-runtime",
            "runtimeArgs": []
        }
    }
}
Restart Docker:
sudo systemctl restart docker
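To confirm the registration took effect, you can query Docker after the restart. This is a minimal check, assuming a standard Docker Engine installation:
# "habana" should appear among the registered runtimes
docker info --format '{{json .Runtimes}}'
# If you set the default runtime, this should print "habana"
docker info --format '{{.DefaultRuntime}}'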
containerd
Register the habana runtime by adding the following to /etc/containerd/config.toml:
sudo tee /etc/containerd/config.toml <<EOF
disabled_plugins = []
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "habana"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.habana]
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.habana.options]
            BinaryName = "/usr/bin/habana-container-runtime"
  [plugins."io.containerd.runtime.v1.linux"]
    runtime = "habana-container-runtime"
EOF
Restart containerd:
sudo systemctl restart containerd
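To confirm containerd picked up the new runtime, you can inspect its effective configuration. A minimal check, assuming the default configuration path:
# The habana runtime entry should appear in the dumped configuration
sudo containerd config dump | grep -A 3 'runtimes.habana'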
CRI-O
Create a new configuration file at /etc/crio/crio.conf.d/99-habana-ai.conf:
[crio.runtime]
default_runtime = "habana-ai"

[crio.runtime.runtimes.habana-ai]
runtime_path = "/usr/local/habana/bin/habana-container-runtime"
monitor_env = [
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
]
Restart the CRI-O service:
sudo systemctl restart crio.service
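To verify the change, a minimal check, assuming CRI-O is managed by systemd:
# Confirm the service came back up after the restart
sudo systemctl is-active crio.service
# If crictl is installed, the runtime status can be inspected as well; output varies by CRI-O version
sudo crictl info | grep -i habana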
Use Intel Gaudi Containers
You can either pull prebuilt containers as described below or build custom Docker images as detailed in the Setup and Install Repo.
Prebuilt containers are provided in the Intel Gaudi vault. Use the commands below to pull and run Docker images from the Intel Gaudi vault.
Pull the Docker image that matches your operating system using one of the following commands:
docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu24.04/habanalabs/pytorch-installer-2.4.0:latest
docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
docker pull vault.habana.ai/gaudi-docker/1.18.0/amzn2/habanalabs/pytorch-installer-2.4.0:latest
docker pull vault.habana.ai/gaudi-docker/1.18.0/rhel8.6/habanalabs/pytorch-installer-2.4.0:latest
docker pull vault.habana.ai/gaudi-docker/1.18.0/rhel9.2/habanalabs/pytorch-installer-2.4.0:latest
docker pull vault.habana.ai/gaudi-docker/1.18.0/rhel9.4/habanalabs/pytorch-installer-2.4.0:latest
docker pull vault.habana.ai/gaudi-docker/1.18.0/tencentos3.1/habanalabs/pytorch-installer-2.4.0:latest
docker pull vault.habana.ai/gaudi-docker/1.18.0/suse15.5/habanalabs/pytorch-installer-2.4.0:latest
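To confirm the pull succeeded, you can list the image locally. The Ubuntu 22.04 tag is used here only as an example; substitute the tag you pulled:
docker images vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0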
Run the Docker image using the command that matches your operating system. Make sure to include --ipc=host. This is required for distributed training using the Habana Collective Communication Library (HCCL), as it allows reuse of host shared memory for best performance:
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu24.04/habanalabs/pytorch-installer-2.4.0:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/amzn2/habanalabs/pytorch-installer-2.4.0:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/rhel8.6/habanalabs/pytorch-installer-2.4.0:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/rhel9.2/habanalabs/pytorch-installer-2.4.0:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/rhel9.4/habanalabs/pytorch-installer-2.4.0:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/tencentos3.1/habanalabs/pytorch-installer-2.4.0:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/suse15.5/habanalabs/pytorch-installer-2.4.0:latest
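Once inside the container, a quick sanity check is to confirm the Gaudi devices are visible. This assumes the Intel Gaudi driver is installed on the host and the habana runtime has mapped the devices into the container:
# List the Gaudi devices visible inside the container
hl-smi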
Note
Starting from the 1.18.0 release, SSH host keys have been removed from the Docker images. To add them, run /usr/bin/ssh-keygen -A inside the Docker container. If you are running on Kubernetes, make sure the SSH host keys are identical across all Docker containers. To achieve this, you can either build a new Docker image on top of the Intel Gaudi Docker image by adding a new layer RUN /usr/bin/ssh-keygen -A, or externally mount the SSH host keys.
To run the Docker image with a partial number of the supplied Gaudi devices, make sure to set the device to module mapping correctly. See Multiple Dockers Each with a Single Workload for further details.
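As an illustration of the layered-image approach mentioned in the note, the following is a minimal sketch. The image tag gaudi-pytorch-ssh:1.18.0 is a hypothetical name, and the base image is the Ubuntu 22.04 example from above; adjust both for your environment:
# Write a one-layer Dockerfile that generates SSH host keys on top of the Intel Gaudi image
tee Dockerfile <<EOF
FROM vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
RUN /usr/bin/ssh-keygen -A
EOF
# Build the new image (hypothetical tag)
docker build -t gaudi-pytorch-ssh:1.18.0 .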
You can also use prebuilt containers provided in Amazon ECR Public Library and AWS Available Deep Learning Containers Images.