Habana Deep Learning Base AMI Installation

The following table outlines the supported installation options and the steps each one requires. Habana's Base AMI comes with the software needed to run containers preinstalled.

Objective	Steps
Run Using Containers on Habana Base AMI (Recommended)	1. Pull Prebuilt Containers or Build Docker Images from Intel Gaudi Dockerfiles. 2. Run models using the Intel Gaudi Model References GitHub repository.
Run PyTorch on Habana Base AMI	1. Install PyTorch. 2. Set up Python for Models. 3. Run models using the Intel Gaudi Model References GitHub repository.

Habana Deep Learning AMIs are also available for Amazon ECS and Amazon EKS. See Amazon ECS with Gaudi User Guide and Amazon EKS with Gaudi User Guide for more details.

Note

Before installing the packages and Docker images below, review the currently supported versions and operating systems listed in the Support Matrix.

Run Using Containers on Habana Base AMI

Pull Prebuilt Containers

Prebuilt containers are provided in:

Intel Gaudi vault
Amazon ECR Public Gallery
AWS Deep Learning Containers (DLC)

Pull and Launch Docker Image - Intel Gaudi Vault

Follow the steps below while running on Ubuntu 22.04.5.

Use the command below to pull the Docker image:

    docker pull vault.habana.ai/gaudi-docker/1.21.2/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest

Use the command below to run the Docker image:

    docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.21.2/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest

Note

Include --ipc=host in the Docker run command for the Docker images. This is required for distributed training using the Habana Collective Communication Library (HCCL), because it allows re-use of host shared memory for best performance.
To run the Docker image with only some of the available Gaudi devices, make sure to set the device-to-module mapping correctly. See Multiple Dockers Each with a Single Workload for further details.

Amazon ECR Public Gallery

To pull and run Docker images from Amazon ECR Public Gallery, follow the steps detailed in Pulling a public image.

AWS Deep Learning Containers

To set up and use AWS Deep Learning containers, follow the instructions detailed in AWS Available Deep Learning Containers Images.

Build Docker Images from Intel Gaudi Dockerfiles

To build custom Docker images, follow the steps described in the Setup and Install Repo.

Launch Docker Image

Use the command below to launch the Docker image, replacing ${OS} with the required operating system. See the Support Matrix for a list of supported operating systems:

docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.21.2/${OS}/habanalabs/pytorch-installer-2.6.0:latest
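As a minimal sketch, the image reference can be assembled from shell variables so that only the operating system tag changes between hosts. The helper name `gaudi_image` and its argument order are illustrative assumptions, not part of the Intel Gaudi tooling. The default version strings are copied from the command above.

```shell
# Hypothetical helper: builds the vault image reference from its parts.
# Arguments: <os tag> [Gaudi software version] [PyTorch version]
gaudi_image() {
    os="$1"
    gaudi_version="${2:-1.21.2}"
    pytorch_version="${3:-2.6.0}"
    printf 'vault.habana.ai/gaudi-docker/%s/%s/habanalabs/pytorch-installer-%s:latest\n' \
        "$gaudi_version" "$os" "$pytorch_version"
}

# Example: resolve the image for Ubuntu 22.04; the result can then be
# passed as the last argument of the docker run command shown above.
OS=ubuntu22.04
IMAGE="$(gaudi_image "$OS")"
echo "$IMAGE"
```

Keeping the version numbers as optional arguments makes it easier to stay aligned with the Support Matrix when the installed release changes.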

Run PyTorch on Habana Base AMI

Install PyTorch

This section describes how to obtain and install the PyTorch software package. Follow the instructions below to install PyTorch packages on a bare metal platform or virtual machine.

Note

Installing PyTorch with Docker is the recommended installation method and does not require additional steps. For further details, refer to the Pull and Launch Docker Image - Intel Gaudi Vault section.

Intel Gaudi PyTorch packages consist of:

torch - PyTorch framework package with Intel Gaudi support.
habana-torch-plugin - Libraries and modules needed to execute PyTorch on single-card, single-server, and multi-server setups.
habana-torch-dataloader - Intel Gaudi multi-threaded dataloader package.
torchvision and torchaudio - Torchvision and Torchaudio packages compiled in the torch environment. These packages contain no Gaudi-specific changes.
torch-tb-profiler - The TensorBoard plugin used to display Gaudi-specific information on TensorBoard.

Run the hl-smi tool to confirm which Intel Gaudi software version is installed. Use the installer version that matches the version you are running. For example, if the installed version is 1.21.1, the output includes the following:
```
HL-SMI Version:       hl-1.21.1-XXXXXXX
Driver Version:       1.21.1-XXXXXX
```
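The release number can also be extracted in a script. The sketch below assumes that the hl-smi output contains a line in the format shown above. The helper name `gaudi_release_from_hl_smi` is hypothetical.

```shell
# Hypothetical helper: reads hl-smi output on stdin and prints the
# release number (for example 1.21.1) from the "HL-SMI Version" line.
gaudi_release_from_hl_smi() {
    sed -n 's/.*HL-SMI Version:[[:space:]]*hl-\([0-9][0-9.]*\)-.*/\1/p' | head -n 1
}

# Usage on a Gaudi host (requires hl-smi on the PATH):
#   GAUDI_RELEASE="$(hl-smi | gaudi_release_from_hl_smi)"
# Demonstration with the sample line from the output above:
echo 'HL-SMI Version:       hl-1.21.1-XXXXXXX' | gaudi_release_from_hl_smi
```

The extracted value can then be compared against the version embedded in the installer URL before downloading it.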

Install the Intel Gaudi PyTorch environment by running the following commands:

wget -nv https://vault.habana.ai/artifactory/gaudi-installer/1.21.2/habanalabs-installer.sh
chmod +x habanalabs-installer.sh
./habanalabs-installer.sh install -t dependencies
./habanalabs-installer.sh install --type pytorch --venv

Note

Installing dependencies requires sudo permission.
Verify that PyTorch is not already installed in any path listed in the PYTHONPATH environment variable. If it is, uninstall it before proceeding or remove the path from PYTHONPATH.
This script supports fresh installations only. Software upgrades are not supported.
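The PYTHONPATH check in the note can be scripted before running the installer. This is a rough sketch that assumes an existing PyTorch installation appears as a `torch` directory inside a PYTHONPATH entry. The helper name `find_torch_in_pythonpath` is hypothetical.

```shell
# Hypothetical helper: prints every PYTHONPATH entry that contains a
# "torch" package directory and returns non-zero if any is found.
find_torch_in_pythonpath() {
    found=0
    old_ifs="$IFS"
    IFS=':'
    for dir in ${PYTHONPATH:-}; do
        if [ -n "$dir" ] && [ -d "$dir/torch" ]; then
            echo "$dir"
            found=1
        fi
    done
    IFS="$old_ifs"
    return "$found"
}

if find_torch_in_pythonpath; then
    echo "No PyTorch found on PYTHONPATH"
else
    echo "Uninstall PyTorch or remove the paths above from PYTHONPATH"
fi
```

The check only inspects PYTHONPATH. It does not detect PyTorch installed in the system site-packages directory.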

The --venv flag installs PyTorch inside a virtual environment. The default virtual environment folder is $HOME/habanalabs-venv. To override the default, run the following command:

export HABANALABS_VIRTUAL_DIR=xxxx
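A script that needs to locate the environment afterwards can apply the same rule: use HABANALABS_VIRTUAL_DIR when it is set, otherwise the default folder. The sketch below assumes the folder is a standard Python virtual environment with a `bin/activate` script. The helper name `gaudi_venv_dir` is hypothetical.

```shell
# Resolve the virtual environment folder as described above: use
# HABANALABS_VIRTUAL_DIR when set, otherwise $HOME/habanalabs-venv.
gaudi_venv_dir() {
    printf '%s\n' "${HABANALABS_VIRTUAL_DIR:-$HOME/habanalabs-venv}"
}

# Assumption: the folder is a standard Python virtual environment,
# so it can be activated through bin/activate once the installer has run.
VENV_DIR="$(gaudi_venv_dir)"
if [ -f "$VENV_DIR/bin/activate" ]; then
    . "$VENV_DIR/bin/activate"
else
    echo "No virtual environment found at $VENV_DIR"
fi
```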

Model References Requirements

Some PyTorch models need additional Python packages. They can be installed using the Python requirements files provided in the Model References repository. Refer to the Model References repository for detailed instructions on running PyTorch models.

Set up Python for Models

Using your own models requires setting Python 3.10 as the default Python version. If Python 3.10 is not the default version, replace every call to the python command in your model with $PYTHON and define the environment variable as shown below:

export PYTHON=/usr/bin/python3.10
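One way to route every interpreter call through the variable is a small wrapper, sketched below. The `run_python` name and the fallback to `python3` when PYTHON is unset are assumptions made for illustration, not part of the Intel Gaudi tooling.

```shell
# Assumption: fall back to python3 when PYTHON is not set, so the script
# also runs where the default interpreter is already the supported one.
PYTHON="${PYTHON:-python3}"

# Hypothetical wrapper: every Python call in the model scripts goes
# through "$PYTHON" instead of a hard-coded "python" command.
run_python() {
    "$PYTHON" "$@"
}

# Report which interpreter will be used, if it is available.
if command -v "$PYTHON" >/dev/null 2>&1; then
    run_python --version
else
    echo "Interpreter not found: $PYTHON"
fi
```

A model launch script would then call, for example, `run_python train.py`, where `train.py` stands in for the model's own entry point.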

Running models from the Intel Gaudi Model References GitHub repository requires the PYTHON environment variable to match the supported Python release:

export PYTHON=/usr/bin/<python version>

Note

The Python version depends on the operating system. Refer to the Support Matrix for a full list of supported operating systems and Python versions.

Gaudi Documentation 1.21.1 documentation
