Habana Deep Learning Base AMI Installation

The following outlines the supported installation options and the steps each one requires. Habana's Base AMI comes with the software needed to run containers preinstalled.

Objective: Run Using Containers on Habana Base AMI (Recommended)

Steps:

  1. Pull Prebuilt Containers or Build Docker Images from Intel Gaudi Dockerfiles

  2. Run models using Intel Gaudi Model References GitHub repository

Objective: Run PyTorch on Habana Base AMI

Steps:

  1. Install PyTorch

  2. Set up Python for Models

  3. Run models using Intel Gaudi Model References GitHub repository

Habana Deep Learning AMIs are also available for Amazon ECS and Amazon EKS. See Amazon ECS with Gaudi User Guide and Amazon EKS with Gaudi User Guide for more details.

Note

Before installing the packages and Docker images below, review the currently supported versions and operating systems listed in the Support Matrix.

Run Using Containers on Habana Base AMI

Pull Prebuilt Containers

Prebuilt containers are provided in:

  • Intel Gaudi vault

  • Amazon ECR Public Library

  • AWS Deep Learning Containers (DLC)

Pull and Launch Docker Image - Intel Gaudi Vault

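  1. Use one of the commands below to pull the Docker image for your operating system (the first is for Ubuntu 22.04, the second for Amazon Linux 2):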

       docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
    
       docker pull vault.habana.ai/gaudi-docker/1.18.0/amzn2/habanalabs/pytorch-installer-2.4.0:latest
    
  2. Use the matching command below to run the Docker image (the first is for Ubuntu 22.04, the second for Amazon Linux 2):

       docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
    
       docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/amzn2/habanalabs/pytorch-installer-2.4.0:latest
    

Note

  • Include --ipc=host in the docker run command. This is required for distributed training with the Habana Collective Communication Library (HCCL), because it allows reuse of host shared memory for best performance.

  • To run the Docker image with only some of the available Gaudi devices, make sure the device-to-module mapping is set correctly. See Multiple Dockers Each with a Single Workload for further details.

AWS Deep Learning Containers

To set up and use AWS Deep Learning Containers, follow the instructions in AWS Available Deep Learning Containers Images.

Build Docker Images from Intel Gaudi Dockerfiles

To build custom Docker images, follow the steps described in the Setup and Install Repo.

Launch Docker Image

Use the command below to launch the Docker image, replacing ${OS} with the required operating system (for example, ubuntu22.04 or amzn2). See the Support Matrix for a list of supported operating systems:

docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/${OS}/habanalabs/pytorch-installer-2.4.0:latest
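The image references on this page differ only in the operating system segment, so the launch command can be parameterized. The following is a minimal sketch, not part of the Intel Gaudi tooling: the helper name gaudi_image is hypothetical, and its default version numbers are simply the ones used on this page.

```shell
# Hypothetical helper: compose the image reference used on this page from
# its variable parts.
#   $1 = operating system segment, e.g. ubuntu22.04 or amzn2
#   $2 = Intel Gaudi software version (defaults to 1.18.0, as used above)
#   $3 = PyTorch version (defaults to 2.4.0, as used above)
gaudi_image() {
    printf 'vault.habana.ai/gaudi-docker/%s/%s/habanalabs/pytorch-installer-%s:latest\n' \
        "${2:-1.18.0}" "$1" "${3:-2.4.0}"
}

# Example usage (requires Docker and the habana runtime on the host):
#   OS=ubuntu22.04
#   docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
#       -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
#       -v /opt/datasets:/datasets --cap-add=sys_nice --net=host --ipc=host \
#       "$(gaudi_image "$OS")"
```

Keeping the operating system as a single argument means the same command line serves both the Ubuntu 22.04 and Amazon Linux 2 images.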

Run PyTorch on Habana Base AMI

Install PyTorch

This section describes how to obtain and install the PyTorch software package. Follow the instructions below to install PyTorch packages on a bare metal platform or virtual machine.

Note

Using the Docker image is the recommended way to get PyTorch and requires no additional installation steps. For further details, refer to the Pull and Launch Docker Image - Intel Gaudi Vault section.

Intel Gaudi PyTorch packages consist of:

  • torch - PyTorch framework package with Intel Gaudi support.

  • habana-torch-plugin - Libraries and modules needed to execute PyTorch on single-card, single-server, and multi-server setups.

  • habana-torch-dataloader - Intel Gaudi multi-threaded dataloader package.

  • torchvision and torchaudio - Torchvision and Torchaudio packages compiled in the torch environment. These packages contain no Gaudi-specific changes.

  • torch-tb-profiler - The TensorBoard plugin used to display Gaudi-specific information in TensorBoard.

  1. Run the hl-smi tool to confirm which Intel Gaudi software version is installed. The installer version you use must match the installed version. For example, if the installed version is 1.17.1, the output includes:

    HL-SMI Version:       hl-1.17.1-XXXXXXX
    Driver Version:       1.17.1-XXXXXX
    
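  2. Install the Intel Gaudi PyTorch environment by running the following commands. The URL below targets version 1.18.0; use the installer version that matches the version reported in step 1: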

    wget -nv https://vault.habana.ai/artifactory/gaudi-installer/1.18.0/habanalabs-installer.sh
    chmod +x habanalabs-installer.sh
    ./habanalabs-installer.sh install -t dependencies
    ./habanalabs-installer.sh install --type pytorch --venv
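
The two steps above can be tied together so that the installer version always follows the installed software version. This is a sketch under two assumptions: hl-smi prints a "Driver Version:" line in the form shown in step 1, and installers for other releases follow the same URL layout as the 1.18.0 URL in step 2. Both function names are hypothetical.

```shell
# Extract the x.y.z version from the "Driver Version:" line of hl-smi output
# read on stdin. Assumes the line format shown in step 1.
gaudi_driver_version() {
    sed -n 's/^.*Driver Version:[[:space:]]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Compose the installer URL for a given version. Assumption: other releases
# use the same URL layout as the 1.18.0 URL in step 2.
gaudi_installer_url() {
    printf 'https://vault.habana.ai/artifactory/gaudi-installer/%s/habanalabs-installer.sh\n' "$1"
}

# Example usage on a machine with hl-smi installed:
#   version=$(hl-smi | gaudi_driver_version)
#   wget -nv "$(gaudi_installer_url "$version")"
```

If the extracted version is empty, stop and inspect the hl-smi output by hand rather than downloading an installer.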
    

Note

  • Installing dependencies requires sudo permission.

  • Check whether PyTorch is already installed in a path listed in the PYTHONPATH environment variable. If it is, uninstall it before proceeding or remove that path from PYTHONPATH.

  • This script supports fresh installations only. Software upgrades are not supported.
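The PYTHONPATH check in the note above can be approximated with a short script. This is a heuristic sketch only: the function name is hypothetical, and it merely looks for a torch directory directly under each PYTHONPATH entry, without inspecting pip metadata or site-packages.

```shell
# Print every PYTHONPATH entry that directly contains a "torch" directory.
# Runs in a subshell so the IFS and globbing changes do not leak out.
torch_in_pythonpath() (
    IFS=:
    set -f    # disable globbing while splitting PYTHONPATH on ":"
    for entry in ${PYTHONPATH:-}; do
        if [ -n "$entry" ] && [ -d "$entry/torch" ]; then
            printf '%s\n' "$entry"
        fi
    done
)

# Example usage:
#   found=$(torch_in_pythonpath)
#   [ -z "$found" ] || echo "PyTorch found under: $found" >&2
```

An empty result suggests that no PYTHONPATH entry shadows the PyTorch build the installer is about to add.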

The --venv flag installs PyTorch inside a virtual environment. The default virtual environment folder is $HOME/habanalabs-venv. To override the default, run the following command, replacing xxxx with the desired folder:

export HABANALABS_VIRTUAL_DIR=xxxx
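As a small illustration of the default-and-override rule just described, the sketch below resolves the folder the same way. The function name is hypothetical, and the activation line in the usage comment assumes the installer creates a standard Python virtual environment, which this page does not state.

```shell
# Resolve the virtual environment folder as described above:
# HABANALABS_VIRTUAL_DIR if set, otherwise $HOME/habanalabs-venv.
gaudi_venv_dir() {
    printf '%s\n' "${HABANALABS_VIRTUAL_DIR:-$HOME/habanalabs-venv}"
}

# Example usage. Assumption: the folder is a standard Python virtual
# environment, so bin/activate exists inside it.
#   export HABANALABS_VIRTUAL_DIR="$HOME/gaudi-venv"   # hypothetical path
#   ./habanalabs-installer.sh install --type pytorch --venv
#   . "$(gaudi_venv_dir)/bin/activate"
```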

Model References Requirements

Some PyTorch models need additional Python packages. These can be installed using the Python requirements files provided in the Model References repository. Refer to the Model References repository for detailed instructions on running PyTorch models.

Set up Python for Models

Using your own models requires setting Python 3.10 as the default Python version. If Python 3.10 is not the default version, replace every call to the python command in your model with $PYTHON and define the environment variable as follows:

export PYTHON=/usr/bin/python3.10

Running models from the Intel Gaudi Model References GitHub repository requires the PYTHON environment variable to match the supported Python release:

export PYTHON=/usr/bin/<python version>
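Before running a model, it can help to confirm that $PYTHON really points at the intended release. The sketch below uses a hypothetical helper that reduces the output of the interpreter's --version flag to its major.minor part; the 3.10 in the usage comment is the version named above and should be adjusted to the release listed for your operating system.

```shell
# Reduce "Python 3.10.12" (the output of `python --version`) to "3.10".
python_major_minor() {
    sed -n 's/^Python \([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p'
}

# Example usage:
#   export PYTHON=/usr/bin/python3.10
#   [ "$("$PYTHON" --version | python_major_minor)" = "3.10" ] \
#       || echo "unexpected Python version" >&2
```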

Note

The Python version depends on the operating system. Refer to the Support Matrix for a full list of supported operating systems and Python versions.