Habana Deep Learning Base AMI Installation

The following table outlines the supported installation options and the steps each one requires. Habana’s Base AMI comes with the software needed to run containers pre-installed.

Objective

Steps

Run Using Containers on Habana Base AMI (Recommended)

  1. Pull Prebuilt Containers or Build Docker Images from Intel Gaudi Dockerfiles

  2. Run models using Intel Gaudi Model References GitHub repository

Run Framework on Habana Base AMI (PyTorch)

  1. Install framework

  2. Set up Python for Models

  3. Run models using Intel Gaudi Model References GitHub repository

Habana Deep Learning AMIs are also available for Amazon ECS and Amazon EKS. See Amazon ECS with Gaudi User Guide and Amazon EKS with Gaudi User Guide for more details.

Note

Before installing the packages and Docker images below, review the currently supported versions and operating systems listed in the Support Matrix.

Run Using Containers

Pull Prebuilt Containers

Prebuilt containers are provided in:

  • Intel Gaudi vault

  • Amazon ECR Public Library

  • AWS Deep Learning Containers (DLC)

Pull and Launch Docker Image - Intel Gaudi Vault

Note

Before running Docker, make sure to map the dataset as detailed in Map Dataset to Docker.

Use the commands below to pull and run the Docker image. Update both commands with the required operating system. See the Support Matrix for a list of supported operating systems:

     docker pull vault.habana.ai/gaudi-docker/1.15.1/${OS}/habanalabs/pytorch-installer-2.2.0:latest
     docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.15.1/${OS}/habanalabs/pytorch-installer-2.2.0:latest

Note

  • Include --ipc=host in the Docker run command for PyTorch Docker images. This is required for distributed training using the Habana Collective Communication Library (HCCL), as it allows re-use of host shared memory for best performance.

  • To run the Docker image with only some of the available Gaudi devices, make sure to set the Device to Module mapping correctly. See Multiple Dockers Each with a Single Workload for further details.
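
As a minimal sketch of the pull and run steps above, the image reference can be assembled once in a shell variable so that both commands use the same operating system value. The value ubuntu22.04 is only an example taken from the build script options in this guide, and the echo lines print the commands for review rather than executing them:

```shell
# Example OS value (an assumption); pick the one matching your host from the Support Matrix.
OS="ubuntu22.04"
IMAGE="vault.habana.ai/gaudi-docker/1.15.1/${OS}/habanalabs/pytorch-installer-2.2.0:latest"

# Print the commands for review; remove "echo" to execute them on a Gaudi host.
echo docker pull "${IMAGE}"
echo docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
    -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
    --cap-add=sys_nice --net=host --ipc=host "${IMAGE}"
```

Keeping the image reference in one variable avoids pulling one image and launching another when the operating system value changes.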

AWS Deep Learning Containers

To set up and use AWS Deep Learning containers, follow the instructions detailed in AWS Available Deep Learning Containers Images.

Build Docker Images from Intel Gaudi Dockerfiles

  1. Download the Dockerfiles and build script from the Setup and Install Repo to a local directory.

  2. Run the build script to generate a Docker image:

./docker_build.sh mode [pytorch] os [ubuntu22.04,amzn2] framework_version

For example:

./docker_build.sh pytorch ubuntu22.04 2.2.0

Launch Docker Image that was Built

Note

Before running Docker, make sure to map the dataset as detailed in Map Dataset to Docker.

Use the command below to launch the Docker image. Update the command with the required operating system. See the Support Matrix for a list of supported operating systems:

docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.15.1/${OS}/habanalabs/pytorch-installer-2.2.0:latest

Map Dataset to Docker

Download the dataset before running Docker, and mount its location into the container by adding the flag below. For example, the host dataset location /opt/datasets/imagenet is mounted to /datasets/imagenet inside the container:

-v /opt/datasets/imagenet:/datasets/imagenet

Note

Optional: add the following flag to mount a shared folder from the host into the container so that you can transfer files out of the container:

-v $HOME/shared:/root/shared
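
The two mount flags can be combined with the run command shown earlier in this guide. The sketch below is illustrative only: the variable names are hypothetical, `<image>` is a placeholder for the image you pulled or built, and the command is printed rather than executed:

```shell
DATASET_HOST_DIR="/opt/datasets/imagenet"     # dataset location on the host
DATASET_CONTAINER_DIR="/datasets/imagenet"    # where it appears inside the container
SHARED_HOST_DIR="${HOME}/shared"              # optional host folder for file transfer

MOUNTS="-v ${DATASET_HOST_DIR}:${DATASET_CONTAINER_DIR} -v ${SHARED_HOST_DIR}:/root/shared"

# IMAGE is a placeholder (an assumption); replace it with the image you pulled or built.
IMAGE="<image>"
CMD="docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host ${MOUNTS} ${IMAGE}"

# Print the assembled command for review before running it on a Gaudi host.
echo "${CMD}"
```

Each -v flag takes the form host_path:container_path, so the dataset stays on the host and is only exposed inside the container.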

Install PyTorch

This section describes how to obtain and install the PyTorch software package. Follow the instructions below to install PyTorch packages on a bare metal platform or virtual machine without using a Docker image.

Note

Installing PyTorch with Docker is the recommended installation method and does not require additional steps. For further details, refer to the Pull and Launch Docker Image - Intel Gaudi Vault section.

Intel Gaudi PyTorch packages consist of:

  • torch - PyTorch framework package with Intel Gaudi support

  • habana-torch-plugin - Libraries and modules needed to execute PyTorch on single-card, single-node, and multi-node setups.

  • habana-torch-dataloader - Intel Gaudi multi-threaded dataloader package.

  • torchvision and torchaudio - Torchvision and Torchaudio packages compiled in the torch environment. These packages contain no Gaudi-specific changes.

  • habana-gpu-migration - The library for the GPU Migration Toolkit. See GPU Migration Toolkit for more information.

  • torch-tb-profiler - The TensorBoard plugin used to display Gaudi-specific information in TensorBoard.

  1. Run the hl-smi tool to confirm the installed Intel Gaudi software version. Use the installer version that matches the version you are running. For example, if the installed version is 1.15.0, the output includes the following:

       HL-SMI Version:       hl-1.15.0-XXXXXXX
       Driver Version:       1.15.0-XXXXXX
    
  2. Install the Intel Gaudi PyTorch environment by running the following command:

    wget -nv https://vault.habana.ai/artifactory/gaudi-installer/1.15.1/habanalabs-installer.sh
    chmod +x habanalabs-installer.sh
    ./habanalabs-installer.sh install -t dependencies
    ./habanalabs-installer.sh install --type pytorch --venv
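
The version check in step 1 can be scripted so that the installer release is derived from the hl-smi output instead of being typed by hand. This is a sketch under two assumptions: the "HL-SMI Version" line has the form shown in step 1, and installer URLs for other releases follow the same pattern as the 1.15.1 URL in step 2. The helper name gaudi_release is hypothetical, and a sample line stands in for real hl-smi output:

```shell
# Extract the release number (for example 1.15.0) from the "HL-SMI Version" line.
gaudi_release() {
    sed -n 's/.*HL-SMI Version: *hl-\([0-9][0-9.]*\)-.*/\1/p' | head -n 1
}

# On a Gaudi host this would be: RELEASE="$(hl-smi | gaudi_release)"
RELEASE="$(printf 'HL-SMI Version:       hl-1.15.0-XXXXXXX\n' | gaudi_release)"

# Assumption: other releases use the same URL pattern as the 1.15.1 installer.
INSTALLER_URL="https://vault.habana.ai/artifactory/gaudi-installer/${RELEASE}/habanalabs-installer.sh"
echo "${INSTALLER_URL}"
```

If the derived URL does not exist, fall back to the release listed in the Support Matrix.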
    

Note

  • Installing dependencies requires sudo permission.

  • Check whether PyTorch is already installed in a path listed in the PYTHONPATH environment variable. If it is, either uninstall it before proceeding or remove that path from PYTHONPATH.

  • This script supports fresh installations only. Software upgrades are not supported.
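
One way to perform the PYTHONPATH check from the note above is to look for a torch package directory in each entry. The function name below is hypothetical, and the presence of a directory named torch is only a heuristic, so treat this as a rough sketch rather than a complete detection method:

```shell
# Print every PYTHONPATH entry that contains a "torch" package directory.
find_torch_on_pythonpath() {
    old_ifs=$IFS
    IFS=:
    for dir in ${PYTHONPATH:-}; do
        if [ -n "$dir" ] && [ -d "$dir/torch" ]; then
            echo "$dir"
        fi
    done
    IFS=$old_ifs
}

# Prints nothing when no PYTHONPATH entry contains a torch directory.
find_torch_on_pythonpath
```

Any path it prints should either have PyTorch uninstalled from it or be removed from PYTHONPATH before running the installer.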

The --venv flag installs PyTorch inside a virtual environment. The default virtual environment folder is $HOME/habanalabs-venv. To override the default, run the following command:

export HABANALABS_VIRTUAL_DIR=xxxx
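
A short sketch of the override, assuming a hypothetical target folder $HOME/gaudi-venv. The VENV_DIR variable is not read by the installer; it only illustrates how the default applies when HABANALABS_VIRTUAL_DIR is not set:

```shell
# Hypothetical custom location for the virtual environment.
export HABANALABS_VIRTUAL_DIR="${HOME}/gaudi-venv"

# Resolve the folder as the text describes: the override if set, otherwise the default.
VENV_DIR="${HABANALABS_VIRTUAL_DIR:-${HOME}/habanalabs-venv}"
echo "PyTorch virtual environment folder: ${VENV_DIR}"
```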

Model References Requirements

Some PyTorch models need additional Python packages. These can be installed using the Python requirements files provided in the Model References repository. Refer to the Model References repository for detailed instructions on running PyTorch models.

Set up Python for Models

Using your own models requires setting Python 3.8 as the default Python version. If Python 3.8 is not the default version, replace any call to the Python command in your model with $PYTHON and define the environment variable as below:

export PYTHON=/usr/bin/python3.8

Running models from the Intel Gaudi Model References GitHub repository requires the PYTHON environment variable to match the supported Python release:

export PYTHON=/usr/bin/<python version>

Note

Python 3.8 is the supported Python release for all operating systems except Ubuntu 22.04. Refer to the Support Matrix for a full list of supported operating systems and Python versions.
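
The PYTHON setup above can be wrapped in a small check so that a wrong interpreter path is caught before a model is launched. This is a sketch only: the fallback path /usr/bin/python3.8 is the example from this section and may not apply to your operating system:

```shell
# Keep an existing PYTHON value; otherwise fall back to the example path from this section.
PYTHON="${PYTHON:-/usr/bin/python3.8}"
export PYTHON

# Report the interpreter version, or warn if the path does not point to an executable.
if [ -x "$PYTHON" ]; then
    "$PYTHON" --version
else
    echo "PYTHON=$PYTHON is not executable; check the Support Matrix for the supported release" >&2
fi
```

Model scripts can then call "$PYTHON" in place of a bare python command.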