Habana Deep Learning Base AMI Installation

The following table outlines the supported installation options and the steps required for each. Habana’s Base AMI comes with the software needed to run containers pre-installed.

Objective: Run Using Containers on Habana Base AMI (Recommended)

Steps:

  1. Pull Prebuilt Containers or Build Docker Images from Intel Gaudi Dockerfiles

  2. Run models using Intel Gaudi Model References GitHub repository

Objective: Run Framework on Habana Base AMI (TensorFlow/PyTorch)

Steps:

  1. Install framework

  2. Set up Python for Models

  3. Run models using Intel Gaudi Model References GitHub repository

Habana Deep Learning AMIs are also available for Amazon ECS and Amazon EKS. See Amazon ECS with Gaudi User Guide and Amazon EKS with Gaudi User Guide for more details.

Note

Before installing the packages and Docker images below, review the currently supported versions and operating systems listed in the Support Matrix.

Run Using Containers

Pull Prebuilt Containers

Prebuilt containers are provided in:

  • Intel Gaudi Vault

  • Amazon ECR Public Library

  • AWS Deep Learning Containers (DLC)

Pull and Launch Docker Image - Intel Gaudi Vault

Note

Before running docker, make sure to map the dataset as detailed in Map Dataset to Docker.

To pull and run the Intel Gaudi Docker images, use the code examples below. Update the parameters listed in the following table to run the desired configuration.

Parameter      Description                   Values
$OS            Operating System of Image     [ubuntu22.04, amzn2, rhel8.6]
$TF_VERSION    Desired TensorFlow Version    [2.15.0]
$PT_VERSION    Desired PyTorch Version       [2.1.1]

Note

  • Include --ipc=host in the docker run command for PyTorch Docker images. This is required for distributed training using the Habana Collective Communication Library (HCCL), because it allows re-use of host shared memory for best performance.

  • To run the Docker image with only some of the available Gaudi devices, make sure to set the Device to Module mapping correctly. See Multiple Dockers Each with a Single Workload for further details.

docker pull vault.habana.ai/gaudi-docker/1.14.0/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/1.14.0/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:latest
docker pull vault.habana.ai/gaudi-docker/1.14.0/${OS}/habanalabs/pytorch-installer-${PT_VERSION}:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.14.0/${OS}/habanalabs/pytorch-installer-${PT_VERSION}:latest
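
As a minimal sketch of how the table parameters feed into the image path, the snippet below sets example values and prints the resulting image references without contacting the registry. The helper name gaudi_image is hypothetical and not part of any Intel Gaudi tooling; the release number 1.14.0 and the path layout are taken from the commands above.

```shell
# Example values taken from the parameter table; adjust as needed.
OS="ubuntu22.04"
TF_VERSION="2.15.0"
PT_VERSION="2.1.1"

# gaudi_image is a hypothetical helper. It only assembles the image
# reference string; it does not pull or run anything.
gaudi_image() {
  # $1 is the installer name, e.g. pytorch-installer-2.1.1
  printf 'vault.habana.ai/gaudi-docker/1.14.0/%s/habanalabs/%s:latest\n' "$OS" "$1"
}

TF_IMAGE="$(gaudi_image "tensorflow-installer-tf-cpu-${TF_VERSION}")"
PT_IMAGE="$(gaudi_image "pytorch-installer-${PT_VERSION}")"

# Print the references so they can be checked before pulling.
echo "$TF_IMAGE"
echo "$PT_IMAGE"
```

The printed values can then be passed to docker pull and docker run in place of the full path.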

AWS Deep Learning Containers

To set up and use AWS Deep Learning Containers, follow the instructions detailed in AWS Available Deep Learning Containers Images.

Build Docker Images from Intel Gaudi Dockerfiles

  1. Download the Dockerfiles and build script from the Setup and Install Repo to a local directory.

  2. Run the build script to generate a Docker image:

./docker_build.sh mode [tensorflow,pytorch] os [ubuntu22.04,amzn2,rhel8.6] tf_version

For example:

./docker_build.sh tensorflow ubuntu22.04 2.15.0
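
Because the build script takes positional arguments, a typo in the mode or OS is easy to miss. The following is a sketch of a pre-flight check, assuming only the mode and OS values listed in the usage line above. The function check_build_args is hypothetical and is not part of docker_build.sh.

```shell
# check_build_args is a hypothetical pre-flight check. It accepts only the
# mode and OS values listed in the usage line and returns non-zero otherwise.
check_build_args() {
  case "$1" in
    tensorflow|pytorch) ;;
    *) echo "unsupported mode: $1" >&2; return 1 ;;
  esac
  case "$2" in
    ubuntu22.04|amzn2|rhel8.6) ;;
    *) echo "unsupported os: $2" >&2; return 1 ;;
  esac
}

# Example: validate first, then print the build command that would be run.
if check_build_args tensorflow ubuntu22.04; then
  echo "./docker_build.sh tensorflow ubuntu22.04 2.15.0"
fi
```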

Launch Docker Image that was Built

Note

Before running docker, make sure to map the dataset as detailed in Map Dataset to Docker.

Launch the Docker image using the code examples below. Update the parameters listed in the following table to run the desired configuration.

Parameter      Description                   Values
$OS            Operating System of Image     [ubuntu22.04, amzn2, rhel8.6]
$TF_VERSION    Desired TensorFlow Version    [2.15.0]
$PT_VERSION    Desired PyTorch Version       [2.1.1]

docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/1.14.0/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.14.0/${OS}/habanalabs/pytorch-installer-${PT_VERSION}:latest

Map Dataset to Docker

Download the dataset before running Docker, and mount its location into the container by adding the flag below. For example, the following flag mounts the host dataset location /opt/datasets/imagenet at /datasets/imagenet inside the container:

-v /opt/datasets/imagenet:/datasets/imagenet

Note

Optional: add the following flag to mount a shared folder from the host into the container so that files can be transferred out of it:

-v $HOME/shared:/root/shared
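
To show where the mount flags sit relative to the other options, here is a sketch that assembles a complete docker run command line and prints it instead of executing it. The value of IMAGE is a placeholder to be replaced with the image you pulled or built; the variable names are illustrative only.

```shell
# Host and container paths from the example above.
HOST_DATASET="/opt/datasets/imagenet"
CONTAINER_DATASET="/datasets/imagenet"

# Placeholder: substitute the image reference you pulled or built.
IMAGE="<image>"

# Mount flags: the dataset mount, plus the optional shared folder.
DATASET_MOUNT="-v ${HOST_DATASET}:${CONTAINER_DATASET}"
SHARED_MOUNT="-v ${HOME}/shared:/root/shared"

# Dry run: build the full command as text and print it for review.
RUN_CMD="docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host ${DATASET_MOUNT} ${SHARED_MOUNT} ${IMAGE}"
echo "$RUN_CMD"
```

The -v flags must appear before the image reference, because docker run treats everything after the image as the command to run inside the container.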

Install Native Frameworks

Installing frameworks with docker is the recommended installation method and does not require additional steps.

TensorFlow Installation

This section describes how to obtain and install the TensorFlow software package. Follow these instructions if you want to install the TensorFlow packages on a bare metal platform without a Docker image. The package consists of two main components, which together provide the same functionality as the TensorFlow Docker image:

  • Base habana-tensorflow Python package - Libraries and modules needed to execute TensorFlow on a single Gaudi device.

  • Scale-out habana-horovod Python package - Libraries and modules needed to execute TensorFlow on a single-node machine.

To install Intel Gaudi TensorFlow, run the following commands:

wget -nv https://vault.habana.ai/artifactory/gaudi-installer/latest/habanalabs-installer.sh
chmod +x habanalabs-installer.sh
./habanalabs-installer.sh install -t dependencies
./habanalabs-installer.sh install -t dependencies-tensorflow
./habanalabs-installer.sh install --type tensorflow --venv

Note

  • Running the above command installs the latest version.

  • Installing dependencies requires sudo permission.

  • Before installing, check the supported Python version listed in the Support Matrix.

  • Make sure to check whether TensorFlow is already installed in the path listed in the environment variable PYTHONPATH. If it is, make sure to either uninstall it before proceeding or remove the path from the PYTHONPATH.

  • This script supports fresh installations only. SW upgrades are not supported.
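
The PYTHONPATH check in the note above can be sketched as follows. This is a rough first pass, not a complete check: the hypothetical helper find_on_pythonpath only looks for a directory named after the package directly under each PYTHONPATH entry, so it will not detect packages installed elsewhere.

```shell
# find_on_pythonpath is a hypothetical helper: it prints every PYTHONPATH
# entry that directly contains a directory named after the given package.
find_on_pythonpath() {
  pkg="$1"
  old_ifs="$IFS"
  IFS=":"                      # PYTHONPATH entries are colon-separated
  for dir in $PYTHONPATH; do
    if [ -n "$dir" ] && [ -d "$dir/$pkg" ]; then
      echo "$dir"
    fi
  done
  IFS="$old_ifs"
}

hits="$(find_on_pythonpath tensorflow)"
if [ -n "$hits" ]; then
  echo "TensorFlow found on PYTHONPATH in:"
  echo "$hits"
else
  echo "No TensorFlow directory found on PYTHONPATH."
fi
```

Passing torch instead of tensorflow applies the same check before a PyTorch installation.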

The --venv flag installs the relevant framework inside a virtual environment. The default virtual environment folder is $HOME/habanalabs-venv. To override the default, run the following command:

export HABANALABS_VIRTUAL_DIR=xxxx
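
The interaction between the default folder and the override can be sketched as below. The function resolve_venv_dir is hypothetical. The activation step assumes the installer creates a standard Python virtual environment with a bin/activate script, which the text above does not state explicitly.

```shell
# resolve_venv_dir is a hypothetical helper: it prints HABANALABS_VIRTUAL_DIR
# when that variable is set, otherwise the documented default folder.
resolve_venv_dir() {
  echo "${HABANALABS_VIRTUAL_DIR:-$HOME/habanalabs-venv}"
}

VENV_DIR="$(resolve_venv_dir)"
echo "Virtual environment folder: $VENV_DIR"

# Assumption: the folder is a standard Python virtual environment.
# Activate it only if the activation script is actually present.
if [ -f "$VENV_DIR/bin/activate" ]; then
  . "$VENV_DIR/bin/activate"
fi
```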

Model References Requirements

Intel Gaudi provides a number of model references optimized to run on the card. These models are available on the Model-References page.

Many of the references require additional Python packages that are not provided by Intel Gaudi and must be installed with pip. The packages required to run topologies from the Model References repository are listed in a per-topology requirements.txt file in each folder containing the topology's scripts.
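
Installing the requirements for one topology might look like the sketch below. MODEL_DIR is a hypothetical placeholder for a topology folder inside a local clone of the Model References repository and must be replaced with a real path; the fallback to python3 when PYTHON is unset is also an assumption.

```shell
# MODEL_DIR is a placeholder: point it at one topology folder inside your
# local clone of the Model References repository.
MODEL_DIR="${MODEL_DIR:-./Model-References/<topology folder>}"
REQ_FILE="$MODEL_DIR/requirements.txt"

if [ -f "$REQ_FILE" ]; then
  # Install with the same interpreter that will run the model.
  "${PYTHON:-python3}" -m pip install -r "$REQ_FILE"
else
  echo "No requirements.txt found in $MODEL_DIR"
fi
```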

PyTorch Installation

This section describes how to obtain and install the PyTorch software package. Follow the instructions outlined below to install PyTorch packages on a bare metal platform or virtual machine without a Docker image.

Intel Gaudi PyTorch packages consist of:

  • torch - PyTorch framework package with Intel Gaudi support

  • habana-torch-plugin - Libraries and modules needed to execute PyTorch on single-card, single-node, and multi-node setups.

  • habana-torch-dataloader - Intel Gaudi multi-threaded dataloader package.

  • torchvision and torchaudio - Torchvision and Torchaudio packages compiled in the torch environment. These packages contain no Gaudi-specific changes.

  • habana-gpu-migration - The library for the GPU Migration Toolkit. See GPU Migration Toolkit for more information.

  • torch-tb-profiler - The TensorBoard plugin used to display Gaudi-specific information in TensorBoard.

To install the Intel Gaudi PyTorch environment, run the following commands:

wget -nv https://vault.habana.ai/artifactory/gaudi-installer/latest/habanalabs-installer.sh
chmod +x habanalabs-installer.sh
./habanalabs-installer.sh install -t dependencies
./habanalabs-installer.sh install --type pytorch --venv

Note

  • Running the above command installs the latest version.

  • Installing dependencies requires sudo permission.

  • Make sure to check whether PyTorch is already installed in the path listed in the environment variable PYTHONPATH. If it is, make sure to either uninstall it before proceeding or remove the path from the PYTHONPATH.

  • This script supports fresh installations only. SW upgrades are not supported.

The --venv flag installs the relevant framework inside a virtual environment. The default virtual environment folder is $HOME/habanalabs-venv. To override the default, run the following command:

export HABANALABS_VIRTUAL_DIR=xxxx

Model References Requirements

Some PyTorch models need additional Python packages. These can be installed using the Python requirements files provided in the Model References repository. Refer to the Model References repository for detailed instructions on running PyTorch models.

Set up Python for Models

Using your own models requires setting Python 3.8 as the default Python version. If Python 3.8 is not the default version, replace any call to the python command in your model with $PYTHON and define the environment variable as shown below:

export PYTHON=/usr/bin/python3.8

Running models from the Intel Gaudi Model References GitHub repository requires the PYTHON environment variable to match the supported Python release:

export PYTHON=/usr/bin/<python version>

Note

  • PyTorch - Python 3.8 is the supported Python release for all operating systems except Ubuntu 22.04.

  • TensorFlow - Python 3.10 is the supported Python release for all operating systems.

    Refer to the Support Matrix for a full list of supported Operating Systems and Python versions.
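
The rules in the note above can be turned into a small lookup, sketched below. The function supported_python is hypothetical and encodes only what the note states. It prints nothing for PyTorch on Ubuntu 22.04, because the note does not give that version. The /usr/bin location follows the export examples above and is an assumption about where the interpreter is installed.

```shell
# supported_python is a hypothetical helper that encodes only the rules in
# the note above. Arguments: framework, then operating system.
supported_python() {
  case "$1" in
    tensorflow)
      echo "/usr/bin/python3.10" ;;
    pytorch)
      case "$2" in
        ubuntu22.04) ;;                 # not stated in the note; see the Support Matrix
        *) echo "/usr/bin/python3.8" ;;
      esac
      ;;
  esac
}

# Example: PyTorch on Amazon Linux 2.
PYTHON="$(supported_python pytorch amzn2)"
export PYTHON
echo "PYTHON=$PYTHON"
```

An empty result signals that the version has to be looked up in the Support Matrix rather than guessed.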