Habana Deep Learning Base AMI Installation

The following outlines the supported installation options and the steps required for each. Habana's Base AMI is delivered pre-installed with the software necessary to run containers.

  • Run Using Containers on Habana Base AMI (Recommended):

      1. Pull Prebuilt Containers or Build Docker Images from Habana Dockerfiles

      2. Run models using Habana Model-References

  • Run Framework on Habana Base AMI (TensorFlow/PyTorch):

      1. Install the framework

      2. Set up Python for Models

      3. Run models using Habana Model-References

Habana Deep Learning AMIs are also available for Amazon ECS and Amazon EKS. See the Amazon ECS with Habana Getting Started Guide and the Amazon EKS with Habana Getting Started Guide for more details.

Run Using Containers

Pull Prebuilt Containers

Prebuilt containers are provided in:

  • Habana Vault

  • Amazon ECR Public Library

  • AWS Deep Learning Containers (DLC)

Pull and Launch Docker Image - Habana Vault

Note

Before running docker, make sure to map the dataset as detailed in Map Dataset to Docker.

To pull and run the Habana Docker images, use the code examples below. Update the following parameters to run the desired configuration:

  • $OS - Operating system of the image. Values: [ubuntu18.04, ubuntu20.04, amzn2, rhel8.6]

  • $TF_VERSION - Desired TensorFlow version. Values: [2.9.1, 2.8.2]

  • $PT_VERSION - PyTorch version. Values: [1.12.0]

    docker pull vault.habana.ai/gaudi-docker/1.6.1/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:latest
    docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/1.6.1/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:latest
    docker pull vault.habana.ai/gaudi-docker/1.6.1/${OS}/habanalabs/pytorch-installer-1.12.0:latest
    docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.6.1/${OS}/habanalabs/pytorch-installer-1.12.0:latest
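As a concrete illustration, the placeholders in the commands above expand as follows. This is a sketch only; ubuntu20.04 and TensorFlow 2.9.1 are example values taken from the parameter list, and the command is echoed rather than executed:

```shell
# Example substitution for the ${OS} and ${TF_VERSION} placeholders
# (values assumed: Ubuntu 20.04 image with TensorFlow 2.9.1).
OS=ubuntu20.04
TF_VERSION=2.9.1
IMAGE="vault.habana.ai/gaudi-docker/1.6.1/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:latest"
# Print the resulting pull command instead of running it:
echo "docker pull ${IMAGE}"
```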

AWS Deep Learning Containers

To set up and use AWS Deep Learning Containers, follow the instructions detailed in AWS Available Deep Learning Containers Images.

Build Docker Images from Habana Dockerfiles

  1. Download Docker files and build script from the Setup and Install Repo to a local directory.

  2. Run the build script to generate a Docker image:

./docker_build.sh mode [tensorflow,pytorch] os [ubuntu18.04,ubuntu20.04,amzn2,rhel8.6] tf_version [{Habana TF Version 1}, {Habana TF Version 2}]

For example:

./docker_build.sh tensorflow ubuntu20.04 2.8.2

Launch Docker Image that was Built

Note

Before running docker, make sure to map the dataset as detailed in Map Dataset to Docker.

Launch the Docker image using the code examples below. Update the following parameters to run the desired configuration:

  • $OS - Operating system of the image. Values: [ubuntu18.04, ubuntu20.04, amzn2, rhel8.6]

  • $TF_VERSION - Desired TensorFlow version. Values: [2.9.1, 2.8.2]

docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/1.6.1/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.6.1/${OS}/habanalabs/pytorch-installer-1.12.0:latest

Map Dataset to Docker

Make sure to download the dataset before running Docker, and mount your dataset location into the container by adding the flag below. For example, the host dataset location /opt/datasets/imagenet will be mounted to /datasets/imagenet inside the container:

-v /opt/datasets/imagenet:/datasets/imagenet

Note

OPTIONAL: Add the following flag to mount a local host shared folder into the container so you can transfer files out of it:

-v $HOME/shared:/root/shared
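Putting the mount flags together, a full launch command can be composed as in the sketch below. The image name is left as a placeholder and the command is echoed rather than executed:

```shell
# Sketch: compose a docker run command with the dataset mount and the
# optional shared-folder mount from this section.
DATA_MOUNT="-v /opt/datasets/imagenet:/datasets/imagenet"
SHARE_MOUNT="-v ${HOME}/shared:/root/shared"   # optional, for file transfer
# <image> stands for the pulled or built image name:
echo "docker run -it --runtime=habana ${DATA_MOUNT} ${SHARE_MOUNT} <image>"
```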

Install Native Frameworks

Installing frameworks with Docker is the recommended installation method and requires no additional steps. The sections below apply only to installations without a Docker image.

TensorFlow Installation

This section describes how to obtain and install the TensorFlow software package. Follow these instructions if you want to install the TensorFlow packages on a Bare Metal platform without a Docker image. The package consists of two main components:

  • Base habana-tensorflow Python package - Libraries and modules needed to execute TensorFlow on a single Gaudi device.

  • Scale-out habana-horovod Python package - Libraries and modules needed to scale TensorFlow execution out to multiple Gaudi devices.

Setting Up the Environment

The Habana TensorFlow support package consists of two Python packages. Installing both guarantees the same functionality delivered with the TensorFlow Docker image:

  • habana-tensorflow - executes TensorFlow on a single Gaudi device

  • habana-horovod - scales TensorFlow execution out to multiple Gaudi devices

  1. To set up the environment, the SynapseAI On-Premise software package must be installed first. Manually install the components listed in Set up On-Premise before installing the Habana TensorFlow package.

  2. To prepare the Habana TensorFlow environment, download and execute the bash script tensorflow_installation.sh. This script works only for the currently supported operating systems specified in the Support Matrix.

The script performs the following during execution:

  • Auto-detect OS type and supported Python version for which packages are present on the Python Package Index (PyPI).

  • Try to auto-detect SynapseAI software version and build number based on installed packages.

  • Install OS specific dependent deb/rpm packages.

  • (Disabled by default) Install extra Model References requirements. See Model References Requirements.

  • Download and install Open MPI and the mpi4py package.

  • Set the MPI_ROOT environment variable for use in the command line.

  • Uninstall any existing TensorFlow package.

  • Uninstall existing Habana TensorFlow Python packages.

  • Install recommended TensorFlow package (configurable via --tf parameter).

  • Install Habana TensorFlow Python packages matching the SynapseAI software package.

  • Add required environment variables in /etc/profile.d/habanalabs.sh and source /etc/profile.d/habanalabs.sh in ~/.bashrc.

  • Run a simple TensorFlow workload with Habana TensorFlow and validate that it executed on Habana Gaudi.

Note

tensorflow_installation.sh accepts optional input parameters that can also override auto-detection described above. Run ./tensorflow_installation.sh --help for more details.
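Common invocation patterns can be sketched as below. The --tf and --extra_deps flags are the ones named in this guide; run ./tensorflow_installation.sh --help for the authoritative list. The commands are assembled as strings and echoed rather than executed:

```shell
# Sketch of common tensorflow_installation.sh invocations.
CMD_DEFAULT="./tensorflow_installation.sh"             # full auto-detection
CMD_PINNED="./tensorflow_installation.sh --tf 2.9.1"   # pin the TF version
CMD_EXTRAS="./tensorflow_installation.sh --extra_deps" # also install model deps
echo "${CMD_PINNED}"
```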

Model References Requirements

Habana provides a number of model references optimized to run on Gaudi. These models are available on the Model-References page.

Many of the references require additional third-party packages not provided by Habana. This section describes how to install them.

There are two types of packages required by the Model References:

  • System packages - installed with the OS package manager (e.g. apt on Ubuntu). To install system packages, run the installation script with the --extra_deps argument: ./tensorflow_installation.sh --extra_deps.

  • Python packages - installed with pip. Packages required to run topologies from the Model-References repository are defined in per-topology requirements.txt files in each folder containing the topology's scripts.
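Installing a topology's Python requirements can be sketched as follows. The model directory below is a hypothetical illustration; each model folder in the Model-References repository carries its own requirements.txt. The command is echoed rather than executed:

```shell
# Sketch: install per-topology Python requirements with pip.
export PYTHON=/usr/bin/python3.8
MODEL_DIR="Model-References/TensorFlow/computer_vision/Resnets"  # hypothetical path
echo "cd ${MODEL_DIR} && ${PYTHON} -m pip install -r requirements.txt"
```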

PyTorch Installation

This section describes how to obtain and install the PyTorch software package. Follow the instructions outlined below to install PyTorch packages on a bare metal platform or virtual machine without a Docker image.

Habana PyTorch packages consist of:

  • torch - PyTorch framework package with Habana support

  • habana-torch-plugin - Libraries and modules needed to execute PyTorch on single-card, single-node, and multi-node setups.

  • habana-torch-dataloader - Habana multi-threaded dataloader package.

  • torchvision - Torchvision package compiled in the torch environment. No Habana-specific changes in this package.

Setting Up the Environment

  1. To set up the environment, the SynapseAI On-Premise software package must be installed first. Manually install the components listed in Set up On-Premise before installing the PyTorch package.

  2. To set up the Habana PyTorch environment, download and execute the bash script pytorch_installation.sh.

The script performs the following during execution:

  • Auto-detect OS type and the supported Python version for which Habana PyTorch wheel packages are present in the Vault.

  • Try to auto-detect the Habana software version and build number.

  • Install OS specific dependent deb/rpm packages.

  • Download and install Open MPI and the mpi4py package.

  • Set the MPI_ROOT environment variable for use in the command line.

  • Download the tarball containing the PyTorch-specific packages from the Habana Vault.

  • Install requirements-pytorch.txt, which is included inside the tarball.

  • Uninstall torch, as it is installed by default while installing requirements-pytorch.txt.

  • Install Habana PyTorch python packages.

  • Uninstall pillow package and install pillow-simd.

  • Add the required environment variables in /etc/profile.d/habanalabs.sh and source /etc/profile.d/habanalabs.sh in ~/.bashrc.
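After the script finishes, a quick smoke test can confirm the packages import. This is a hedged sketch: the module name habana_frameworks.torch.core is assumed from the package list above, and the check is only printed here since it must run on a Gaudi machine with the packages installed:

```shell
# Hypothetical post-install smoke test for the Habana PyTorch packages.
SMOKE='import torch; import habana_frameworks.torch.core; print(torch.__version__)'
# Print the command to run on the target machine:
echo "python3 -c \"${SMOKE}\""
```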

Note

Refer to the Support Matrix to view the supported Python version for each of the operating systems.

Command Line Usage

The following are examples of CLI usage when installing the PyTorch packages:

  • To auto-detect the OS type, Habana software version, and build number, use ./pytorch_installation.sh.

  • To download and install a specific version and build, use ./pytorch_installation.sh -v {Version} -b {Build number}.

The supported options:

  • -v <software version> - Habana software version, e.g. {Version}

  • -b <build/revision> - Habana build number, e.g. 148 in 1.2.0-148

  • -os <os version> - OS version [ubuntu2004/ubuntu1804/amzn2/rhel79/rhel83].

  • -ndep - do not install rpm/deb dependencies.

  • -sys - install python packages without --user.

  • -u - install python packages with --user.
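The options above can be combined as in the sketch below. The version 1.2.0 and build 148 come from the -b example in this guide; adjust them to your target release. The command is assembled and echoed rather than executed:

```shell
# Sketch: pytorch_installation.sh with an explicit version, build, OS,
# and system-wide (no --user) package installation.
CMD="./pytorch_installation.sh -v 1.2.0 -b 148 -os ubuntu2004 -sys"
echo "${CMD}"
```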

Note

Some PyTorch models need additional Python packages. These can be installed using the Python requirements files provided in the Model-References repository. Refer to the Model-References repository for detailed instructions on running PyTorch models.

Set up Python for Models

Using your own models requires setting Python 3.8 as the default Python version. If Python 3.8 is not the default version, replace any call to the python command in your model with $PYTHON and define the environment variable as below:

export PYTHON=/usr/bin/python3.8

Running models from Habana Model-References requires the PYTHON environment variable to match the supported Python release:

export PYTHON=/usr/bin/python3.8

Note

Python 3.8 is the supported Python release for all operating systems listed in the Support Matrix.
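In practice, the substitution looks like the sketch below; train.py is an illustrative script name, and the invocation is echoed rather than executed:

```shell
# Sketch: point model scripts at the supported interpreter via $PYTHON.
export PYTHON=/usr/bin/python3.8
# A call such as `python train.py` in a model becomes:
echo "${PYTHON} train.py"
```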