Habana Deep Learning Base AMI Installation¶
The following table outlines the supported installation options and the steps required. Habana's Base AMI is delivered with everything needed to run containers pre-installed.
Objective | Steps
---|---
Run Using Containers on Habana Base AMI (Recommended) | Pull a prebuilt container or build a Docker image from Habana Dockerfiles, then launch it. See Run Using Containers.
Run Framework on Habana Base AMI (TensorFlow/PyTorch) | Install the native TensorFlow or PyTorch packages. See Install Native Frameworks.
The Habana Deep Learning AMI offering also includes AMIs for Amazon ECS and Amazon EKS. See the Amazon ECS with Habana User Guide and the Amazon EKS with Habana User Guide for more details.
Note
Before installing the packages and Docker images below, make sure to review the currently supported versions and operating systems listed in the Support Matrix.
Run Using Containers¶
Pull Prebuilt Containers¶
Prebuilt containers are provided in:
Habana Vault
Amazon ECR Public Gallery
AWS Deep Learning Containers (DLC)
Pull and Launch Docker Image - Habana Vault¶
Note
Before running docker, make sure to map the dataset as detailed in Map Dataset to Docker.
To pull and run the Habana Docker images, use the code examples below. Update the parameters listed in the following table to run the desired configuration.
Parameter | Description | Values
---|---|---
$OS | Operating system of the image | [ubuntu20.04, ubuntu22.04, amzn2, rhel8.6]
$TF_VERSION | Desired TensorFlow version | [2.12.1]
$PT_VERSION | Desired PyTorch version | [2.0.1]
Note
Include --ipc=host in the docker run command for PyTorch Docker images. This is required for distributed training using the Habana Collective Communication Library (HCCL); it allows reuse of host shared memory for best performance.
docker pull vault.habana.ai/gaudi-docker/1.11.0/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/1.11.0/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:latest
docker pull vault.habana.ai/gaudi-docker/1.11.0/${OS}/habanalabs/pytorch-installer-${PT_VERSION}:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.11.0/${OS}/habanalabs/pytorch-installer-${PT_VERSION}:latest
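For example, to pull and launch the Ubuntu 20.04 TensorFlow image with the parameter values from the table above substituted in:

docker pull vault.habana.ai/gaudi-docker/1.11.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.12.1:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/1.11.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.12.1:latest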
Amazon ECR Public Gallery¶
To pull and run Docker images from the Amazon ECR Public Gallery, make sure to follow the steps detailed in Pulling a public image.
AWS Deep Learning Containers¶
To set up and use AWS Deep Learning Containers, follow the instructions detailed in AWS Available Deep Learning Containers Images.
Build Docker Images from Habana Dockerfiles¶
Download the Dockerfiles and build script from the Setup and Install repo to a local directory.
Run the build script to generate a Docker image:
./docker_build.sh mode [tensorflow,pytorch] os [ubuntu20.04,ubuntu22.04,amzn2,rhel8.6] tf_version
For example:
./docker_build.sh tensorflow ubuntu20.04 2.12.1
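To build a PyTorch image instead, the invocation below is a plausible sketch; it assumes the trailing version argument applies only to TensorFlow builds and can be omitted for PyTorch, so check the script's usage output before relying on it:

./docker_build.sh pytorch ubuntu20.04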
Launch Docker Image that was Built¶
Note
Before running docker, make sure to map the dataset as detailed in Map Dataset to Docker.
Launch the Docker image using the code examples below. Update the parameters listed in the following table to run the desired configuration.
Parameter | Description | Values
---|---|---
$OS | Operating system of the image | [ubuntu20.04, ubuntu22.04, amzn2, rhel8.6]
$TF_VERSION | Desired TensorFlow version | [2.12.1]
$PT_VERSION | Desired PyTorch version | [2.0.1]
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/1.11.0/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.11.0/${OS}/habanalabs/pytorch-installer-${PT_VERSION}:latest
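For example, set the parameters as environment variables and then launch the PyTorch image (values taken from the table above):

export OS=ubuntu20.04
export PT_VERSION=2.0.1
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.11.0/${OS}/habanalabs/pytorch-installer-${PT_VERSION}:latest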
Map Dataset to Docker¶
Make sure to download the dataset prior to running Docker, and mount the location of your dataset into the container by adding the flag below. For example, the host dataset location /opt/datasets/imagenet will mount to /datasets/imagenet inside the container:
-v /opt/datasets/imagenet:/datasets/imagenet
Note
OPTIONAL: Add the following flag to mount a local host shared folder into the container so that files can be transferred out of it:
-v $HOME/shared:/root/shared
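Putting it together, an illustrative PyTorch launch command with both the dataset mount and the optional shared folder (the image tag is reused from the examples above):

docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host -v /opt/datasets/imagenet:/datasets/imagenet -v $HOME/shared:/root/shared vault.habana.ai/gaudi-docker/1.11.0/ubuntu20.04/habanalabs/pytorch-installer-2.0.1:latest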
Install Native Frameworks¶
Installing frameworks with docker is the recommended installation method and does not require additional steps.
TensorFlow Installation¶
This section describes how to obtain and install the TensorFlow software package. Follow these instructions if you want to install the TensorFlow packages on a bare metal platform without a Docker image. The package consists of two main components that provide the same functionality delivered with the TensorFlow Docker image:
- Base habana-tensorflow Python package - Libraries and modules needed to execute TensorFlow on a single Gaudi device.
- Scale-out habana-horovod Python package - Libraries and modules needed to scale TensorFlow execution out across multiple Gaudi devices and nodes.
To install Habana TensorFlow, run the following commands:
wget -nv https://vault.habana.ai/artifactory/gaudi-installer/latest/habanalabs-installer.sh
chmod +x habanalabs-installer.sh
./habanalabs-installer.sh install -t dependencies
./habanalabs-installer.sh install --type tensorflow --venv
Note
Running the above commands installs the latest version.
Installing dependencies requires sudo permission.
Make sure to check whether TensorFlow is already installed in the path listed in the PYTHONPATH environment variable. If it is, either uninstall it before proceeding or remove that path from PYTHONPATH.
This script supports fresh installations only. Software upgrades are not supported.
The --venv flag installs the relevant framework inside a virtual environment. The default virtual environment folder is $HOME/habanalabs-venv. To override the default, run the following command:
export HABANALABS_VIRTUAL_DIR=xxxx
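After installation, a minimal smoke test can confirm that TensorFlow sees the Gaudi device. The sketch below assumes the default virtual environment location; load_habana_module() registers the HPU device with TensorFlow:

source $HOME/habanalabs-venv/bin/activate
python -c "from habana_frameworks.tensorflow import load_habana_module; load_habana_module(); import tensorflow as tf; print(tf.config.list_physical_devices('HPU'))"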
Model References Requirements¶
Habana provides a number of model references optimized to run on Gaudi. These models are available on the Model-References page.
Many of the references require additional Python packages (installed with pip) that are not provided by Habana. The packages required to run topologies from the Model-References repository are defined in per-topology requirements.txt files in each folder containing the topology's scripts.
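For example, to install the requirements for one topology (the folder path below is purely illustrative; use the requirements.txt of the model you intend to run):

cd Model-References/TensorFlow/computer_vision/<topology>
pip install -r requirements.txt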
PyTorch Installation¶
This section describes how to obtain and install the PyTorch software package. Follow the instructions outlined below to install PyTorch packages on a bare metal platform or virtual machine without a Docker image.
Habana PyTorch packages consist of:
- torch - PyTorch framework package with Habana support.
- habana-torch-plugin - Libraries and modules needed to execute PyTorch on single-card, single-node and multi-node setups.
- habana-torch-dataloader - Habana multi-threaded dataloader package.
- torchvision and torchaudio - Torchvision and Torchaudio packages compiled in the torch environment. No Habana-specific changes in these packages.
- habana-gpu-migration - The library for the GPU Migration Toolkit. See GPU Migration Toolkit for more information.
- torch-tb-profiler - The TensorBoard plugin used to display Habana Gaudi specific information on TensorBoard.
To install the Habana PyTorch environment, run the following commands:
wget -nv https://vault.habana.ai/artifactory/gaudi-installer/latest/habanalabs-installer.sh
chmod +x habanalabs-installer.sh
./habanalabs-installer.sh install -t dependencies
./habanalabs-installer.sh install --type pytorch --venv
Note
Running the above commands installs the latest version.
Installing dependencies requires sudo permission.
Make sure to check whether PyTorch is already installed in the path listed in the PYTHONPATH environment variable. If it is, either uninstall it before proceeding or remove that path from PYTHONPATH.
This script supports fresh installations only. Software upgrades are not supported.
The --venv flag installs the relevant framework inside a virtual environment. The default virtual environment folder is $HOME/habanalabs-venv. To override the default, run the following command:
export HABANALABS_VIRTUAL_DIR=xxxx
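After installation, a short sanity check can confirm that the HPU device is usable. This sketch assumes the default virtual environment location; importing habana_frameworks.torch.core registers the hpu device with PyTorch:

source $HOME/habanalabs-venv/bin/activate
python -c "import torch; import habana_frameworks.torch.core; x = torch.ones(2, 2).to('hpu'); print(x.device, (x + x).sum().item())"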
Model References Requirements¶
Some PyTorch models need additional Python packages, which can be installed using the Python requirements files provided in the Model-References repository. Refer to the Model-References repository for detailed instructions on running PyTorch models.
Set up Python for Models¶
Using your own models requires setting Python 3.8 as the default Python version. If Python 3.8 is not the default version, replace any call to the python command in your model with $PYTHON and define the environment variable as follows:
export PYTHON=/usr/bin/python3.8
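For example, a model that is normally launched with python train.py would then be invoked as follows (the script name and flag are illustrative):

$PYTHON train.py --batch-size 32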
Running models from Habana Model-References requires the PYTHON environment variable to match the supported Python release:
export PYTHON=/usr/bin/<python version>
Note
Python 3.8 is the supported Python release for all operating systems except Ubuntu 22.04. See the versions listed in the Support Matrix.