3. AWS SW Update for Habana

3.1. Objective

This document describes how to manage upgrades of the Habana SynapseAI® software suite on AWS DL1 instances. It serves as a quick start guide for AWS users operating a DL1 instance, enabling them to start from an existing AWS instance and update to the latest Habana SW and associated frameworks.

Note

Most of the information here is also available at the Setup and Install GitHub page.

The main components of the Habana SW Stack:

  • OS Firmware

  • Habana Driver

  • SynapseAI® SW stack

  • TensorFlow or PyTorch Framework

Generally, to update the Habana SW, the following steps are required:

  1. Unlock the Synapse SW.

  2. Uninstall the old TensorFlow or PyTorch Framework.

  3. Update the SW, Framework or run a new Docker Image.

3.2. Use Cases

There are two main use cases for the SW update on the DL1 instance:

  • Starting with an AWS-based DLAMI and installing a full update of the Habana software components and framework. This covers the existing 0.15.4 DLAMI on Ubuntu 18.04 or Amazon Linux 2, running either the TensorFlow or PyTorch framework.

  • Preparing to update from a DLAMI or a base AMI in order to run a Docker image. This Docker image can be an AWS Deep Learning Container or a Docker image from the Habana Vault.

3.3. OS Check

Run the following command to verify the OS version that is being used (if needed):

awk -F= '/^NAME/{print $2}' /etc/os-release
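Because the subsections below differ by operating system, it can help to turn the check above into a value a script can branch on. The following is a minimal sketch, not part of the Habana tooling: the function name detect_os_family is hypothetical, and it assumes the standard ID field of os-release (ubuntu on Ubuntu, amzn on Amazon Linux). It does not inspect VERSION_ID, so it cannot tell Amazon Linux 2 apart from other Amazon Linux releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the ID field of an os-release file to the
# OS family used to pick a subsection in this guide.
detect_os_family() {
    local release_file="${1:-/etc/os-release}"
    local id
    # ID may be quoted (ID="amzn") or bare (ID=ubuntu); strip the quotes.
    id=$(awk -F= '$1 == "ID" {gsub(/"/, "", $2); print $2}' "$release_file")
    case "$id" in
        ubuntu) echo "ubuntu" ;;
        amzn)   echo "amzn" ;;
        *)      echo "unknown" ;;
    esac
}

# Usage (reads /etc/os-release by default):
#   family=$(detect_os_family)
```

Matching on the ID field rather than NAME avoids depending on the human-readable distribution name.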

3.4. Full Software Update

In this case, start with the AWS DLAMI from the Community AMI section.

3.4.1. For TensorFlow Ubuntu 18.04

  1. Uninstall all TensorFlow components from 0.15.4:

export PYTHON=/usr/bin/python3.7

sudo ${PYTHON} -m pip uninstall -y habana-horovod habana-tensorflow tensorflow-cpu
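After the uninstall command above, a quick check can confirm that none of the three packages is still installed. This is a sketch under stated assumptions: the function name leftover_tf_packages is hypothetical, and it expects the name==version lines produced by pip list --format=freeze on standard input. It accepts either hyphens or underscores in package names, since pip versions differ in how they print them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print any of the three packages removed above that
# still appear in `pip list --format=freeze` output read from stdin.
leftover_tf_packages() {
    # `|| true` keeps the exit status zero when grep finds nothing.
    grep -i -E '^(habana[-_]horovod|habana[-_]tensorflow|tensorflow[-_]cpu)==' || true
}

# Usage:
#   ${PYTHON} -m pip list --format=freeze | leftover_tf_packages
```

Empty output means the old TensorFlow components are gone.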

3.4.2. For TensorFlow Amazon Linux 2

  1. Unlock the Synapse Components:

rpm -qa|grep habana ## check the current status of the installed SynapseAI SW.

yum versionlock list ## confirm what is locked.

sudo yum versionlock delete 'habanalabs*'

  2. Uninstall all TensorFlow components from 0.15.4:

export PYTHON=/usr/bin/python3.7

sudo ${PYTHON} -m pip uninstall -y habana-horovod habana-tensorflow tensorflow-cpu
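To confirm that the unlock in step 1 took effect, the lock list can be filtered for remaining Habana entries. This is a small sketch, assuming that lock entries printed by yum versionlock list contain the package name habanalabs; the function name remaining_habana_locks is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print versionlock entries that still mention
# habanalabs, given `yum versionlock list` output on stdin.
remaining_habana_locks() {
    # `|| true` keeps the exit status zero when no entries remain.
    grep -F 'habanalabs' || true
}

# Usage:
#   yum versionlock list | remaining_habana_locks
```

If the function prints nothing, no Habana packages remain locked.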

3.4.3. For PyTorch Ubuntu 18.04

To upgrade the packages in Ubuntu 18.04, follow the steps below:

  1. Remove PyTorch shared libraries from /usr/lib/habanalabs:

cd /usr/lib/habanalabs/
sudo rm -rf libhabana_pytorch_plugin.so \
    libpytorch_synapse_helpers.so _py_pytorch_synapse_logger.so \
    pytorch_synapse_logger.so hb_torch.cpython-37m-x86_64-linux-gnu.so

  2. Uninstall PyTorch specific Python packages:

export PYTHON=/usr/bin/python3.7

sudo ${PYTHON} -m pip uninstall -y habana-torch habana-torch-dataloader \
    habana-torch-hcl habanaOptimizerSparseAdagrad-cpp \
    habanaOptimizerSparseSgd-cpp hb-custom hb-torch hmp preproc-cpp \
    gather2d-cpp HabanaEmbeddingBag-cpp

  3. Uninstall dependent Python packages from the v0.15.4 requirements file:

sudo wget https://vault.habana.ai/gaudi-pt-modules/0.15.4/75/ubuntu1804/binary/pytorch_modules-0.15.4_75.tgz -P ${HOME}/.dlami/

cd ${HOME}/.dlami/

sudo mkdir -p ${HOME}/.dlami/habanalabs/pytorch_temp

sudo tar -xf pytorch_modules-0.15.4_75.tgz -C ${HOME}/.dlami/habanalabs/pytorch_temp/.

sudo ${PYTHON} -m pip uninstall -r ${HOME}/.dlami/habanalabs/pytorch_temp/requirements-pytorch.txt -y

sudo rm -rf ${HOME}/.dlami/habanalabs/pytorch_temp/

sudo rm -rf ${HOME}/.dlami/pytorch_modules-0.15.4_75.tgz

  4. Install the latest PyTorch Package:

sudo wget https://vault.habana.ai/gaudi-pt-modules/1.1.1/94/ubuntu1804/binary/pytorch_modules-v1.9.1_1.1.1_94.tgz -P ${HOME}/.dlami/

cd ${HOME}/.dlami/

sudo tar -xvf pytorch_modules-v1.9.1_1.1.1_94.tgz -C ${HOME}/.dlami/habanalabs

cd habanalabs

sudo $PYTHON -m pip install -r requirements-pytorch.txt

sudo $PYTHON -m pip uninstall torch -y

sudo $PYTHON -m pip install torch*.whl

sudo $PYTHON -m pip install habana_torch*.whl

sudo $PYTHON -m pip install habana_torch_dataloader*.whl

sudo $PYTHON -m pip install habana_dataloader*.whl

sudo $PYTHON -m pip install transformers*.whl

sudo $PYTHON -m pip install fairseq*.whl

sudo $PYTHON -m pip install pytorch_lightning*.whl

sudo $PYTHON -m pip uninstall -y pillow

sudo $PYTHON -m pip install pillow-simd==7.0.0.post3

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/habanalabs:/usr/local/openmpi/lib
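If the LD_LIBRARY_PATH line above is run more than once in the same shell, or added to a login script, the same directories are appended repeatedly. The sketch below shows one way to make the update idempotent. The function name append_path_once is hypothetical and not part of the Habana tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a directory to a colon-separated path list
# only if it is not already present, and print the resulting list.
append_path_once() {
    local current="$1" dir="$2"
    case ":${current}:" in
        *":${dir}:"*) echo "$current" ;;                  # already listed
        *) echo "${current:+${current}:}${dir}" ;;        # append, no leading colon
    esac
}

# Usage:
#   LD_LIBRARY_PATH=$(append_path_once "${LD_LIBRARY_PATH:-}" /usr/lib/habanalabs)
#   LD_LIBRARY_PATH=$(append_path_once "$LD_LIBRARY_PATH" /usr/local/openmpi/lib)
#   export LD_LIBRARY_PATH
```

Wrapping the list in colons before matching avoids false hits on directories that merely share a prefix.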

3.4.4. For PyTorch Amazon Linux 2

  1. Unlock the Synapse Components:

rpm -qa|grep habana ## check the current status of the installed SynapseAI SW.

yum versionlock list ## confirm what is locked.

sudo yum versionlock delete 'habanalabs*'

  2. Remove PyTorch shared libraries from /usr/lib/habanalabs:

cd /usr/lib/habanalabs/
sudo rm -rf libhabana_pytorch_plugin.so \
    libpytorch_synapse_helpers.so _py_pytorch_synapse_logger.so \
    pytorch_synapse_logger.so hb_torch.cpython-37m-x86_64-linux-gnu.so

  3. Uninstall PyTorch specific Python packages:

export PYTHON=/usr/bin/python3.7

sudo ${PYTHON} -m pip uninstall -y habana-torch habana-torch-dataloader \
    habana-torch-hcl habanaOptimizerSparseAdagrad-cpp \
    habanaOptimizerSparseSgd-cpp hb-custom hb-torch hmp preproc-cpp \
    gather2d-cpp HabanaEmbeddingBag-cpp

  4. Uninstall dependent Python packages from the v0.15.4 requirements file:

sudo wget https://vault.habana.ai/gaudi-pt-modules/0.15.4/75/amzn2/binary/pytorch_modules-0.15.4_75.tgz -P ${HOME}/.dlami/

cd ${HOME}/.dlami/

sudo mkdir -p ${HOME}/.dlami/habanalabs/pytorch_temp

sudo tar -xf pytorch_modules-0.15.4_75.tgz -C ${HOME}/.dlami/habanalabs/pytorch_temp/.

sudo ${PYTHON} -m pip uninstall -r ${HOME}/.dlami/habanalabs/pytorch_temp/requirements-pytorch.txt -y

sudo rm -rf ${HOME}/.dlami/habanalabs/pytorch_temp/

sudo rm -rf ${HOME}/.dlami/pytorch_modules-0.15.4_75.tgz

  5. Install the latest PyTorch Package:

sudo wget https://vault.habana.ai/gaudi-pt-modules/1.1.1/94/amzn2/binary/pytorch_modules-v1.9.1_1.1.1_94.tgz -P ${HOME}/.dlami/

cd ${HOME}/.dlami/

sudo tar -xvf pytorch_modules-v1.9.1_1.1.1_94.tgz -C ${HOME}/.dlami/habanalabs

cd habanalabs

sudo $PYTHON -m pip install -r requirements-pytorch.txt

sudo $PYTHON -m pip uninstall torch -y

sudo $PYTHON -m pip install torch*.whl

sudo $PYTHON -m pip install habana_torch*.whl

sudo $PYTHON -m pip install habana_torch_dataloader*.whl

sudo $PYTHON -m pip install habana_dataloader*.whl

sudo $PYTHON -m pip install transformers*.whl

sudo $PYTHON -m pip install fairseq*.whl

sudo $PYTHON -m pip install pytorch_lightning*.whl

sudo $PYTHON -m pip uninstall -y pillow

sudo $PYTHON -m pip install pillow-simd==7.0.0.post3

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/habanalabs:/usr/local/openmpi/lib
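The 1.1.1 download URLs in the two PyTorch subsections differ only in the OS tag (ubuntu1804 or amzn2). As an illustration, the URL can be composed from its parts so that a single script serves both operating systems. The function name pt_modules_url is hypothetical, and the pattern is inferred only from the two 1.1.1 URLs shown in this section. Other releases may use a different layout: the 0.15.4 archive, for example, has no PyTorch version in its file name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the PyTorch modules URL from its parts.
# The pattern is an assumption inferred from the 1.1.1 URLs in this guide.
pt_modules_url() {
    local os_tag="$1"        # ubuntu1804 or amzn2
    local synapse_ver="$2"   # e.g. 1.1.1
    local build="$3"         # e.g. 94
    local torch_ver="$4"     # e.g. 1.9.1
    echo "https://vault.habana.ai/gaudi-pt-modules/${synapse_ver}/${build}/${os_tag}/binary/pytorch_modules-v${torch_ver}_${synapse_ver}_${build}.tgz"
}

# Usage:
#   sudo wget "$(pt_modules_url amzn2 1.1.1 94 1.9.1)" -P "${HOME}/.dlami/"
```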

3.5. Running an Updated Docker Image

In this case, start with a base AMI or DLAMI and update to the latest Deep Learning Container, or to a Habana TensorFlow or PyTorch Docker image from the Habana Vault.

The general steps, drawn from the sections above, are as follows:

  1. Unlock the existing packages (for Ubuntu or Amazon Linux 2). Follow the steps listed above.

  2. Update the packages (for Ubuntu or Amazon Linux 2). Only the Firmware and Driver need to be updated in the base Image, while the remaining Synapse SW components will be included in the docker image.

3.6. Verification

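  • Check that all the package versions outside the Docker container are 1.1.1-94. On Ubuntu, run the command below; on Amazon Linux 2, use the rpm -qa|grep habana command from the unlock step instead:

apt list --installed | grep habana

  • Check that the Docker images used for the docker run command are 1.1.1-94: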

docker images
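The package check above can be automated by scanning the listing for Habana entries that do not carry the expected version. This is a minimal sketch, assuming that each relevant line contains the string habana and that an up-to-date package shows 1.1.1-94 somewhere on its line. The function name check_habana_versions is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a package listing on stdin, print Habana entries
# that lack the expected version, and return non-zero if any are found.
check_habana_versions() {
    local expected="${1:-1.1.1-94}"
    local mismatches
    # Keep Habana lines, then drop those containing the expected version.
    mismatches=$(grep -i 'habana' | grep -v -F -- "$expected" || true)
    if [ -n "$mismatches" ]; then
        echo "$mismatches"
        return 1
    fi
    return 0
}

# Usage on Ubuntu:
#   apt list --installed 2>/dev/null | check_habana_versions
# Usage on Amazon Linux 2:
#   rpm -qa | check_habana_versions
```

A non-zero exit status makes the check easy to use as a gate in a provisioning script.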