4. Installation Guide

4.1. Overview

This document describes how to obtain and install the SynapseAI® software package, as well as the TensorFlow and PyTorch software packages, for the Habana® Gaudi® HPU.

For additional install and setup details, refer to the Setup and Install GitHub page.

4.1.1. Release Details

This release was tested and validated on the following configurations.

Distro    Version   Kernels           CPU Type

Ubuntu    18.04     4.15 and above    Intel x86_64
Ubuntu    20.04     5.4.0 and above   Intel x86_64
Amazon    Linux2    5.4.0 and above   Intel x86_64
RHEL8     8.3       4.18.0            Intel x86_64
Centos    8.3       4.18.0            Intel x86_64

4.1.2. Release Versions

Components      Version

Build number    1.2.0-585

4.1.3. Package Content

The installation contains the following Installers:

  • habanalabs-graph-_all – installs the Graph Compiler and the run-time.

  • habanalabs-thunk-_all – installs the thunk library.

  • habanalabs-dkms_all – installs the PCIe driver.

  • habanalabs-firmware - installs the Gaudi Firmware.

  • habanalabs-firmware-tools – installs various Firmware tools (hlml, hl-smi, etc).

  • habanalabs-qual – installs the qualification application package. See Qualification Library.

  • habanalabs-aeon – installs the demo data loader.

  • habanalabs-container-runtime - installs the container runtime library.

Refer to Update your Software to obtain the latest installers according to the supported Operating Systems.

4.2. Ubuntu - Package Installation

Installing the package with an internet connection available allows the installer to download and install the required dependencies for the SynapseAI package (apt-get, pip install, etc.).

Note

Running the below commands installs the latest version only (see Release Versions). You can install a version other than latest by running the below commands with a specific build number.

4.2.1. Package Retrieval

  1. Download and install the public key:

curl -X GET https://vault.habana.ai/artifactory/api/gpg/key/public | sudo apt-key add -
  2. Get the name of the operating system:

lsb_release -c | awk '{print $2}'
  3. Create an apt source file /etc/apt/sources.list.d/artifactory.list with the following content, where <OS name> is the output of the previous step: deb https://vault.habana.ai/artifactory/debian <OS name> main

  4. Configure any pending packages and update the APT cache:

sudo dpkg --configure -a

sudo apt-get update
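The codename lookup and apt source file creation above can be combined into a short script. The following is a sketch only: the fallback codename is an assumption for machines without lsb_release, and the file is staged locally so the final move into /etc/apt is left as a commented sudo step.

```shell
# Detect the OS codename; fall back to "focal" (Ubuntu 20.04) if lsb_release is absent.
CODENAME=$(lsb_release -cs 2>/dev/null || echo focal)

# Stage the apt source entry locally, then move it into place with sudo.
echo "deb https://vault.habana.ai/artifactory/debian ${CODENAME} main" > artifactory.list
cat artifactory.list
# sudo mv artifactory.list /etc/apt/sources.list.d/artifactory.list
```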

4.2.1.1. KMD Dependencies

  1. Install Deb libraries:

sudo apt install dkms libelf-dev
  2. Install headers:

sudo apt install linux-headers-$(uname -r)
  3. After a kernel upgrade, reboot your machine.

4.2.2. Firmware Installation

Install the Firmware:

sudo apt install -y habanalabs-firmware

4.2.3. Driver Installation

The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.

On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.
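The load-order rule above can be captured in a small script. This is a sketch: it only prints the modprobe commands rather than running them, and the version comparison via sort -V is an assumption about how one might script the check, not code from the Habana package.

```shell
# Strip the distro suffix from the kernel release, e.g. "5.4.0-90-generic" -> "5.4.0".
KVER=$(uname -r | cut -d- -f1)

# Below 5.12, habanalabs_en must be loaded first; on 5.12+ either order works.
if [ "$(printf '%s\n5.12\n' "$KVER" | sort -V | head -n1)" = "$KVER" ] && [ "$KVER" != 5.12 ]; then
    ORDER="habanalabs_en habanalabs"
else
    ORDER="habanalabs habanalabs_en"
fi
for mod in $ORDER; do
    echo "sudo modprobe $mod"   # printed for review, not executed
done
```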

The below command installs both the habanalabs and habanalabs_en drivers:

sudo apt install -y habanalabs-dkms

4.2.4. Thunk Installation

Install the thunk library:

sudo apt install -y habanalabs-thunk

4.2.5. FW Tools Installation

Install Firmware tools:

sudo apt install -y habanalabs-firmware-tools

4.2.6. Graph Compiler and Run-time Installation

Install the graph compiler and run-time:

sudo apt install -y habanalabs-graph

4.2.7. (Optional) Qual Installation

  1. Install aeon:

sudo apt install -y habanalabs-aeon
  2. Install hl_qual:

sudo apt install -y habanalabs-qual

For further details, see Gaudi Qualification Library.

4.2.8. Container Runtime Installation

  • Install container runtime:

sudo apt install -y habanalabs-container-runtime

Note

As of Kubernetes 1.20, support for Docker has been deprecated.

4.2.8.1. Docker Engine Setup

  1. Register Habana runtime by adding the following to /etc/docker/daemon.json:

sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "habana": {
            "path": "/usr/bin/habana-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

Note

You can optionally reconfigure the default runtime by adding the following to /etc/docker/daemon.json:

"default-runtime": "habana"
  2. Restart Docker:

sudo systemctl restart docker
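For reference, a /etc/docker/daemon.json that combines the runtime registration with the optional default-runtime setting would look like the following. This is a sketch; merge it with any options already present in your file rather than overwriting them.

```json
{
    "default-runtime": "habana",
    "runtimes": {
        "habana": {
            "path": "/usr/bin/habana-container-runtime",
            "runtimeArgs": []
        }
    }
}
```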

4.2.8.2. Containerd Setup

  1. Register Habana runtime:

sudo tee /etc/containerd/config.toml <<EOF
disabled_plugins = []
version = 2

 [plugins]
   [plugins."io.containerd.grpc.v1.cri"]
     [plugins."io.containerd.grpc.v1.cri".containerd]
       default_runtime_name = "habana"
       [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
         [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.habana]
           runtime_type = "io.containerd.runc.v2"
           [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.habana.options]
             BinaryName = "/usr/bin/habana-container-runtime"
   [plugins."io.containerd.runtime.v1.linux"]
     runtime = "habana-container-runtime"
EOF
  2. Restart containerd:

sudo systemctl restart containerd

4.2.9. Update Environment Variables and More

When the installation is complete, close the shell and re-open it. Or, run the following:

source /etc/profile.d/habanalabs.sh

source ~/.bashrc

4.3. Centos, Amazon and RHEL8 - Package Installation

Installing the package with an internet connection available allows the installer to download and install the required dependencies for the SynapseAI package (yum install, pip install, etc.).

Note

Running the below commands installs the latest version only (see Release Versions). You can install a version other than latest by running the below commands with a specific build number.

4.3.1. Amazon Package Retrieval

  1. Create /etc/yum.repos.d/Habana-Vault.repo with the following content:

[vault]

name=Habana Vault

baseurl=https://vault.habana.ai/artifactory/AmazonLinux2

enabled=1

gpgcheck=0

gpgkey=https://vault.habana.ai/artifactory/AmazonLinux2/repodata/repomd.xml.key

repo_gpgcheck=0
  2. Update the YUM cache by running the following command:

sudo yum makecache
  3. Verify correct binding by running the following command:

yum search habana

This will search for and list all packages with the word Habana.

4.3.2. RHEL8 Package Retrieval

  1. Create /etc/yum.repos.d/Habana-Vault.repo with the following content:

[vault]

name=Habana Vault

baseurl=https://<access token>@vault.habana.ai/artifactory/rhel/8/8.3

enabled=1

gpgcheck=1

gpgkey=https://<access token>@vault.habana.ai/artifactory/rhel/8/8.3/repodata/repomd.xml.key

repo_gpgcheck=0
  2. Update the YUM cache by running the following command:

sudo yum makecache
  3. Verify correct binding by running the following command:

yum search habana

This will search for and list all packages with the word Habana.

  4. Reinstall the libarchive package by running the following command:

sudo dnf install -y libarchive*

4.3.3. Centos 7 Package Retrieval

  1. Create /etc/yum.repos.d/Habana-Vault.repo with the following content:

[vault]

name=Habana Vault

baseurl=https://vault.habana.ai/artifactory/centos/7/7.5

enabled=1

gpgcheck=0

gpgkey=https://vault.habana.ai/artifactory/centos/7/7.5/repodata/repomd.xml.key

repo_gpgcheck=0
  2. Update the YUM cache:

sudo yum makecache
  3. Verify correct binding:

yum search habana

This will search for and list all packages with the word Habana.

4.3.4. Centos 8 Package Retrieval

  1. Create /etc/yum.repos.d/Habana-Vault.repo with the following content:

[vault]

name=Habana Vault

baseurl=https://vault.habana.ai/artifactory/centos/8

enabled=1

gpgcheck=0

gpgkey=https://vault.habana.ai/artifactory/centos/8/8.3/repodata/repomd.xml.key

repo_gpgcheck=0
  2. Update the YUM cache:

sudo yum makecache
  3. Verify correct binding:

yum search habana

This will search for and list all packages with the word Habana.

4.3.4.1. KMD Dependencies

  1. Check your Linux kernel version:

uname -r
  2. Install headers:

sudo yum install kernel-devel
  3. After a kernel upgrade, reboot your machine.

4.3.4.2. Additional Dependencies

Add yum-utils:

sudo yum install -y yum-utils

4.3.5. Firmware Installation

Install the Firmware:

sudo yum install -y habanalabs-firmware

4.3.6. Driver Installation

The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.

On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.

The below commands install/uninstall both the habanalabs and habanalabs_en drivers.

  1. Remove the previous driver package:

sudo yum remove habanalabs*
  2. Install the driver:

sudo yum install -y habanalabs

4.3.7. Thunk Installation

Install the thunk library:

sudo yum install -y habanalabs-thunk

4.3.8. FW Tool Installation

Install Firmware tools:

sudo yum install -y habanalabs-firmware-tools

4.3.9. Graph Compiler and Run-time Installation

Install the graph compiler and run-time:

sudo yum install -y habanalabs-graph

4.3.10. (Optional) Qual Installation

  1. Install aeon:

sudo yum install -y habanalabs-aeon
  2. Install hl_qual:

sudo yum install -y habanalabs-qual

For further details, see Qualification Library.

4.3.11. Container Runtime Installation

  1. Install container runtime:

sudo yum install -y habanalabs-container-runtime

4.3.11.1. Docker Engine Setup

Register Habana runtime by adding the following to /etc/docker/daemon.json:

sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "habana": {
            "path": "/usr/bin/habana-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

Note

You can optionally reconfigure the default runtime by adding the following to /etc/docker/daemon.json:

"default-runtime": "habana"

Restart Docker:

sudo systemctl restart docker

4.3.11.2. Containerd Setup

Note

As of Kubernetes 1.20, support for Docker has been deprecated.

Register Habana runtime:

sudo tee /etc/containerd/config.toml <<EOF
disabled_plugins = []
version = 2

 [plugins]
   [plugins."io.containerd.grpc.v1.cri"]
     [plugins."io.containerd.grpc.v1.cri".containerd]
       default_runtime_name = "habana"
       [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
         [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.habana]
           runtime_type = "io.containerd.runc.v2"
           [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.habana.options]
             BinaryName = "/usr/bin/habana-container-runtime"
   [plugins."io.containerd.runtime.v1.linux"]
     runtime = "habana-container-runtime"
EOF

Restart containerd:

sudo systemctl restart containerd

4.3.12. Update Environment Variables and More

When the installation is complete, close the shell and re-open it. Or, run the following:

source /etc/profile.d/habanalabs.sh

source ~/.bashrc

4.4. TensorFlow Installation

This section describes how to obtain and install the TensorFlow software package. Follow these instructions if you want to install the TensorFlow packages on a Bare Metal platform without a Docker image. The package consists of two main components:

  • Base habana-tensorflow Python package - Libraries and modules needed to execute TensorFlow on a single Gaudi device.

  • Scale-out habana-horovod Python package - Libraries and modules needed to execute TensorFlow across multiple Gaudi devices, on a single node or multiple nodes.

You can install TensorFlow on the Habana Gaudi device by:

  1. Using a pre-installed docker containing all the necessary dependencies,

  2. or, installing all the required components from scratch.

4.4.1. Installation with Docker

TensorFlow Docker contains both single and scale-out binaries and does not require additional installation steps. Installation with docker is the recommended installation method.

Before setting up the Docker environment, make sure the Firmware and Driver are set up on the host machine as previously outlined in this document. The following lists the prerequisites needed to install with docker:

  • Docker must be installed on the target machine.

  • Minimum Docker CE version required is 18.09.0.

For Docker install and setup details, refer to the Setup and Install GitHub page.

4.4.2. Setting Up the Environment from Scratch

To set up the environment, the SynapseAI software package must be installed first. Manually install the components listed in Package Content before installing the TensorFlow package.

The TensorFlow software package consists of two Python Packages. Installing both packages guarantees the same functionality delivered with TensorFlow Docker:

  • habana-tensorflow

  • habana-horovod

To execute TensorFlow on a single Gaudi device, install the habana-tensorflow package. To scale out across multiple Gaudi devices or nodes, also install the habana-horovod package.

Note

The following example script includes instructions from the steps described in Base Installation (Single Node) and Scale-out Installation and can be used for your reference.

The scripts install TF 2.7.0. They use Python3 from /usr/bin/ with the supported version listed in the Support Matrix. Make sure that Python3 is installed there, and if not, update the bash scripts with the appropriate PYTHON=<path>.

4.4.2.1. Base Installation (Single Node)

The habana-tensorflow package contains all the binaries and scripts to run topologies on a single-node.

All the steps listed below use the PYTHON environment variable, which must be set to the appropriate version of Python, according to the versions listed in the Support Matrix.

export PYTHON=/usr/bin/python<VER> # e.g. for Ubuntu 20.04 it's PYTHON=/usr/bin/python3.8

1. Before installing habana-tensorflow, install the supported TensorFlow version; see the Support Matrix. If no TensorFlow package is installed, PIP will fetch it automatically.

$PYTHON -m pip install --user tensorflow-cpu==<supported_tf_version>

Note

Support for the S3 file system (among others) has been moved to tensorflow-io. It must be installed separately, and without its dependencies, because it declares a broken dependency on TensorFlow that would cause PIP to install the non-CPU tensorflow package. Additionally, for RPM-based OSes (AmazonLinux2), TFIO_DATAPATH must be set and ca-certificates.crt is expected to be under /etc/ssl/certs/.

# in case of installing TensorFlow 2.7.0
${PYTHON} -m pip install --user --no-deps tensorflow-io==0.22.0 tensorflow-io-gcs-filesystem==0.22.0
# For RPM-based OSes (AmazonLinux2) TFIO_DATAPATH have to be specified to import tensorflow_io lib correctly
export TFIO_DATAPATH=`${PYTHON} -c 'import tensorflow_io as tfio; import os; print(os.path.dirname(os.path.dirname(tfio.__file__)))'`/
# For RPM-based OSes (AmazonLinux2) ca-cert file is expected exactly under /etc/ssl/certs/ca-certificates.crt
# otherwise curl will fail during access to S3 AWS storage
sudo ln -s /etc/ssl/certs/ca-bundle.crt /etc/ssl/certs/ca-certificates.crt

2. habana-tensorflow is available in the Habana Vault. To allow PIP to search for the habana-tensorflow package, --extra-index-url needs to be specified:

$PYTHON -m pip install --user habana-tensorflow==1.2.0-585 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple

Note

URLs to Habana Vault require credentials.

  3. Run the below command to make sure the habana-tensorflow package is properly installed:

$PYTHON -c "import habana_frameworks.tensorflow as htf; print(htf.__version__)"

If everything is set up properly, the above command will print the currently installed package version.

Note

habana-tensorflow contains libraries for all supported TensorFlow versions. It is delivered under manylinux2010 tag (same as TensorFlow).

4.4.2.2. Scale-out Installation

Install the habana-horovod package to get multi-node support. The following lists the prerequisites for installing this package:

  • OpenMPI 4.0.5.

  • Stock horovod package must not be installed.

  1. Install packages required to compile OpenMPI and Habana Horovod.

    • For Ubuntu 18:

    sudo apt install -y python3.7-dev
    sudo apt install -y wget
    
    • For AmazonLinux2:

    sudo yum groupinstall -y "Development Tools"
    sudo yum install -y system-lsb-core cmake
    sudo yum install -y wget
    sudo yum install -y python3-devel
    
  2. Set up the OpenMPI 4.0.5 as shown below:

wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.5.tar.gz
gunzip -c openmpi-4.0.5.tar.gz | tar xf -
cd openmpi-4.0.5/ && ./configure --prefix=/usr/local/openmpi
make -j 8 && make install && touch ~root/openmpi-4.0.5_installed
cp LICENSE /usr/local/share/openmpi/

# Necessary env flags to install habana-horovod module
export MPI_ROOT=/usr/local/openmpi
export LD_LIBRARY_PATH=$MPI_ROOT/lib:$LD_LIBRARY_PATH
export OPAL_PREFIX=$MPI_ROOT
export PATH=$MPI_ROOT/bin:$PATH
  3. Install the mpi4py binding:

$PYTHON -m pip install --user mpi4py==3.0.3

4. habana-horovod is also stored in the Habana Vault. To allow PIP to search for the habana-horovod package, --extra-index-url needs to be specified:

$PYTHON -m pip install --user habana-horovod==1.2.0-585 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple

Note

URLs to Habana Vault require credentials.

See also

To learn more about the TensorFlow distributed training on Gaudi, see Distributed Training with TensorFlow.

4.4.2.3. Model References Requirements

Habana provides a number of model references optimized to run on Gaudi. These models are available on the Model-References page.

Many of the references require additional third-party packages that are not provided by Habana. This section describes how to install them.

There are two types of packages required by the model references:

  • System packages - installed with the OS package manager (e.g. apt in case of Ubuntu). There are three required system packages:

    • libjemalloc

    • protobuf-compiler

    • libGL

  • Python packages - installed with pip tools.

    • The packages required to run topologies from the Model-References repository are defined in per-topology requirements.txt files, located in each folder containing the topology's scripts.
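The per-topology requirements files can be located with a short helper before installing them. This is a sketch: the clone path ./Model-References is an assumption (adjust it to wherever the repository was checked out), and the pip commands are printed rather than executed so they can be reviewed first.

```shell
# List every per-topology requirements.txt under a Model-References checkout.
list_requirements() {
    find "$1" -name requirements.txt 2>/dev/null
}

# Print the pip command for each file instead of running it.
list_requirements ./Model-References | while read -r req; do
    echo "$PYTHON -m pip install --user -r $req"
done
```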

4.4.2.3.1. Installing System Packages
  • To install them on Ubuntu 18.04 invoke:

sudo apt install -y libjemalloc1
sudo apt install -y protobuf-compiler
sudo apt install -y libgl1

Note

An example script for Ubuntu18 installing those OS packages is available for your reference: u18_tensorflow_models_dependencies_installation.sh

  • To install them on AmazonLinux2 invoke:

sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum install -y jemalloc
sudo yum install -y mesa-libGL
wget https://github.com/protocolbuffers/protobuf/releases/download/v3.6.1/protoc-3.6.1-linux-x86_64.zip
sudo unzip protoc-3.6.1-linux-x86_64.zip -d /usr/local/protoc
rm -rf protoc-3.6.1-linux-x86_64.zip

Note

An example script for AmazonLinux2 installing those OS packages is available for your reference: al2_tensorflow_models_dependencies_installation.sh

4.5. PyTorch Installation

This section describes how to obtain and install the PyTorch software package. Follow the instructions outlined below to install PyTorch packages on a bare metal platform or virtual machine without a Docker image.

Habana PyTorch packages consist of:

  • torch - PyTorch framework package with Habana support

  • habana-torch - Libraries and modules needed to execute PyTorch on single-card, single-node and multi-node setups.

  • habana-dataloader - PyTorch Habana custom dataloader for ImageNet dataset.

  • habana-torch-dataloader - Habana multi-threaded dataloader package.

  • fairseq - Fairseq package updated with Habana support, required for Transformer model.

  • transformers - Huggingface transformer package updated with Habana support, required for BERT Finetuning - SQuAD & MRPC.

  • pytorch-lightning - PyTorch Lightning package with Habana support, required for Unet2D.

You can use PyTorch on the Habana Gaudi device by:

  • Using a pre-installed docker containing all the necessary packages,

  • or, installing all the required components from scratch.

4.5.1. Installation with Docker

The PyTorch docker image is pre-installed with all the necessary binaries and packages required to use single-card, single-node and multi-node machines. No additional packages are required.

PyTorch docker contains:

  • Synapse AI package

  • Habana PyTorch packages

Before setting up the Docker environment, make sure Firmware and Driver are set up on the host machine as previously outlined in this document. The following lists the prerequisites needed to install with docker:

  • Docker must be installed on the target machine.

  • Minimum Docker CE version required is 18.09.0.

For Docker install and setup details, refer to the Setup and Install GitHub page.

4.5.2. Setting Up the Environment from Scratch

  1. To set up the environment, the SynapseAI software package must be installed first. Manually install the components listed in Package Content before installing the PyTorch package.

  2. To set up the Habana PyTorch environment, download and execute the bash script pytorch_installation.sh.

    The script performs the following during execution:

    • Autodetect the OS type and the supported Python version for which Habana PyTorch wheel packages are present in the Vault.

    • Try to autodetect the Habana software version and build number.

    • Install OS-specific dependent deb/rpm packages.

    • Download and install OpenMPI and the mpi4py package.

    • Download the tar ball containing the PyTorch-specific packages from the Habana Vault.

    • Install requirements-pytorch.txt, which exists inside the tar ball.

    • Uninstall torch, as it is installed by default while installing requirements-pytorch.txt.

    • Install the Habana PyTorch python packages.

    • Uninstall the pillow package and install pillow-simd.

    • Add the required environment variables in /etc/profile.d/habanalabs.sh and source /etc/profile.d/habanalabs.sh in ~/.bashrc.

Note

Refer to the Support Matrix to view the supported python version for each of the Operating Systems.

4.5.2.1. Command Line Usage

The following are examples of CLI usage when executing PyTorch packages:

  • To autodetect the OS type, Habana software version and build number, use ./pytorch_installation.sh.

  • To download and install a specific version and build, use ./pytorch_installation.sh -v 1.2.0-585.

The supported options:

  • -v <software version> - Habana software version, e.g. 1.2.0

  • -b <build/revision> - Habana build number, e.g. 148 in 1.2.0-148

  • -os <os version> - OS version <ubuntu2004/ubuntu1804/amzn2/rhel79/rhel83/centos83>.

  • -ndep - do not install rpm/deb dependencies.

  • -sys - install python packages without --user.

  • -u - install python packages with --user.
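A combined "<version>-<build>" string such as 1.2.0-585 can be split into the separate -v and -b values. The parsing below is illustrative only, not the script's own logic, and the final invocation is printed rather than executed:

```shell
ARG="1.2.0-585"
VERSION=${ARG%-*}    # everything before the last "-" -> 1.2.0
BUILD=${ARG##*-}     # everything after the last "-"  -> 585
echo "./pytorch_installation.sh -v $VERSION -b $BUILD -os ubuntu2004"
```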

Note

Some PyTorch models need additional python packages. They can be installed using python requirements files provided in Model References repository. Refer to Model References repository for detailed instructions on running PyTorch models.

4.5.3. Test for a Successful Installation

  1. To test that the installation ran successfully, use the following command:

lsmod | grep habana

After running the command, a driver called habanalabs should be displayed.

habanalabs 1204224 2

2. To ensure best performance, check the CPU mode on the host:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  3. If you see powersave, execute the following:

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  4. Then, verify the CPU mode:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
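The governor check can be wrapped in a small helper that flags only the CPUs not set to performance. This is a sketch: it assumes the standard cpufreq sysfs layout, and it only reports, leaving the governor unchanged.

```shell
# Print every scaling_governor file whose value is not "performance".
check_governors() {
    for f in "$1"/cpu*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue          # cpufreq may be absent (e.g. some VMs)
        gov=$(cat "$f")
        [ "$gov" = performance ] || echo "$f: $gov"
    done
}

check_governors /sys/devices/system/cpu
```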