3. Installation Guide

3.1. Overview

This document describes how to obtain and install the SynapseAI® software package and the TensorFlow software package for the Habana® Gaudi® HPU.

For additional install and setup details, refer to the Setup and Install GitHub page.

3.1.1. Release Details

This release was tested and validated on the following configurations.

Distro   Version   Kernels             CPU Type
------   -------   -------             --------
Ubuntu   18.04     4.15 and above      Intel x86_64
Ubuntu   20.04     5.4.0 and above     Intel x86_64
Amazon   Linux2    5.4.0 and above     Intel x86_64
CentOS   7.8       4.9.184 and above   Intel x86_64

3.1.2. Release Versions

Component      Version
---------      -------
Build number   1.0.1-81

3.1.3. Package Content

The installation consists of the following installers:

  • habanalabs-graph_all – installs the Graph Compiler and the run-time.

  • habanalabs-thunk_all – installs the thunk library.

  • habanalabs-dkms_all – installs the PCIe driver.

  • habanalabs-firmware – installs the Gaudi Firmware.

  • habanalabs-firmware-tools – installs various Firmware tools (hlml, hl-smi, etc.).

  • habanalabs-qual – installs the qualification application package. See Qualification Library.

  • habanalabs-aeon – installs the demo data loader.

  • habanalabs-container-runtime – installs the container runtime library.

Refer to Update your Software to obtain the latest installers according to the supported Operating Systems.

3.2. Ubuntu - Package Installation

Installing the packages with an internet connection available allows the installer to download and install the required dependencies for the SynapseAI package (via apt-get, pip install, etc.).

Note

Running the below commands installs the latest version only (see Release Versions). To install a version other than the latest, run the below commands with a specific build number.
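Pinning a specific build with apt might look like the following sketch (the build number 1.0.1-81 is taken from the Release Versions table, and `package=version` is standard apt syntax; adjust the package and build to your needs):

```shell
# Sketch: pin a specific build instead of the latest
# (1.0.1-81 is the build number from the Release Versions table).
BUILD="1.0.1-81"
CMD="sudo apt install -y habanalabs-dkms=${BUILD}"
echo "$CMD"
```

The same pattern applies to the other habanalabs-* packages.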

3.2.1. Package Retrieval

  1. Download and install the public key:

curl -X GET https://vault.habana.ai/artifactory/api/gpg/key/public | sudo apt-key add -
  2. Get the name of the operating system:

lsb_release -c | awk '{print $2}'
  3. Create an apt source file /etc/apt/sources.list.d/artifactory.list with the following content:

deb https://vault.habana.ai/artifactory/debian <OS name from previous step> main

  4. Update the Debian cache:

sudo dpkg --configure -a

sudo apt-get update
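Retrieving the OS codename and composing the apt source line can be scripted together; a minimal sketch (falls back to `focal` when lsb_release is unavailable, e.g. in a minimal container):

```shell
# Sketch: derive the apt source line from the current Ubuntu codename.
OS_NAME="$(lsb_release -c 2>/dev/null | awk '{print $2}')"
OS_NAME="${OS_NAME:-focal}"   # fallback for minimal environments
SRC_LINE="deb https://vault.habana.ai/artifactory/debian ${OS_NAME} main"
echo "$SRC_LINE"
# On a real host:
# echo "$SRC_LINE" | sudo tee /etc/apt/sources.list.d/artifactory.list
```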

3.2.1.1. KMD Dependencies

  1. Install Deb libraries:

sudo apt install dkms libelf-dev
  2. Install headers:

sudo apt install linux-headers-$(uname -r)
  3. After a kernel upgrade, reboot your machine.

3.2.2. Firmware Installation

Install the Firmware:

sudo apt install -y habanalabs-firmware

3.2.3. Driver Installation

The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.

On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.

The below command installs both the habanalabs and habanalabs_en drivers:

sudo apt install -y habanalabs-dkms
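For kernels below 5.12, the load/unload ordering described above can be captured in small helper functions (a sketch; module names are as described in this section, and the commands require root):

```shell
# Sketch: enforce the pre-5.12 load/unload ordering for the two drivers.
load_habana_drivers() {
    modprobe habanalabs_en   # Ethernet driver must come up first on kernels < 5.12
    modprobe habanalabs
}

unload_habana_drivers() {
    rmmod habanalabs         # compute driver goes down first
    rmmod habanalabs_en      # Ethernet driver last
}
```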

3.2.4. Thunk Installation

Install the thunk library:

sudo apt install -y habanalabs-thunk

3.2.5. FW Tools Installation

Install Firmware tools:

sudo apt install -y habanalabs-firmware-tools

3.2.6. Graph Compiler and Run-time Installation

Install the graph compiler and run-time:

sudo apt install -y habanalabs-graph

3.2.7. (Optional) Qual Installation

  1. Install aeon:

sudo apt install -y habanalabs-aeon
  2. Install hl_qual:

sudo apt install -y habanalabs-qual

For further details, see Gaudi Qualification Library.

3.2.8. Container Runtime Installation

  1. Install container runtime:

sudo apt install -y habanalabs-container-runtime
  2. Register the habana runtime by adding the following to /etc/docker/daemon.json:

{
        "runtimes": {
            "habana": {
                "path": "/usr/bin/habana-container-runtime",
                "runtimeArgs": []
            }
        }
}
  3. Restart Docker:

sudo systemctl restart docker
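Before restarting Docker, it can be worth validating the edited file with Python's standard-library JSON parser; a self-contained sketch (uses a temporary copy; point CFG at /etc/docker/daemon.json on a real host):

```shell
# Sketch: validate the runtime registration before restarting Docker.
CFG=/tmp/daemon.json    # use /etc/docker/daemon.json on a real host
cat > "$CFG" <<'EOF'
{
    "runtimes": {
        "habana": {
            "path": "/usr/bin/habana-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
# Parse the file and extract the registered runtime path; a parse error here
# means the JSON edit broke the file.
RUNTIME_PATH="$(python3 -c "import json,sys; print(json.load(open(sys.argv[1]))['runtimes']['habana']['path'])" "$CFG")"
echo "habana runtime -> $RUNTIME_PATH"
```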

3.2.9. Update Environment Variables and More

When the installation is complete, close the shell and re-open it. Alternatively, run the following:

source /etc/profile.d/habanalabs.sh

source ~/.bashrc

3.3. CentOS and Amazon - Package Installation

Installing the packages with an internet connection available allows the installer to download and install the required dependencies for the SynapseAI package (via yum install, pip install, etc.).

Note

Running the below commands installs the latest version only (see Release Versions). To install a version other than the latest, run the below commands with a specific build number.

3.3.1. Amazon Package Retrieval

  1. Create /etc/yum.repos.d/Habana-Vault.repo with the following content:

[vault]

name=Habana Vault

baseurl=https://vault.habana.ai/artifactory/AmazonLinux2

enabled=1

gpgcheck=0

gpgkey=https://vault.habana.ai/artifactory/AmazonLinux2/repodata/repomd.xml.key

repo_gpgcheck=0
  2. Update the YUM cache by running the following command:

sudo yum makecache
  3. Verify correct binding by running the following command:

yum search habana

This will search for and list all packages with the word Habana.
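Creating the repo file can be done in one step with a heredoc; a sketch (written to a temporary path so the example is self-contained; on a real host write to /etc/yum.repos.d/Habana-Vault.repo with sudo):

```shell
# Sketch: generate the Amazon Linux 2 repo file (temporary path for illustration).
REPO=/tmp/Habana-Vault.repo
cat > "$REPO" <<'EOF'
[vault]
name=Habana Vault
baseurl=https://vault.habana.ai/artifactory/AmazonLinux2
enabled=1
gpgcheck=0
gpgkey=https://vault.habana.ai/artifactory/AmazonLinux2/repodata/repomd.xml.key
repo_gpgcheck=0
EOF
grep -q '^baseurl=' "$REPO" && echo "repo file written"
```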

3.3.2. CentOS Package Retrieval

  1. Create /etc/yum.repos.d/Habana-Vault.repo with the following content:

[vault]

name=Habana Vault

baseurl=https://vault.habana.ai/artifactory/centos7

enabled=1

gpgcheck=0

gpgkey=https://vault.habana.ai/artifactory/centos7/repodata/repomd.xml.key

repo_gpgcheck=0
  2. Update the YUM cache:

sudo yum makecache
  3. Verify correct binding:

yum search habana

This will search for and list all packages with the word Habana.

3.3.2.1. KMD Dependencies

  1. Check your Linux kernel version:

uname -r
  2. Install headers:

sudo yum install kernel-devel
  3. After a kernel upgrade, reboot your machine.
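On CentOS, a plain kernel-devel install may resolve to a newer kernel than the one currently running; matching the running kernel explicitly is a common workaround (a sketch; `kernel-devel-$(uname -r)` is standard yum package naming, not specific to Habana):

```shell
# Sketch: compose an install command for headers matching the running kernel.
KVER="$(uname -r)"
HDR_CMD="sudo yum install -y kernel-devel-${KVER}"
echo "$HDR_CMD"
```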

3.3.2.2. Additional Dependencies

Add yum-utils:

sudo yum install -y yum-utils

3.3.3. Firmware Installation

Install the Firmware:

sudo yum install -y habanalabs-firmware

3.3.4. Driver Installation

The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.

On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.

The below commands install/uninstall both the habanalabs and habanalabs_en drivers.

  1. Remove the previous driver package:

sudo yum remove habanalabs*
  2. Install the driver:

sudo yum install -y habanalabs

3.3.5. Thunk Installation

Install the thunk library:

sudo yum install -y habanalabs-thunk

3.3.6. FW Tool Installation

Install Firmware tools:

sudo yum install -y habanalabs-firmware-tools

3.3.7. Graph Compiler and Run-time Installation

Install the graph compiler and run-time:

sudo yum install -y habanalabs-graph

3.3.8. (Optional) Qual Installation

  1. Install aeon:

sudo yum install -y habanalabs-aeon
  2. Install hl_qual:

sudo yum install -y habanalabs-qual

For further details, see Qualification Library.

3.3.9. Container Runtime Installation

  1. Install container runtime:

sudo yum install -y habanalabs-container-runtime
  2. Register the habana runtime by adding the following to /etc/docker/daemon.json:

{
        "runtimes": {
            "habana": {
                "path": "/usr/bin/habana-container-runtime",
                "runtimeArgs": []
            }
        }
}
  3. Restart Docker:

sudo systemctl restart docker

3.3.10. Update Environment Variables and More

When the installation is complete, close the shell and re-open it. Alternatively, run the following:

source /etc/profile.d/habanalabs.sh

source ~/.bashrc

3.4. TensorFlow Installation

This section describes how to obtain and install the TensorFlow software package. Follow these instructions if you want to install the TensorFlow packages on a Bare Metal platform without a Docker image. The package consists of two main components:

  • Base habana-tensorflow Python package - Libraries and modules needed to execute TensorFlow on a single Gaudi device.

  • Scale-out habana-horovod Python package - Libraries and modules needed to execute TensorFlow on a multi-node machine.

You can install TensorFlow on the Habana Gaudi device in one of two ways:

  1. Using a pre-installed Docker image containing all the necessary dependencies; or

  2. Installing all the required components from scratch.

3.4.1. Installation with Docker

TensorFlow Docker contains both single and scale-out binaries and does not require additional installation steps. Installation with docker is the recommended installation method.

Before setting up the Docker environment, make sure the Firmware and Driver are set up on the host machine as previously outlined in this document. The following lists the prerequisites needed to install with docker:

  • Docker must be installed on the target machine.

  • Minimum Docker CE version required is 18.09.0.

For Docker install and setup details, refer to the Setup and Install GitHub page.

3.4.2. Setting Up the Environment from Scratch

To set up the environment, the SynapseAI software package must be installed first. Manually install the components listed in Package Content before installing the TensorFlow package.

The TensorFlow software package consists of two Python Packages. Installing both packages guarantees the same functionality delivered with TensorFlow Docker:

  • habana-tensorflow

  • habana-horovod

To execute TensorFlow on a single Gaudi device, install the habana-tensorflow package. To execute TensorFlow on a multi-node machine, install the habana-horovod package.

3.4.2.1. Base Installation (Single Node)

The habana-tensorflow package contains all the binaries and scripts required to run topologies on a single node.

1. Before installing habana-tensorflow, install the supported TensorFlow version. See TensorFlow Release Notes. If no TensorFlow package is installed, PIP will fetch it automatically.

python3 -m pip install tensorflow-cpu==<supported_tf_version>

2. habana-tensorflow is available in the Habana Vault. To allow PIP to search for the habana-tensorflow package, --extra-index-url needs to be specified:

python3 -m pip install habana-tensorflow --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple

Note

URLs to Habana Vault require credentials.

3. Run the below command to make sure the habana-tensorflow package is properly installed:

python3 -c "import habana_frameworks.tensorflow as htf; print(htf.__version__)"

If everything is set up properly, the above command will print the currently installed package version.

Note

habana-tensorflow contains libraries for all supported TensorFlow versions. It is delivered under the linux platform tag, but the package is compatible with the manylinux2010 tag (same as TensorFlow).

3.4.2.2. Scale-out Installation

There are two methods of getting multi-node support - Horovod or TensorFlow distributed.

3.4.2.2.1. For Horovod Distributed

Install the habana-horovod package to get multi-node support. The following lists the prerequisites for installing this package:

  • OpenMPI 4.0.5.

  • Stock horovod package must not be installed.

  1. Set up the OpenMPI 4.0.5 as shown below:

wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.5.tar.gz
gunzip -c openmpi-4.0.5.tar.gz | tar xf -
cd openmpi-4.0.5/ && ./configure --prefix=/usr/local/openmpi
make -j 8 && make install && touch ~root/openmpi-4.0.5_installed
cp LICENSE /usr/local/openmpi/

# Necessary env flags to install habana-horovod module
export MPI_ROOT=/usr/local/openmpi
export LD_LIBRARY_PATH=$MPI_ROOT/lib:$LD_LIBRARY_PATH
export OPAL_PREFIX=$MPI_ROOT
export PATH=$MPI_ROOT/bin:$PATH

2. habana-horovod is also stored in the Habana Vault. To allow PIP to search for the habana-horovod package, --extra-index-url needs to be specified:

python3 -m pip install habana-horovod --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple

Note

URLs to Habana Vault require credentials.
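Before installing habana-horovod, a quick sanity check that the OpenMPI environment flags above are in effect can save a failed build (a sketch; paths are those used in the OpenMPI setup above):

```shell
# Sketch: verify the OpenMPI environment before building habana-horovod.
export MPI_ROOT=/usr/local/openmpi
export LD_LIBRARY_PATH="$MPI_ROOT/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export OPAL_PREFIX="$MPI_ROOT"
export PATH="$MPI_ROOT/bin:$PATH"

# Confirm the bin directory is actually on PATH.
case ":$PATH:" in
    *":$MPI_ROOT/bin:"*) MPI_PATH_OK=yes ;;
    *)                   MPI_PATH_OK=no ;;
esac
echo "MPI_ROOT=$MPI_ROOT path_ok=$MPI_PATH_OK"
```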

3.4.2.2.2. For TensorFlow Distributed

To get scale-out capabilities with TensorFlow distributed, no additional package other than habana-tensorflow needs to be installed. Unlike Horovod, neither tf.distribute nor HPUStrategy uses or requires OpenMPI at any point. Worker processes can be initialized in any way. Refer to the Model References repository for an example using mpirun, as it offers a process-to-core binding mechanism. Installing OpenMPI as described in For Horovod Distributed is therefore still recommended.

See also

To learn more about the TensorFlow distributed training on Gaudi, see Distributed Training with TensorFlow.

3.5. PyTorch Installation

This section describes the PyTorch Docker container. The PyTorch Docker image is pre-installed with all the binaries required to use a single Gaudi device or a single-node machine.

PyTorch docker contains:

  • Synapse AI package

  • Base Habana PyTorch package – libraries and modules needed to execute PyTorch on a single Gaudi device

  • Distributed Habana PyTorch package – libraries and modules needed to execute PyTorch on a single-node machine

For known issues and limitations, refer to the PyTorch section of the Release Notes.

3.5.1. Installation with Docker

PyTorch Docker contains both single and distributed binaries and does not require additional installation steps. Installation with docker is the recommended installation method.

Before setting up the Docker environment, make sure Firmware and Driver are set up on the host machine as previously outlined in this document. The following lists the prerequisites needed to install with docker:

  • Docker must be installed on the target machine.

  • Minimum Docker CE version required is 18.09.0.

For Docker install and setup details, refer to the Setup and Install GitHub page.

3.5.2. Test for a Successful Installation

  1. To test that the installation ran successfully, use the following command:

lsmod | grep habana

After running the command, a driver called habanalabs should be displayed:

habanalabs 1204224 2

  2. To ensure best performance, check the CPU mode on the host:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  3. If you see powersave, execute the following:

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  4. Then, verify the CPU mode:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
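The governor check above can be wrapped into a single sketch (shown against a sample governor list so the example is self-contained; on a real host read the sysfs files directly, as noted in the comment):

```shell
# Sketch: flag CPUs left in powersave mode. Sample input for illustration;
# on a real host use:
#   GOVS="$(cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor)"
GOVS="performance
powersave
performance"
if printf '%s\n' "$GOVS" | grep -q '^powersave$'; then
    GOV_STATUS="needs fix: switch powersave CPUs to performance"
else
    GOV_STATUS="ok: all CPUs in performance mode"
fi
echo "$GOV_STATUS"
```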