AWS Base OS AMI Installation¶
The following table outlines the steps required when using a standard (non-DL) AMI image to set up the EC2 instance.
| Objective | Steps |
|---|---|
| Run Framework on Bare Metal Fresh OS (TensorFlow/PyTorch) | 1. Set Up SynapseAI SW Stack 2. Set Number of Huge Pages 3. Bring up Network Interfaces 4. Install Native Frameworks |
| Run Using Containers on Bare Metal Fresh OS | 1. Set Up SynapseAI SW Stack 2. Set up Container Usage 3. Pull Prebuilt Containers or Build Docker Images from Habana Dockerfiles |
Set Up SynapseAI SW Stack¶
Installing the packages with an internet connection available allows the package manager to download and install the required dependencies for the SynapseAI package (apt-get, yum install, pip install, etc.). The installation consists of the following installers:
habanalabs-graph – installs the Graph Compiler and the run-time.
habanalabs-thunk – installs the thunk library.
habanalabs-dkms – installs the PCIe driver.
habanalabs-firmware - installs the Gaudi Firmware.
habanalabs-firmware-tools – installs various Firmware tools (hlml, hl-smi, etc).
habanalabs-qual – installs the qualification application package. See Qualification Library.
habanalabs-container-runtime - installs the container runtime library.
Note
Running the below commands installs the latest version only. You can install a version other than the latest by running the below commands with a specific build number.
Package Retrieval:
Download and install the public key:
curl -X GET https://vault.habana.ai/artifactory/api/gpg/key/public | sudo apt-key add --
Get the name of the operating system:
lsb_release -c | awk '{print $2}'
Create an apt source file /etc/apt/sources.list.d/artifactory.list containing the line deb https://vault.habana.ai/artifactory/debian <OS name from previous step> main, for example as shown below.
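A minimal sketch of creating that file in one step, assuming the codename printed by the previous command is used directly:
echo "deb https://vault.habana.ai/artifactory/debian $(lsb_release -c | awk '{print $2}') main" | sudo tee /etc/apt/sources.list.d/artifactory.list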
Update Debian cache:
sudo dpkg --configure -a
sudo apt-get update
KMD Dependencies:
Install the required Deb packages:
sudo apt install dkms libelf-dev
Install headers:
sudo apt install linux-headers-$(uname -r)
After kernel upgrade, reboot your machine.
Firmware Installation:
Install the Firmware:
sudo apt install -y habanalabs-firmware
Driver Installation:
The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.
On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.
Run the below command to install both the habanalabs and habanalabs_en drivers:
sudo apt install -y habanalabs-dkms
Load the habanalabs_en driver first and the habanalabs driver after:
sudo modprobe <driver name>
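For example, on kernels below 5.12, where ordering matters, the two drivers named above are loaded as follows:
sudo modprobe habanalabs_en
sudo modprobe habanalabs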
Thunk Installation:
Install the thunk library:
sudo apt install -y habanalabs-thunk
FW Tools Installation:
Install Firmware tools:
sudo apt install -y habanalabs-firmware-tools
Graph Compiler and Run-time Installation:
Install the graph compiler and run-time:
sudo apt install -y habanalabs-graph
(Optional) Qual Installation:
Install hl_qual:
sudo apt install -y habanalabs-qual
For further details, see Gaudi Qualification Library.
Container Runtime Installation:
Install container runtime:
sudo apt install -y habanalabs-container-runtime
Update Environment Variables and More
When the installation is complete, close the shell and re-open it. Or, run the following:
source /etc/profile.d/habanalabs.sh
source ~/.bashrc
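As a quick sanity check, you can query the Gaudi devices with hl-smi, which is installed by habanalabs-firmware-tools:
hl-smi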
Package Retrieval:
Create /etc/yum.repos.d/Habana-Vault.repo with the following content:
[vault]
name=Habana Vault
baseurl=https://vault.habana.ai/artifactory/AmazonLinux2
enabled=1
gpgcheck=0
gpgkey=https://vault.habana.ai/artifactory/AmazonLinux2/repodata/repomd.xml.key
repo_gpgcheck=0
Update YUM cache by running the following command:
sudo yum makecache
Verify correct binding by running the following command:
yum search habana
This will search for and list all packages with the word Habana.
KMD Dependencies:
Check your Linux kernel version:
uname -r
Install headers:
sudo yum install kernel-devel
After kernel upgrade, reboot your machine.
Additional Dependencies:
Add yum-utils:
sudo yum install -y yum-utils
Firmware Installation:
Install the Firmware:
sudo yum install -y habanalabs-firmware
Driver Installation:
The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.
On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.
The below commands install/uninstall both the habanalabs and habanalabs_en drivers.
(Recommended) Remove the previous driver package:
sudo yum remove habanalabs*
Install the driver:
sudo yum install -y habanalabs
Load the habanalabs_en driver first and the habanalabs driver after:
sudo modprobe <driver name>
Thunk Installation
Install the thunk library:
sudo yum install -y habanalabs-thunk
FW Tool Installation:
Install Firmware tools:
sudo yum install -y habanalabs-firmware-tools
Graph Compiler and Run-time Installation:
Install the graph compiler and run-time:
sudo yum install -y habanalabs-graph
(Optional) Qual Installation:
Install hl_qual:
sudo yum install -y habanalabs-qual
For further details, see Qualification Library.
Container Runtime Installation:
Install container runtime:
sudo yum install -y habanalabs-container-runtime
Update Environment Variables and More
When the installation is complete, close the shell and re-open it. Or, run the following:
source /etc/profile.d/habanalabs.sh
source ~/.bashrc
Package Retrieval:
Create /etc/yum.repos.d/Habana-Vault.repo with the following content:
[vault]
name=Habana Vault
baseurl=https://<access token>@vault.habana.ai/artifactory/rhel/8/8.3
enabled=1
gpgcheck=1
gpgkey=https://<access token>@vault.habana.ai/artifactory/rhel/8/8.3/repodata/repomd.xml.key
repo_gpgcheck=0
Update YUM cache by running the following command:
sudo yum makecache
Verify correct binding by running the following command:
yum search habana
This will search for and list all packages with the word Habana.
Reinstall the libarchive package by running the following command:
sudo dnf install -y libarchive*
KMD Dependencies:
Check your Linux kernel version:
uname -r
Install headers:
sudo yum install kernel-devel
After kernel upgrade, reboot your machine.
Additional Dependencies:
Add yum-utils:
sudo yum install -y yum-utils
Firmware Installation:
Install the Firmware:
sudo yum install -y habanalabs-firmware
Driver Installation:
The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.
On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.
The below commands install/uninstall both the habanalabs and habanalabs_en drivers.
(Recommended) Remove the previous driver package:
sudo yum remove habanalabs*
Install the driver:
sudo yum install -y habanalabs
Load the habanalabs_en driver first and the habanalabs driver after:
sudo modprobe <driver name>
Thunk Installation
Install the thunk library:
sudo yum install -y habanalabs-thunk
FW Tool Installation:
Install Firmware tools:
sudo yum install -y habanalabs-firmware-tools
Graph Compiler and Run-time Installation:
Install the graph compiler and run-time:
sudo yum install -y habanalabs-graph
(Optional) Qual Installation:
Install hl_qual:
sudo yum install -y habanalabs-qual
For further details, see Qualification Library.
Container Runtime Installation:
Install container runtime:
sudo yum install -y habanalabs-container-runtime
Update Environment Variables and More
When the installation is complete, close the shell and re-open it. Or, run the following:
source /etc/profile.d/habanalabs.sh
source ~/.bashrc
Package Retrieval:
Create /etc/yum.repos.d/Habana-Vault.repo with the following content:
[vault]
name=Habana Vault
baseurl=https://vault.habana.ai/artifactory/centos/8/8.3
enabled=1
gpgcheck=0
gpgkey=https://vault.habana.ai/artifactory/centos/8/8.3/repodata/repomd.xml.key
repo_gpgcheck=0
Update YUM cache:
sudo yum makecache
Verify correct binding:
yum search habana
This will search for and list all packages with the word Habana.
KMD Dependencies:
Check your Linux kernel version:
uname -r
Install headers:
sudo yum install kernel-devel
After kernel upgrade, reboot your machine.
Additional Dependencies:
Add yum-utils:
sudo yum install -y yum-utils
Firmware Installation:
Install the Firmware:
sudo yum install -y habanalabs-firmware
Driver Installation:
The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.
On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.
The below commands install/uninstall both the habanalabs and habanalabs_en drivers.
(Recommended) Remove the previous driver package:
sudo yum remove habanalabs*
Install the driver:
sudo yum install -y habanalabs
Load the habanalabs_en driver first and the habanalabs driver after:
sudo modprobe <driver name>
Thunk Installation
Install the thunk library:
sudo yum install -y habanalabs-thunk
FW Tool Installation:
Install Firmware tools:
sudo yum install -y habanalabs-firmware-tools
Graph Compiler and Run-time Installation:
Install the graph compiler and run-time:
sudo yum install -y habanalabs-graph
(Optional) Qual Installation:
Install hl_qual:
sudo yum install -y habanalabs-qual
For further details, see Qualification Library.
Container Runtime Installation:
Install container runtime:
sudo yum install -y habanalabs-container-runtime
Update Environment Variables and More
When the installation is complete, close the shell and re-open it. Or, run the following:
source /etc/profile.d/habanalabs.sh
source ~/.bashrc
Set Number of Huge Pages¶
Some training models use huge pages. It is recommended to set the number of huge pages as provided below:
#set current hugepages
sudo sysctl -w vm.nr_hugepages=15000
#Remove old entry if exists in sysctl.conf
sudo sed --in-place '/nr_hugepages/d' /etc/sysctl.conf
#Insert huge pages settings to persist
echo "vm.nr_hugepages=15000" | sudo tee -a /etc/sysctl.conf
Bring up Network Interfaces¶
If you are training using Gaudi network interfaces for multi-node scale-out (external Gaudi network interfaces between servers), make sure the network interfaces are brought up. These interfaces need to be brought up every time the kernel module is loaded, or unloaded and reloaded.
Note
This section is not relevant for AWS users.
A reference on how to bring up the interfaces is provided in the manage_network_ifs.sh script as detailed in Gaudi Utils.
Use the following commands:
# manage_network_ifs.sh requires ethtool
sudo apt-get install ethtool
./manage_network_ifs.sh --up
Habana Driver Unattended Upgrade¶
Unattended upgrade automatically installs the latest Habana drivers (habanalabs and habanalabs_en).
Note
Unattended upgrade is supported on v1.3.0 and later only.
Install unattended upgrade:
sudo apt install --only-upgrade habanalabs-dkms
After running unattended upgrade, you must load/unload the drivers or restart your machine. The habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.
Unload the habanalabs driver first and the habanalabs_en driver after:
sudo modprobe -r <driver name>
Load the habanalabs_en driver first and the habanalabs driver after:
sudo modprobe <driver name>
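Putting the ordering rules above together, a full driver reload on kernels below 5.12 would look like this (driver names as given above):
sudo modprobe -r habanalabs
sudo modprobe -r habanalabs_en
sudo modprobe habanalabs_en
sudo modprobe habanalabs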
Unattended upgrade automatically installs the latest Habana drivers (habanalabs and habanalabs_en).
Note
Unattended upgrade is supported on v1.3.0 and later only.
Install unattended upgrade:
sudo yum update habanalabs
After running unattended upgrade, you must load/unload the drivers or restart your machine. The habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.
Unload the habanalabs driver first and the habanalabs_en driver after:
sudo modprobe -r <driver name>
Load the habanalabs_en driver first and the habanalabs driver after:
sudo modprobe <driver name>
Install Native Frameworks¶
Installing frameworks with docker is the recommended installation method and does not require additional steps.
TensorFlow Installation¶
This section describes how to obtain and install the TensorFlow software package. Follow these instructions if you want to install the TensorFlow packages on a Bare Metal platform without a Docker image. The package consists of two main components:
Base habana-tensorflow Python package - Libraries and modules needed to execute TensorFlow on a single Gaudi device.
Scale-out habana-horovod Python package - Libraries and modules needed to execute TensorFlow across multiple Gaudi devices on a single-node machine.
Setting Up the Environment¶
The Habana TensorFlow support package consists of two Python packages. Installing both packages guarantees the same functionality delivered with the TensorFlow Docker:
habana-tensorflow - executes TensorFlow on a single Gaudi device.
habana-horovod - executes TensorFlow across multiple Gaudi devices on a single-node machine.
To set up the environment, the SynapseAI On-Premise software package must be installed first. Manually install the components listed in Set up On Premise before installing the Habana TensorFlow package.
To prepare the Habana TensorFlow environment, download and execute the bash script tensorflow_installation.sh. This script works only for the currently supported operating systems specified in the Support Matrix.
When installing the Habana TensorFlow environment, the following will be performed throughout the execution:
Auto-detect OS type and supported Python version for which packages are present on the Python Package Index (PyPI).
Try to auto-detect SynapseAI software version and build number based on installed packages.
Install OS-specific dependent deb/rpm packages.
(Disabled by default) Install extra Model References requirements. See Model References Requirements.
Download and install Open MPI and the mpi4py package.
Set the MPI_ROOT environment variable for use in the command line.
Uninstall any existing TensorFlow package.
Uninstall existing Habana TensorFlow Python packages.
Install the recommended TensorFlow package (configurable via the --tf parameter).
Install Habana TensorFlow Python packages matching the SynapseAI software package.
Add the required environment variables in /etc/profile.d/habanalabs.sh and source /etc/profile.d/habanalabs.sh in ~/.bashrc.
Run a simple TensorFlow workload with Habana TensorFlow and validate that it has been executed on Habana Gaudi.
Note
tensorflow_installation.sh accepts optional input parameters that can also override the auto-detection described above. Run ./tensorflow_installation.sh --help for more details.
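For example, to pin the TensorFlow version instead of relying on auto-detection, the --tf parameter described above can be passed explicitly; the version shown here is only an illustration taken from the versions listed later in this document:
./tensorflow_installation.sh --tf 2.8.0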
Model References Requirements¶
Habana provides a number of model references optimized to run on Gaudi. Those models are available on the Model-References page.
Many of the references require additional 3rd party packages that are not provided by Habana. This section describes how to install the required 3rd party packages.
There are two types of packages required by the Model References:
System packages - installed with the OS package manager (e.g. apt on Ubuntu). To install system packages, run the installation script with the --extra_deps argument: ./tensorflow_installation.sh --extra_deps.
Python packages - installed with pip. Packages required to run topologies from the Model References repository are defined in per-topology requirements.txt files in each folder containing the topologies’ scripts.
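For instance, after cloning the Model-References repository, the Python requirements for a given topology would typically be installed from inside its folder (the path below is a placeholder):
cd Model-References/<path to topology>
pip install -r requirements.txt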
PyTorch Installation¶
This section describes how to obtain and install the PyTorch software package. Follow the instructions outlined below to install PyTorch packages on a bare metal platform or virtual machine without a Docker image.
Habana PyTorch packages consist of:
torch - PyTorch framework package with Habana support.
habana-torch-plugin - Libraries and modules needed to execute PyTorch on single-card, single-node and multi-node setups.
habana-torch-dataloader - Habana multi-threaded dataloader package.
pytorch-lightning - PyTorch Lightning package with Habana support.
torchvision - Torchvision package compiled in the torch environment. No Habana-specific changes in this package.
Setting Up the Environment¶
To set up the environment, the SynapseAI On-Premise software package must be installed first. Manually install the components listed in Set up On-Premise before installing the PyTorch package.
To set up the Habana PyTorch environment, download and execute the bash script pytorch_installation.sh.
By installing the PyTorch environment, the following will be performed throughout the execution:
Auto-detect OS type and the supported Python version for which Habana PyTorch wheel packages are present in the Vault.
Try to auto-detect the Habana software version and build number.
Install OS-specific dependent deb/rpm packages.
Download and install Open MPI and the mpi4py package.
Set the MPI_ROOT environment variable for use in the command line.
Download the tar ball file that contains the PyTorch-specific packages from the Habana Vault.
Install requirements-pytorch.txt, which exists inside the tar ball.
Uninstall torch, as it will be installed by default while installing requirements-pytorch.txt.
Install Habana PyTorch Python packages.
Uninstall the pillow package and install pillow-simd.
Add the required environment variables in /etc/profile.d/habanalabs.sh and source /etc/profile.d/habanalabs.sh in ~/.bashrc.
Note
Refer to the Support Matrix to view the supported python version for each of the Operating Systems.
Command Line Usage¶
The following are examples of CLI usage when executing the PyTorch installation script:
To auto-detect OS type, Habana software version and build number, use ./pytorch_installation.sh.
To download and install a specific version and build, use ./pytorch_installation.sh -v <software version> -b <build number>.
The supported options:
-v <software version> - Habana software version, e.g. {Version}.
-b <build/revision> - Habana build number, e.g. 148 in 1.2.0-148.
-os <os version> - OS version: <ubuntu2004/ubuntu1804/amzn2/rhel79/rhel83/centos83>.
-ndep - do not install rpm/deb dependencies.
-sys - install Python packages without --user.
-u - install Python packages with --user.
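For example, a run that pins the OS and skips the rpm/deb dependency installation (option names taken from the list above; the OS value is only an illustration):
./pytorch_installation.sh -os ubuntu2004 -ndep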
Note
Some PyTorch models need additional Python packages. They can be installed using the Python requirements files provided in the Model References repository. Refer to the Model References repository for detailed instructions on running PyTorch models.
Run Using Containers¶
Set up SynapseAI SW Stack¶
Package Retrieval:
Download and install the public key:
curl -X GET https://vault.habana.ai/artifactory/api/gpg/key/public | sudo apt-key add --
Get the name of the operating system:
lsb_release -c | awk '{print $2}'
Create an apt source file /etc/apt/sources.list.d/artifactory.list containing the line deb https://vault.habana.ai/artifactory/debian <OS name from previous step> main.
Update Debian cache:
sudo dpkg --configure -a
sudo apt-get update
Firmware Installation:
Install the Firmware:
sudo apt install -y habanalabs-firmware
Driver Installation:
The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.
On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.
Run the below command to install both the habanalabs and habanalabs_en drivers:
sudo apt install -y habanalabs-dkms
Load the habanalabs_en driver first and the habanalabs driver after:
sudo modprobe <driver name>
You can enable unattended upgrade to automatically install the latest Habana drivers. See Habana Driver Unattended Upgrade.
Package Retrieval:
Create /etc/yum.repos.d/Habana-Vault.repo with the following content:
[vault]
name=Habana Vault
baseurl=https://vault.habana.ai/artifactory/AmazonLinux2
enabled=1
gpgcheck=0
gpgkey=https://vault.habana.ai/artifactory/AmazonLinux2/repodata/repomd.xml.key
repo_gpgcheck=0
Update YUM cache by running the following command:
sudo yum makecache
Verify correct binding by running the following command:
yum search habana
This will search for and list all packages with the word Habana.
Firmware Installation:
Install the Firmware:
sudo yum install -y habanalabs-firmware
Driver Installation:
The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.
On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.
The below commands install/uninstall both the habanalabs and habanalabs_en drivers.
(Recommended) Remove the previous driver package:
sudo yum remove habanalabs*
Install the driver:
sudo yum install -y habanalabs
Load the habanalabs_en driver first and the habanalabs driver after:
sudo modprobe <driver name>
Package Retrieval:
Create /etc/yum.repos.d/Habana-Vault.repo with the following content:
[vault]
name=Habana Vault
baseurl=https://<access token>@vault.habana.ai/artifactory/rhel/8/8.3
enabled=1
gpgcheck=1
gpgkey=https://<access token>@vault.habana.ai/artifactory/rhel/8/8.3/repodata/repomd.xml.key
repo_gpgcheck=0
Update YUM cache by running the following command:
sudo yum makecache
Verify correct binding by running the following command:
yum search habana
This will search for and list all packages with the word Habana.
Reinstall the libarchive package by running the following command:
sudo dnf install -y libarchive*
Firmware Installation:
Install the Firmware:
sudo yum install -y habanalabs-firmware
Driver Installation:
The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.
On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.
The below commands install/uninstall both the habanalabs and habanalabs_en drivers.
(Recommended) Remove the previous driver package:
sudo yum remove habanalabs*
Install the driver:
sudo yum install -y habanalabs
Load the habanalabs_en driver first and the habanalabs driver after:
sudo modprobe <driver name>
Package Retrieval:
Create /etc/yum.repos.d/Habana-Vault.repo with the following content:
[vault]
name=Habana Vault
baseurl=https://vault.habana.ai/artifactory/centos/8/8.3
enabled=1
gpgcheck=0
gpgkey=https://vault.habana.ai/artifactory/centos/8/8.3/repodata/repomd.xml.key
repo_gpgcheck=0
Update YUM cache:
sudo yum makecache
Verify correct binding:
yum search habana
This will search for and list all packages with the word Habana.
Firmware Installation:
Install the Firmware:
sudo yum install -y habanalabs-firmware
Driver Installation:
The habanalabs-dkms_all package installs both the habanalabs and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload both drivers.
On kernels 5.12 and later, you can load/unload the two drivers in no specific order. On kernels below 5.12, the habanalabs_en driver must be loaded before the habanalabs driver and unloaded after the habanalabs driver.
The below commands install/uninstall both the habanalabs and habanalabs_en drivers.
(Recommended) Remove the previous driver package:
sudo yum remove habanalabs*
Install the driver:
sudo yum install -y habanalabs
Load the habanalabs_en driver first and the habanalabs driver after:
sudo modprobe <driver name>
Set up Container Usage¶
To run containers, make sure to install and set up container runtime as detailed in the below sections.
Install Container Runtime¶
The container runtime, provided by the habanalabs-container-runtime package, is a modified runc. It gives you the ability to select the devices to be mounted in the container: you only need to specify the indices of the devices for the container, and the container runtime handles the rest. The container runtime supports both Docker and Kubernetes.
Package Retrieval:
Download and install the public key:
curl -X GET https://vault.habana.ai/artifactory/api/gpg/key/public | sudo apt-key add --
Get the name of the operating system:
lsb_release -c | awk '{print $2}'
Create an apt source file /etc/apt/sources.list.d/artifactory.list containing the line deb https://vault.habana.ai/artifactory/debian <OS name from previous step> main.
Update Debian cache:
sudo dpkg --configure -a
sudo apt-get update
Install habanalabs-container-runtime:
Install the habanalabs-container-runtime package:
sudo apt install -y habanalabs-container-runtime
Package Retrieval:
Create /etc/yum.repos.d/Habana-Vault.repo with the following content:
[vault]
name=Habana Vault
baseurl=https://vault.habana.ai/artifactory/AmazonLinux2
enabled=1
gpgcheck=0
gpgkey=https://vault.habana.ai/artifactory/AmazonLinux2/repodata/repomd.xml.key
repo_gpgcheck=0
Update YUM cache by running the following command:
sudo yum makecache
Verify correct binding by running the following command:
yum search habana
This will search for and list all packages with the word Habana.
Install habanalabs-container-runtime:
Install the habanalabs-container-runtime package:
sudo yum install -y habanalabs-container-runtime
Package Retrieval:
Create /etc/yum.repos.d/Habana-Vault.repo with the following content:
[vault]
name=Habana Vault
baseurl=https://<access token>@vault.habana.ai/artifactory/rhel/8/8.3
enabled=1
gpgcheck=1
gpgkey=https://<access token>@vault.habana.ai/artifactory/rhel/8/8.3/repodata/repomd.xml.key
repo_gpgcheck=0
Update YUM cache by running the following command:
sudo yum makecache
Verify correct binding by running the following command:
yum search habana
This will search for and list all packages with the word Habana.
Reinstall the libarchive package by running the following command:
sudo dnf install -y libarchive*
Install habanalabs-container-runtime:
Install the habanalabs-container-runtime package:
sudo yum install -y habanalabs-container-runtime
Package Retrieval:
Create /etc/yum.repos.d/Habana-Vault.repo with the following content:
[vault]
name=Habana Vault
baseurl=https://vault.habana.ai/artifactory/centos/8/8.3
enabled=1
gpgcheck=0
gpgkey=https://vault.habana.ai/artifactory/centos/8/8.3/repodata/repomd.xml.key
repo_gpgcheck=0
Update YUM cache:
sudo yum makecache
Verify correct binding:
yum search habana
This will search for and list all packages with the word Habana.
Install habanalabs-container-runtime:
Install the habanalabs-container-runtime package:
sudo yum install -y habanalabs-container-runtime
Set up Container Runtime¶
To register the habana runtime, use the method below that is best suited to your environment. You might need to merge the new argument with your existing configuration.
Note
As of Kubernetes 1.20, support for Docker has been deprecated.
Register Habana runtime by adding the following to /etc/docker/daemon.json:
sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "habana": {
            "path": "/usr/bin/habana-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
For Kubernetes, reconfigure the default runtime by adding the following to /etc/docker/daemon.json:
"default-runtime": "habana"
It will look similar to this:
{
"default-runtime": "habana",
"runtimes": {
"habana": {
"path": "/usr/bin/habana-container-runtime",
"runtimeArgs": []
}
}
}
Note
The above step is optional for other use cases.
Restart Docker:
sudo systemctl restart docker
If a host machine has eight Habana devices, you can mount all of them using the environment variable HABANA_VISIBLE_DEVICES=all. The below shows the usage example:
docker run --rm --runtime=habana -e HABANA_VISIBLE_DEVICES=all {docker image} /bin/bash -c "ls /dev/hl*"
/dev/hl0
/dev/hl1
/dev/hl2
/dev/hl3
/dev/hl4
/dev/hl5
/dev/hl6
/dev/hl7
/dev/hl_controlD0
/dev/hl_controlD1
/dev/hl_controlD2
/dev/hl_controlD3
/dev/hl_controlD4
/dev/hl_controlD5
/dev/hl_controlD6
/dev/hl_controlD7
This variable controls which Habana devices will be made accessible inside the container. Possible values:
0,1,2 … - A comma-separated list of index(es).
all - All Habana devices will be accessible. This is the default value.
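For example, to expose only the first two devices instead of all of them, change only the variable value in the same docker invocation shown above:
docker run --rm --runtime=habana -e HABANA_VISIBLE_DEVICES=0,1 {docker image} /bin/bash -c "ls /dev/hl*"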
Register Habana runtime:
sudo tee /etc/containerd/config.toml <<EOF
disabled_plugins = []
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "habana"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.habana]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.habana.options]
BinaryName = "/usr/bin/habana-container-runtime"
[plugins."io.containerd.runtime.v1.linux"]
runtime = "habana-container-runtime"
EOF
Restart containerd:
sudo systemctl restart containerd
Pull Prebuilt Containers¶
Prebuilt containers are provided in:
Habana Vault
Amazon ECR Public Library
AWS Deep Learning Containers (DLC)
Pull and Launch Docker Image - Habana Vault¶
Note
Before running docker, make sure to map the dataset as detailed in Map Dataset to Docker.
To pull and run the Habana Docker images use the below code examples. Update the parameters listed in the following table to run the desired configuration.
| Parameter | Description | Values |
|---|---|---|
| $OS | Operating System of Image | [ubuntu18.04, ubuntu20.04, amzn2, centos8.3, rhel8.3] |
| $TF_VERSION | Desired TensorFlow Version | [2.8.0, 2.7.1] |
| $PT_VERSION | PyTorch Version | [1.10.2] |
docker pull vault.habana.ai/gaudi-docker/1.4.1/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:1.4.1-11
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/1.4.1/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:1.4.1-11
docker pull vault.habana.ai/gaudi-docker/1.4.1/${OS}/habanalabs/pytorch-installer-1.10.2:1.4.1-11
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.4.1/${OS}/habanalabs/pytorch-installer-1.10.2:1.4.1-11
Amazon ECR Public Gallery¶
To pull and run docker images from Amazon ECR Public Library, make sure to follow the steps detailed in Pulling a public image.
AWS Deep Learning Containers¶
To set up and use AWS Deep Learning Containers, follow the instructions detailed in AWS Available Deep Learning Containers Images.
Build Docker Images from Habana Dockerfiles¶
Download Docker files and build script from the Setup and Install Repo to a local directory.
Run the build script to generate a Docker image:
./docker_build.sh mode [tensorflow,pytorch] os [ubuntu18.04,ubuntu20.04,amzn2,centos8.3,rhel8.3] tf_version [{Habana TF Version 1}, {Habana TF Version 2}]
For example:
./docker_build.sh tensorflow ubuntu20.04 2.7.1
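A PyTorch image is built the same way; the tf_version argument is assumed to apply only to TensorFlow builds (an assumption, not confirmed by this document):
./docker_build.sh pytorch ubuntu20.04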
Launch Docker Image that was Built¶
Note
Before running docker, make sure to map the dataset as detailed in Map Dataset to Docker.
Launch the docker image using the below code examples. Update the parameters listed in the following table to run the desired configuration.
| Parameter | Description | Values |
|---|---|---|
| $OS | Operating System of Image | [ubuntu18.04, ubuntu20.04, amzn2, centos8.3, rhel8.3] |
| $TF_VERSION | Desired TensorFlow Version | [2.8.0, 2.7.1] |
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/1.4.1/${OS}/habanalabs/tensorflow-installer-tf-cpu-${TF_VERSION}:1.4.1-11
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.4.1/${OS}/habanalabs/pytorch-installer-1.10.2:1.4.1-11
Map Dataset to Docker¶
Make sure to download the dataset prior to running docker and mount the location of your dataset to the docker by adding the below flag. For example, host dataset location /opt/datasets/imagenet will mount to /datasets/imagenet inside the docker:
-v /opt/datasets/imagenet:/datasets/imagenet
Note
OPTIONAL: Add the following flag to mount a local host share folder to the docker in order to be able to transfer files out of docker:
-v $HOME/shared:/root/shared
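Putting it together, a launch that combines the runtime flags used earlier in this section with both mounts might look like this (the image name is a placeholder):
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all --cap-add=sys_nice --net=host -v /opt/datasets/imagenet:/datasets/imagenet -v $HOME/shared:/root/shared {docker image}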
Set up Python for Models¶
Using your own models requires setting Python 3.8 as the default Python version. If Python 3.8 is not the default version, replace any call to the python command in your model with $PYTHON and define the environment variable as below:
export PYTHON=/usr/bin/python3.8
Running models from Habana Model-References requires the PYTHON environment variable to match the supported Python release:
export PYTHON=/usr/bin/python3.8
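For example, a model script that is normally started with python directly would then be launched as follows (the script name is a placeholder):
$PYTHON train.py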
Note
Python 3.8 is the supported python release for all Operating Systems listed in the Support Matrix.