Installing Intel Gaudi SW Packages Individually

Installing the packages on a machine with an internet connection lets the package manager download and install the required dependencies automatically (apt-get, yum install, pip install, etc.). The installation consists of the following installers:

  • habanalabs-graph – installs the graph compiler and the run-time.

  • habanalabs-thunk – installs the thunk library.

  • habanalabs-dkms – installs the habanalabs, habanalabs_cn, habanalabs_en and habanalabs_ib drivers. The habanalabs_ib driver is supported on Gaudi 2 only.

  • habanalabs-rdma-core – installs the IBVerbs libraries, which provide Intel Gaudi’s libhlib along with libibverbs. The habanalabs-rdma-core package is supported on Gaudi 2 only.

  • habanalabs-firmware – installs the Gaudi firmware.

  • habanalabs-firmware-tools – installs various firmware tools (hlml, hl-smi, etc.).

  • habanalabs-qual – installs the qualification application package. See Qualification Library.

  • habanalabs-container-runtime – installs the habanalabs-container-runtime library.

Note

The commands below install the latest version only.

KMD Dependencies:

  1. Install the Deb libraries:

sudo apt install dkms libelf-dev
  2. Install headers:

sudo apt install linux-headers-$(uname -r)
  3. After a kernel upgrade, reboot your machine.
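The reboot in the last step matters because the headers are installed for the kernel reported by uname -r, and the driver is built against them. A minimal sketch of a pre-check follows. It assumes the headers package provides the usual /lib/modules/<release>/build path; the function name and its optional arguments are hypothetical and exist only for illustration.

```shell
# Check that build headers exist for a given kernel release.
# Usage: headers_present [release] [modules_root]
# Both arguments are illustrative; they default to the running kernel
# and /lib/modules.
headers_present() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    # On typical installs, <root>/<release>/build points at the headers.
    # -e follows symlinks, so a dangling link counts as missing.
    [ -e "$root/$release/build" ]
}
```

For example, `headers_present || echo "No headers found for $(uname -r)"` prints a warning when the headers for the running kernel are absent.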

Firmware Installation:

Install the Firmware:

sudo apt install -y habanalabs-firmware

Driver Installation:

The habanalabs-dkms_all package installs the habanalabs, habanalabs_cn and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload the drivers.

  1. Run the following command to install the drivers:

sudo apt install -y habanalabs-dkms
  2. Unload the drivers in this order (habanalabs, habanalabs_cn, habanalabs_en):

sudo modprobe -r <driver name>
  3. Load the drivers in this order (habanalabs_en, habanalabs_cn, habanalabs):

sudo modprobe <driver name>
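For automation scripts, the unload and load steps above can be wrapped in a loop. The following is a sketch only: the function names and the DRY_RUN variable are hypothetical, and the driver order is copied from the steps above. Note that the load order is the reverse of the unload order.

```shell
# Reload the Gaudi drivers in the documented order.
# Set DRY_RUN=1 to print the commands instead of running them.
UNLOAD_ORDER="habanalabs habanalabs_cn habanalabs_en"
LOAD_ORDER="habanalabs_en habanalabs_cn habanalabs"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reload_gaudi_drivers() {
    for drv in $UNLOAD_ORDER; do
        run sudo modprobe -r "$drv" || return 1
    done
    for drv in $LOAD_ORDER; do
        run sudo modprobe "$drv" || return 1
    done
}
```

Setting DRY_RUN=1 before calling reload_gaudi_drivers prints the six modprobe commands without touching the loaded modules, which is a convenient way to review the order first.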

Thunk Installation:

Install the thunk library:

sudo apt install -y habanalabs-thunk

FW Tools Installation:

Install the habanalabs-firmware-tools package:

sudo apt install -y habanalabs-firmware-tools

Graph Compiler and Run-time Installation:

Install the graph compiler and run-time:

sudo apt install -y habanalabs-graph

(Optional) hl_qual Installation:

Install hl_qual:

sudo apt install -y habanalabs-qual

For further details, see Gaudi Qualification Library.

Container Runtime Installation:

Install habanalabs-container-runtime:

sudo apt install -y habanalabs-container-runtime
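Because the commands above always install the latest version, it can be useful to record which versions ended up on the machine. The sketch below queries dpkg for each package named in this section. The function name is hypothetical, and a package counts as installed only when dpkg reports the status "install ok installed".

```shell
# Packages from this section, in the order they were installed.
GAUDI_PACKAGES="habanalabs-firmware habanalabs-dkms habanalabs-thunk habanalabs-firmware-tools habanalabs-graph habanalabs-qual habanalabs-container-runtime"

# Print "<package> <version>" or "<package> not-installed" for each package.
report_gaudi_packages() {
    for pkg in $GAUDI_PACKAGES; do
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null) || status=""
        if [ "$status" = "install ok installed" ]; then
            version=$(dpkg-query -W -f='${Version}' "$pkg")
            echo "$pkg $version"
        else
            echo "$pkg not-installed"
        fi
    done
}
```

Since habanalabs-qual is optional, a "not-installed" line for it is expected on systems where that step was skipped.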

Update Environment Variables and More

When the installation is complete, close the shell and re-open it, or run the following:

source /etc/profile.d/habanalabs.sh

source ~/.bashrc

Package Retrieval:

  1. Create /etc/yum.repos.d/Habana-Vault.repo with the following content:

[vault]
name=Habana Vault
baseurl=https://vault.habana.ai/artifactory/AmazonLinux2
enabled=1
gpgcheck=0
gpgkey=https://vault.habana.ai/artifactory/AmazonLinux2/repodata/repomod.xml.key
repo_gpgcheck=0
  2. Update YUM cache by running the following command:

sudo yum makecache
  3. Verify correct binding by running the following command:

yum search habana

This lists all packages whose name or summary contains the word habana.
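The repository file from step 1 can also be written from a script. The sketch below uses a here-document with the same content as shown above. The function name and the optional target-path argument are hypothetical; the argument exists so the function can be tried on a scratch file before touching /etc.

```shell
# Write the Amazon Linux 2 repository definition from step 1.
# Usage: write_vault_repo [target_path]
write_vault_repo() {
    target="${1:-/etc/yum.repos.d/Habana-Vault.repo}"
    # The quoted 'EOF' keeps the content literal (no variable expansion).
    cat > "$target" <<'EOF'
[vault]
name=Habana Vault
baseurl=https://vault.habana.ai/artifactory/AmazonLinux2
enabled=1
gpgcheck=0
gpgkey=https://vault.habana.ai/artifactory/AmazonLinux2/repodata/repomod.xml.key
repo_gpgcheck=0
EOF
}
```

Writing under /etc requires root. One option is to write to a temporary file first and copy it into place with `sudo install -m 0644 <temporary file> /etc/yum.repos.d/Habana-Vault.repo`.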

KMD Dependencies:

  1. Check your Linux kernel version:

uname -r
  2. Install headers:

sudo yum install kernel-devel
  3. After a kernel upgrade, reboot your machine.

Additional Dependencies:

Add yum-utils:

sudo yum install -y yum-utils

Firmware Installation:

Install the Firmware:

sudo yum install -y habanalabs-firmware

Driver Installation:

The habanalabs-dkms_all package installs the habanalabs, habanalabs_cn and habanalabs_en (Ethernet) drivers. If automation scripts are used, the scripts must be modified to load/unload the drivers.

  1. Run the following command to install all drivers:

sudo yum install -y habanalabs
  2. Unload the drivers in this order (habanalabs, habanalabs_cn, habanalabs_en):

sudo modprobe -r <driver name>
  3. Load the drivers in this order (habanalabs_en, habanalabs_cn, habanalabs):

sudo modprobe <driver name>

Thunk Installation:

Install the thunk library:

sudo yum install -y habanalabs-thunk

FW Tools Installation:

Install Firmware tools:

sudo yum install -y habanalabs-firmware-tools

Graph Compiler and Run-time Installation:

Install the graph compiler and run-time:

sudo yum install -y habanalabs-graph

(Optional) Qual Installation:

Install hl_qual:

sudo yum install -y habanalabs-qual

For further details, see Qualification Library.

Container Runtime Installation:

Install habanalabs-container-runtime:

sudo yum install -y habanalabs-container-runtime

Update Environment Variables and More

When the installation is complete, close the shell and re-open it, or run the following:

source /etc/profile.d/habanalabs.sh

source ~/.bashrc

Note

RHEL9.2 installation is currently available on Gaudi 2 only.

Package Retrieval:

  1. Create /etc/yum.repos.d/Habana-Vault.repo with the following content:

[vault]
name=Habana Vault
baseurl=https://vault.habana.ai/artifactory/rhel/9/9.2
enabled=1
repo_gpgcheck=0
  2. Update YUM cache by running the following command:

sudo yum makecache
  3. Verify correct binding by running the following command:

yum search habana

This lists all packages whose name or summary contains the word habana.

  4. Reinstall the libarchive package by running the following command:

sudo dnf install -y libarchive*

KMD Dependencies:

  1. Check your Linux kernel version:

uname -r
  2. Install headers:

sudo yum install kernel-devel
  3. After a kernel upgrade, reboot your machine.

Additional Dependencies:

Add yum-utils:

sudo yum install -y yum-utils

Firmware Installation:

Install the Firmware:

sudo yum install -y habanalabs-firmware

Driver Installation:

The habanalabs-dkms_all package installs the habanalabs, habanalabs_cn, habanalabs_en (Ethernet) and habanalabs_ib drivers. If automation scripts are used, the scripts must be modified to load/unload the drivers.

  1. Run the following command to install all drivers:

sudo yum install -y habanalabs
  2. Unload the drivers in this order (habanalabs, habanalabs_cn, then habanalabs_en and habanalabs_ib):

sudo modprobe -r <driver name>
  3. Load the drivers in this order (habanalabs_en and habanalabs_ib, then habanalabs_cn, then habanalabs):

sudo modprobe <driver name>
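After loading, a script can confirm that every expected module is present. The sketch below reads /proc/modules, where the first field of each line is a module name. The function name and the optional file argument are hypothetical; the module list is taken from the steps above.

```shell
# Report which of the expected Gaudi kernel modules are loaded.
# Usage: check_gaudi_modules [modules_file]
# Returns non-zero if any module is missing.
check_gaudi_modules() {
    modules_file="${1:-/proc/modules}"
    missing=0
    for mod in habanalabs_en habanalabs_ib habanalabs_cn habanalabs; do
        # Match the module name exactly against the first field.
        if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod loaded"
        else
            echo "$mod missing"
            missing=1
        fi
    done
    return "$missing"
}
```

On systems where habanalabs_ib does not apply, remove it from the list so that the check does not report a false failure.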

RDMA Core Installation:

Install habanalabs-rdma-core:

sudo dnf install habanalabs-rdma-core

Thunk Installation:

Install habanalabs-thunk:

sudo yum install -y habanalabs-thunk

FW Tools Installation:

Install habanalabs-firmware-tools:

sudo yum install -y habanalabs-firmware-tools

Graph Compiler and Run-time Installation:

Install the graph compiler and run-time:

sudo yum install -y habanalabs-graph

(Optional) hl_qual Installation:

Install hl_qual:

sudo yum install -y habanalabs-qual

For further details, see Qualification Library.

Container Runtime Installation:

Install habanalabs-container-runtime:

sudo yum install -y habanalabs-container-runtime

Update Environment Variables and More

When the installation is complete, close the shell and re-open it, or run the following:

source /etc/profile.d/habanalabs.sh

source ~/.bashrc

Set Number of Huge Pages

Some training models use huge pages. The recommended way to set the number of huge pages is as follows:

# Set the current number of huge pages
sudo sysctl -w vm.nr_hugepages=15000
# Remove the old entry from sysctl.conf, if one exists
sudo sed --in-place '/nr_hugepages/d' /etc/sysctl.conf
# Append the huge pages setting so it persists across reboots
echo "vm.nr_hugepages=15000" | sudo tee -a /etc/sysctl.conf
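To judge how much memory this setting reserves, multiply the page count by the huge page size. The sketch below reads the size from the Hugepagesize line of /proc/meminfo, which is reported in kB. The function name and the optional file argument are hypothetical. With a 2048 kB huge page size, which is common but not guaranteed, 15000 pages reserve 30000 MiB.

```shell
# Print the memory reserved by a huge page pool, in MiB.
# Usage: hugepage_pool_mib <page_count> [meminfo_file]
hugepage_pool_mib() {
    count="$1"
    meminfo="${2:-/proc/meminfo}"
    # Example line: "Hugepagesize:       2048 kB"
    size_kb=$(awk '/^Hugepagesize:/ { print $2 }' "$meminfo")
    # Fail rather than compute with an empty value.
    [ -n "$size_kb" ] || return 1
    echo $(( count * size_kb / 1024 ))
}
```

For example, `hugepage_pool_mib 15000` prints the reservation for the running system. To confirm that the kernel allocated the pages, check the HugePages_Total line in /proc/meminfo.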

Bring up Network Interfaces

If you train using the Gaudi network interfaces for multi-node scale-out (external Gaudi network interfaces between servers), make sure the network interfaces are brought up. The interfaces must be brought up again every time the kernel module is loaded, or unloaded and reloaded.

Note

This section is not relevant for AWS users.

A reference for bringing up the interfaces is provided in the manage_network_ifs.sh script, as detailed in manage_network_ifs.sh.

Use the following commands:

# manage_network_ifs.sh requires ethtool
sudo apt-get install ethtool
./manage_network_ifs.sh --up
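After running the script, it can help to confirm which interfaces are up. The sketch below lists the interfaces bound to a given kernel driver, with the link state read from sysfs. It is not part of manage_network_ifs.sh. The function name is hypothetical, and the driver name to pass is an assumption that should be verified with ethtool -i <interface>.

```shell
# List interfaces bound to a given kernel driver, with their link state.
# Usage: ifaces_for_driver <driver_name> [sysfs_net_dir]
ifaces_for_driver() {
    driver="$1"
    net_dir="${2:-/sys/class/net}"
    for dev in "$net_dir"/*; do
        # Virtual interfaces such as lo have no device/driver entry.
        [ -e "$dev/device/driver" ] || continue
        # device/driver is a symlink; its target's basename is the driver name.
        bound=$(basename "$(readlink "$dev/device/driver")")
        if [ "$bound" = "$driver" ]; then
            echo "$(basename "$dev") $(cat "$dev/operstate")"
        fi
    done
}
```

For example, `ifaces_for_driver habanalabs_en` uses the Ethernet driver name from this document. If the interfaces report a different driver name, pass that name instead.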