Enabling InfiniBand NICs (Verbs) for Host NIC Scaling
Enabling Verbs
To use the verbs provider, the following steps must be performed:
Re-configuring libfabric with verbs enabled.
Installing the UCX package to allow communication via InfiniBand.
Re-configuring MPI with the UCX package and verbs support.
Reconfiguring Libfabric
For a clean installation, remove the environment variables that point to the local MPI and libfabric packages, and create a new directory:
Remove the local installations:
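For example, assuming the existing MPI and libfabric packages live under /opt/amazon (the paths below are illustrative; adjust them to match your installation):

%>sudo rm -rf /opt/amazon/openmpi
%>sudo rm -rf /opt/amazon/efa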
Unset MPI environment variables:
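For example, assuming the typical Open MPI variables are set (unset whichever variables your environment actually defines, and remove the old MPI bin and lib paths from PATH and LD_LIBRARY_PATH):

%>unset OPAL_PREFIX
%>unset MPI_ROOT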
Make a clean directory for the new installations, or use /opt/amazon:
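For example, assuming $NEW_PKGS_DIR is the prefix variable referenced by the build steps below:

%>export NEW_PKGS_DIR=/opt/amazon
%>mkdir -p $NEW_PKGS_DIR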
Re-install and configure libfabric with verbs support. Note that the libfabric version below is an example version only. Make sure to use the libfabric version listed in the Support Matrix:
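A minimal sketch, using libfabric 1.16.1 as an illustrative version and an assumed install prefix under $NEW_PKGS_DIR (libfabric's configure script accepts --enable-verbs to build the verbs provider):

%>wget https://github.com/ofiwg/libfabric/releases/download/v1.16.1/libfabric-1.16.1.tar.bz2 -P /tmp/libfabric
%>cd /tmp/libfabric
%>tar -xf ./libfabric-1.16.1.tar.bz2
%>cd ./libfabric-1.16.1
%>./configure --prefix=$NEW_PKGS_DIR/libfabric --enable-verbs
%>make
%>make install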
Make sure that fi_info -l displays the verbs option:

%> $CONNECTION_DIR/efa/bin/fi_info -l
usnic:
    version: 1.0
verbs:                          <-------------------------
    version: 116.10             <-------------------------
ofi_rxm:
    version: 116.10
ofi_rxd:
    version: 116.10
shm:
    version: 116.10
udp:
    version: 116.10
tcp:
    version: 116.10
sockets:
    version: 116.10
net:
    version: 116.10
ofi_hook_perf:
    version: 116.10
ofi_hook_debug:
    version: 116.10
ofi_hook_noop:
    version: 116.10
ofi_hook_hmem:
    version: 116.10
ofi_hook_dmabuf_peer_mem:
    version: 116.10
ofi_mrail:
    version: 116.10
Installing UCX Package
Install the required Linux packages, libtool and autoconf:
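On Ubuntu, for example (use the equivalent command for your distribution's package manager):

%>sudo apt-get update
%>sudo apt-get install -y libtool autoconf automake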
Install the UCX package:
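A sketch of building UCX from source; the version tag is illustrative only, and the $NEW_PKGS_DIR/ucx prefix is assumed so that the --with-ucx flag in the MPI step below resolves to it:

%>git clone https://github.com/openucx/ucx.git /tmp/ucx
%>cd /tmp/ucx
%>git checkout v1.14.0
%>./autogen.sh
%>./configure --prefix=$NEW_PKGS_DIR/ucx
%>make
%>make install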
Reconfiguring MPI
Install the Open MPI package. Note that the Open MPI version below is an example version only. Make sure to use the Open MPI version listed in the Support Matrix:
%>wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.4.tar.bz2 -P /tmp/openmpi
%>cd /tmp/openmpi
%>tar -xf ./openmpi-4.1.4.tar.bz2
%>cd ./openmpi-4.1.4
%>./configure --prefix=$NEW_PKGS_DIR/mpi --with-sge --disable-builtin-atomics --enable-orterun-prefix-by-default --with-ucx=$NEW_PKGS_DIR/ucx --with-verbs
%>make
%>make install
Export the MPI environment variables to point to the new MPI location:
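For example (OPAL_PREFIX is Open MPI's standard relocation variable; the exact set of variables may differ on your system):

%>export OPAL_PREFIX=$NEW_PKGS_DIR/mpi
%>export PATH=$NEW_PKGS_DIR/mpi/bin:$PATH
%>export LD_LIBRARY_PATH=$NEW_PKGS_DIR/mpi/lib:$LD_LIBRARY_PATH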
Installing Mellanox OFED (MLNX_OFED) Driver
For the Mellanox driver to install successfully, Mellanox OpenFabrics Enterprise Distribution (MLNX_OFED) must be installed on the system, as it ensures proper functionality and compatibility with the driver.
If MLNX_OFED is not installed on your system, the steps below are required. Depending on your system's configuration and requirements, there may be additional steps or dependencies. You can also refer to https://enterprise-support.nvidia.com/s/article/howto-install-mlnx-ofed-driver.
Check whether MLNX_OFED is installed by running ofed_info. If the command is not found, MLNX_OFED is not installed.
To install MLNX_OFED, execute the following command, replacing XXXX with the needed version (or the version provided to you as a customer), as shown in the sketch below:
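A sketch for Ubuntu 20.04 x86_64; the download URL, distribution suffix, and archive layout are illustrative assumptions, and XXXX stays as the version placeholder (mlnxofedinstall is the installer script shipped inside the MLNX_OFED archive):

%>wget https://content.mellanox.com/ofed/MLNX_OFED-XXXX/MLNX_OFED_LINUX-XXXX-ubuntu20.04-x86_64.tgz
%>tar -xzf MLNX_OFED_LINUX-XXXX-ubuntu20.04-x86_64.tgz
%>cd MLNX_OFED_LINUX-XXXX-ubuntu20.04-x86_64
%>sudo ./mlnxofedinstall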
Once the installation is complete, reboot the machine.
Install the habanalabs-dkms package as shown in the Installation Guide and On-Premise System Update.
Note
The supported MLNX_OFED versions are 5.4-1.1.1.1.8 and 5.0-2.1.8.0.