Enabling InfiniBand NICs (Verbs) for Host NIC Scaling
Enabling Verbs¶
To use the verbs provider, complete the following steps:
Reconfigure libfabric with verbs enabled.
Install the UCX package to allow communication via InfiniBand.
Reconfigure MPI with the UCX package and verbs support.
Reconfiguring Libfabric¶
For a clean installation, remove the local MPI and libfabric installations, unset the environment variables that point to them, and create a new directory:
Remove the local installations:
%> rm -rf /opt/amazon/openmpi
%> rm -rf /opt/amazon/efa
Unset MPI environment variables:
%> unset MPICC
%> unset OPAL_PREFIX
%> unset MPI_ROOT
If other environment variables point to /opt/amazon/openmpi/ or /opt/amazon/efa/, reset those variables as well. Save their original contents first, so that they can later be exported with the new MPI location.
Make a clean directory for the new installations, or use /opt/amazon:
%> export NEW_PKGS_DIR=<your new, clean directory or /opt/amazon>
%> mkdir -p $NEW_PKGS_DIR
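Finding the leftover variables by hand is easy to get wrong, so the following is a minimal sketch of one way to list them. The helper name find_stale_vars is hypothetical and not part of any package; the sketch only inspects the current shell's environment and changes nothing.

```shell
# Hypothetical helper: print every environment variable whose value still
# references the removed /opt/amazon/openmpi or /opt/amazon/efa trees.
find_stale_vars() {
    env | grep -E '/opt/amazon/(openmpi|efa)(/|:|$)' || true
}

stale="$(find_stale_vars)"
if [ -n "$stale" ]; then
    echo "Variables still pointing at the old installation:"
    echo "$stale"
else
    echo "No variables reference the old installation."
fi
```

Each printed line has the form NAME=value, so the original value can be saved before the variable is changed.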
Reinstall and configure libfabric with verbs support. Note that the libfabric version below is an example only. Make sure to use the libfabric version listed in the Support Matrix:
%> wget https://github.com/ofiwg/libfabric/releases/download/v1.16.1/libfabric-1.16.1.tar.bz2 -P /tmp/lib
%> cd /tmp/lib
%> tar -xf ./libfabric-1.16.1.tar.bz2
%> cd ./libfabric-1.16.1
%> ./configure --prefix=$NEW_PKGS_DIR/efa/ --enable-psm3-verbs --enable-verbs=yes --enable-debug
%> make
%> make install
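Because the version in the commands above is only an example, it can help to keep the version in a single variable. The following sketch assumes that other libfabric releases use the same URL layout as the v1.16.1 example; the helper name libfabric_url and the variable LIBFABRIC_VERSION are hypothetical.

```shell
# Hypothetical helper: build the release tarball URL for a libfabric version,
# assuming other releases follow the same URL layout as the v1.16.1 example.
libfabric_url() {
    printf 'https://github.com/ofiwg/libfabric/releases/download/v%s/libfabric-%s.tar.bz2\n' "$1" "$1"
}

# Example version only; replace with the version from the Support Matrix.
LIBFABRIC_VERSION=1.16.1
echo "Would download: $(libfabric_url "$LIBFABRIC_VERSION")"
```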
Make sure that fi_info -l displays the verbs option:
%> $NEW_PKGS_DIR/efa/bin/fi_info -l
usnic:
    version: 1.0
verbs: <-------------------------
    version: 116.10 <-------------------------
ofi_rxm:
    version: 116.10
ofi_rxd:
    version: 116.10
shm:
    version: 116.10
udp:
    version: 116.10
tcp:
    version: 116.10
sockets:
    version: 116.10
net:
    version: 116.10
ofi_hook_perf:
    version: 116.10
ofi_hook_debug:
    version: 116.10
ofi_hook_noop:
    version: 116.10
ofi_hook_hmem:
    version: 116.10
ofi_hook_dmabuf_peer_mem:
    version: 116.10
ofi_mrail:
    version: 116.10
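The check above can also be scripted instead of read by eye. This is a minimal sketch, assuming the provider list has the "name:" line format shown above and that NEW_PKGS_DIR is set as in the earlier steps; the helper name has_verbs_provider is hypothetical.

```shell
# Hypothetical helper: succeeds if the provider list on stdin
# contains a line that starts with "verbs:".
has_verbs_provider() {
    grep -q '^[[:space:]]*verbs:'
}

# Only run the real check when fi_info exists under NEW_PKGS_DIR.
if [ -n "${NEW_PKGS_DIR:-}" ] && [ -x "$NEW_PKGS_DIR/efa/bin/fi_info" ]; then
    if "$NEW_PKGS_DIR/efa/bin/fi_info" -l | has_verbs_provider; then
        echo "verbs provider found"
    else
        echo "verbs provider missing; check the libfabric configure options" >&2
    fi
fi
```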
Installing UCX Package¶
Install the required Linux packages, libtool and autoconf:
%> sudo apt update
%> sudo apt-get install libtool
%> sudo apt-get install autoconf
Install the UCX package:
%> wget https://github.com/openucx/ucx/releases/download/v1.13.1/ucx-1.13.1.tar.gz -P /tmp/ucx
%> cd /tmp/ucx
%> tar -xf ./ucx-1.13.1.tar.gz
%> cd ./ucx-1.13.1
%> ./configure --prefix=$NEW_PKGS_DIR/ucx
%> make
%> make install
Reconfiguring MPI¶
Install the Open MPI package. Note that the Open MPI version below is an example only. Make sure to use the Open MPI version listed in the Support Matrix:
%> wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.4.tar.bz2 -P /tmp/openmpi
%> cd /tmp/openmpi
%> tar -xf ./openmpi-4.1.4.tar.bz2
%> cd ./openmpi-4.1.4
%> ./configure --prefix=$NEW_PKGS_DIR/mpi --with-sge --disable-builtin-atomics --enable-orterun-prefix-by-default --with-ucx=$NEW_PKGS_DIR/ucx --with-verbs
%> make
%> make install
Export the MPI environment variables to point to the new MPI location:
%> export MPICC=$NEW_PKGS_DIR/mpi/bin/mpicc
%> export OPAL_PREFIX=$NEW_PKGS_DIR/mpi
%> export MPI_ROOT=$NEW_PKGS_DIR/mpi
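Once the variables are exported, one possible sanity check is to confirm that the new Open MPI build includes the UCX component. This sketch assumes that the ompi_info tool installed with Open MPI lists the UCX point-to-point layer as an "MCA pml: ucx" line; the helper name has_ucx_pml is hypothetical.

```shell
# Hypothetical helper: succeeds if the ompi_info output on stdin
# lists the UCX PML component.
has_ucx_pml() {
    grep -Eq 'MCA[[:space:]]+pml:[[:space:]]+ucx'
}

# Only run the real check when ompi_info exists under MPI_ROOT.
if [ -n "${MPI_ROOT:-}" ] && [ -x "$MPI_ROOT/bin/ompi_info" ]; then
    if "$MPI_ROOT/bin/ompi_info" | has_ucx_pml; then
        echo "Open MPI was built with the UCX PML"
    else
        echo "UCX PML not found in ompi_info output" >&2
    fi
fi
```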
Note
If you reset any environment variables so that they no longer include the local MPI/EFA installation, set them again to include the new installation:
%> export LD_LIBRARY_PATH=$NEW_PKGS_DIR/mpi/lib:$NEW_PKGS_DIR/efa/lib:$LD_LIBRARY_PATH
%> export PATH=$NEW_PKGS_DIR/mpi/bin:$NEW_PKGS_DIR/efa/bin:$PATH
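Running these assignments more than once in the same session adds the same directories repeatedly. The following is a minimal sketch of one way to avoid that; the helper name prepend_once is hypothetical, and the commented usage lines assume NEW_PKGS_DIR is set as in the earlier steps.

```shell
# Hypothetical helper: print the colon-separated list $2 with directory $1
# prepended, unless $1 is already an entry in the list.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *) printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# Usage sketch:
# export PATH="$(prepend_once "$NEW_PKGS_DIR/mpi/bin" "$PATH")"
# export LD_LIBRARY_PATH="$(prepend_once "$NEW_PKGS_DIR/mpi/lib" "${LD_LIBRARY_PATH:-}")"
```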
Installing Mellanox OFED (MLNX_OFED) Driver¶
The Mellanox driver requires Mellanox OpenFabrics Enterprise Distribution (MLNX_OFED) to be installed on the system. MLNX_OFED ensures proper functionality and compatibility with the Mellanox driver.
If MLNX_OFED is not installed on your system, follow the steps below. Depending on your system's configuration and requirements, there may be additional steps or dependencies to consider. You can also refer to https://enterprise-support.nvidia.com/s/article/howto-install-mlnx-ofed-driver.
Check if MLNX_OFED is installed by running ofed_info. If the command is not found, MLNX_OFED is not installed.
To install MLNX_OFED, execute the following command, replacing the X/x placeholders with the required version (or with your customer-specific version):
dpkg -i mlnx-ofed-kernel-dkms_5.X-OFED.5.xxxxxxx_all.deb mlnx-ofed-kernel-utils_5.x-OF.xxxxx_amd64.deb mlnx-tools_5.xxxxxx_amd64.deb
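The ofed_info check described above can be sketched as a short script. The helper name have_cmd is hypothetical; ofed_info -s, which prints the short version string of the installed MLNX_OFED, is run only when the command is found.

```shell
# Hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd ofed_info; then
    # -s prints the short version string of the installed MLNX_OFED.
    echo "MLNX_OFED is installed: $(ofed_info -s)"
else
    echo "ofed_info not found; MLNX_OFED is not installed"
fi
```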
Note
If you have a customer-specific version, make sure to use it rather than the public versions released by Mellanox.
Once the installation is complete, reboot the machine.
Install the habanalabs-dkms package as shown in the Installation Guide and On-Premise System Update.
Note
The supported MLNX_OFED versions are 5.4-1.1.1.1.8 and 5.0-2.1.8.0.