Intel Gaudi RDMA PerfTest Tool¶
This document provides guidelines for installing and running the Intel® Gaudi® RDMA PerfTest tool, habanalabs-perf-test, on Gaudi accelerators. The tool is designed for low-level, high-performance connectivity testing through ping-pong and bandwidth communication tests. It uses the Reliable Connection (RC) transport and RDMA Write operations to produce its performance measurements.
Note
The tool is supported only with Gaudi 3 on Ubuntu 22.04.
The tool can only be used in a container and is not supported on a VM.
The tool can only be used with Gaudi 3 NICs.
Prerequisites¶
Make sure you have the following packages installed:
habanalabs-thunk
habanalabs-dkms
habanalabs-rdma-core
For more information about installing these packages, see Custom Driver and Software Installation.
Note
If you have upgraded to the 1.19.2 software version, the above packages are already included, and no additional installation is required.
Installation¶
Download the package files:
wget https://vault.habana.ai/artifactory/debian/jammy/pool/main/h/habanalabs-perf-test/habanalabs-perf-test_1.19.2-32_amd64.deb
Install the package files:
sudo apt install ./habanalabs-perf-test_1.19.2-32_amd64.deb
Options and Usage¶
The RDMA PerfTest tool is executed using the cloud_run.py Python wrapper script. The script runs across an entire data center, covering all pairwise permutations for thorough evaluation.
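To give a feel for how the number of tests grows with the node count, the following minimal sketch enumerates ordered (server, client) pairs from a list of hosts. It is not taken from cloud_run.py: the function name ordered_pairs, the host names, and the assumption that each ordered pair of distinct nodes is tested once are all hypothetical.

```python
from itertools import permutations


def ordered_pairs(hosts):
    """Return every ordered (server, client) pair of distinct hosts.

    Assumption: each ordered pair is tested once; the real wrapper
    script may schedule its tests differently.
    """
    return list(permutations(hosts, 2))


# Hypothetical host entries in SSH-IP:SSH-PORT form.
hosts = ["node-a:22", "node-b:22", "node-c:22"]
pairs = ordered_pairs(hosts)

for server, client in pairs:
    print(f"server={server} client={client}")

# N hosts yield N * (N - 1) ordered pairs.
print(len(pairs))
```

Under this assumption, the number of pairs grows quadratically with the number of nodes, which is worth keeping in mind when choosing the iteration count for large host files.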
The following port connectivity options are supported for testing on multiple nodes:
Prerequisites¶
The following prerequisites are needed to run the cloud_run.py Python wrapper script:
Install the requirements file:
pip install -r requirements.txt
Make sure all tested nodes have SSH keys configured for seamless access. The script relies on SSH sessions established via SSH keys.
Note
Starting from the 1.18.0 release, SSH host keys have been removed from the Docker images. To add them, make sure to run /usr/bin/ssh-keygen -A inside the Docker container. If you are running on Kubernetes, make sure the SSH host keys are identical across all Docker containers. To achieve this, you can either build a new Docker image on top of the Intel Gaudi Docker image by adding a new layer, RUN /usr/bin/ssh-keygen -A, or externally mount the SSH host keys.
Verify that all external NIC ports are active and accessible on each node. For more details, see Disable/Enable NICs.
Prepare a host file listing all tested nodes with their SSH connection details (IP and port). Use the following format: SSH-IP:SSH-PORT. For example:
kuku-kvm12-lake:22 kuku-kvm13-lake:22 kuku-kvm14-lake:22 kuku-kvm15-lake:22
Write the LD_LIBRARY_PATH environment variable to a file. This ensures it can be accessed in remote SSH sessions during testing:
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" | tee ~/.ENV_SCALEUP
Create the gaudinet.json file under the /etc/habanalabs directory and configure it as described in the section below.
Configure the macAddrInfo.json file as described below.
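Before launching a run, it can help to check that every host file entry follows the SSH-IP:SSH-PORT format. The sketch below is one way to do that. The function name parse_host_file is hypothetical, and the sketch assumes entries are separated by whitespace (spaces or newlines); it is not part of the tool.

```python
def parse_host_file(text):
    """Parse whitespace-separated SSH-IP:SSH-PORT entries.

    Returns a list of (host, port) tuples and raises ValueError on the
    first malformed entry. Assumption: entries are separated by spaces
    or newlines.
    """
    entries = []
    for token in text.split():
        # Split on the last colon so the port is always the final field.
        host, sep, port = token.rpartition(":")
        if not sep or not host or not (port.isascii() and port.isdigit()):
            raise ValueError(f"expected SSH-IP:SSH-PORT, got {token!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {token!r}")
        entries.append((host, port_number))
    return entries


sample = "kuku-kvm12-lake:22 kuku-kvm13-lake:22"
parsed = parse_host_file(sample)
print(parsed)
```

In practice the text would come from the host file itself, for example via pathlib.Path("./hostfile").read_text().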
Configuring gaudinet.json¶
The gaudinet.json file is used to configure network settings for Layer 3 (L3) routes. It includes the NIC MAC address, IP address, subnet mask, and associated gateway MAC address for each NIC, in the following format:
{
    "NIC_NET_CONFIG": [
        {
            "NIC_MAC": "00:1A:2B:3C:4D:5E",
            "NIC_IP": "192.168.1.10",
            "SUBNET_MASK": "255.255.255.0",
            "GATEWAY_MAC": "00:1A:2B:3C:4D:5F"
        }
    ]
}
To obtain the required data, refer to the instructions from Intel Gaudi 3 Network Configuration available in the Intel Gaudi vault.
Each object inside the NIC_NET_CONFIG array corresponds to the configuration of a single NIC.
The following table describes each object used in the gaudinet.json file:
Object | Type | Description | Format Example |
---|---|---|---|
NIC_MAC | String | NIC MAC address. This field is required and must follow the standard MAC address format. | 00:1A:2B:3C:4D:5E |
NIC_IP | String | IP address assigned to the NIC. Must be in a valid IPv4 or IPv6 format. | 192.168.1.10 |
SUBNET_MASK | String | Subnet mask defining the network’s address range. | 255.255.255.0 |
GATEWAY_MAC | String | MAC address of the gateway through which the NIC routes its traffic. This field must follow the standard MAC address format. | 00:1A:2B:3C:4D:5F |
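The field rules in the table can be checked before the file is deployed. The following is a minimal sketch under stated assumptions: the helper name check_gaudinet is hypothetical, only NIC_MAC is treated as required (the only field the table marks as such), and the subnet mask is not validated. It is not part of the Intel Gaudi software.

```python
import ipaddress
import json
import re

# Six colon-separated hex octets, e.g. 00:1A:2B:3C:4D:5E.
MAC_RE = re.compile(r"^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")


def check_gaudinet(config):
    """Return a list of problems found in a parsed gaudinet.json dict."""
    problems = []
    for index, nic in enumerate(config.get("NIC_NET_CONFIG", [])):
        if "NIC_MAC" not in nic:
            problems.append(f"entry {index}: missing NIC_MAC")
        for key in ("NIC_MAC", "GATEWAY_MAC"):
            if key in nic and not MAC_RE.match(nic[key]):
                problems.append(f"entry {index}: {key} is not a valid MAC address")
        if "NIC_IP" in nic:
            try:
                # Accepts both IPv4 and IPv6 textual forms.
                ipaddress.ip_address(nic["NIC_IP"])
            except ValueError:
                problems.append(f"entry {index}: NIC_IP is not a valid IP address")
    return problems


sample = json.loads("""
{
    "NIC_NET_CONFIG": [
        {
            "NIC_MAC": "00:1A:2B:3C:4D:5E",
            "NIC_IP": "192.168.1.10",
            "SUBNET_MASK": "255.255.255.0",
            "GATEWAY_MAC": "00:1A:2B:3C:4D:5F"
        }
    ]
}
""")
problems = check_gaudinet(sample)
print(problems)
```

Note that json.loads rejects trailing commas, so a file that fails to parse at all should be fixed before this kind of field-level check is attempted.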
Configuring macAddrInfo.json¶
The macAddrInfo.json file is generated under /etc/habanalabs when installing the habanalabs-container-runtime package.
The file is used to configure external NIC ports. It includes the device bus ID and NIC MAC addresses in the following format:
{
    "MAC_ADDR_INFO": [
        {
            "PCI_ID": "0000:34:00.0",
            "MAC_ADDR_LIST": [
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "68:93:2E:1A:6E:D5", #external port 2
                "68:93:2E:1A:6E:D6", #external port 3
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "68:93:2E:1A:6E:DA", #external port 7
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff"
            ]
        }
    ]
}
Note
To retrieve the external port numbers of your device, refer to the instructions from HLB-325 UBB Specification available in the Intel Gaudi vault. Internal port locations are represented by ff:ff:ff:ff:ff:ff.
Each object inside the MAC_ADDR_INFO array corresponds to the configuration for a single device.
The following table describes each object used in the macAddrInfo.json file:
Object | Type | Description | Format Example |
---|---|---|---|
PCI_ID | String | Gaudi bus ID. This field is required and must follow the standard bus ID format. | 0000:34:00.0 |
MAC_ADDR_LIST | String array | NIC MAC addresses. Each MAC address is located in the array according to its port number. This address corresponds to the NIC_MAC address used in gaudinet.json. | 68:93:2E:1A:6E:D5 |
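Because each MAC address sits at the array index of its port number, the external ports of a device can be listed by skipping the ff:ff:ff:ff:ff:ff placeholders. The sketch below illustrates the idea on a shortened, hypothetical list. The function name external_ports is an assumption, and the input is expected to be valid JSON without the "#external port" annotations used in the format example.

```python
INTERNAL_MAC = "ff:ff:ff:ff:ff:ff"


def external_ports(mac_addr_info):
    """Map each PCI_ID to {port_number: mac} for its external ports.

    A port is treated as external when its entry is not the
    ff:ff:ff:ff:ff:ff placeholder used for internal ports.
    """
    result = {}
    for device in mac_addr_info["MAC_ADDR_INFO"]:
        result[device["PCI_ID"]] = {
            port: mac
            for port, mac in enumerate(device["MAC_ADDR_LIST"])
            if mac.lower() != INTERNAL_MAC
        }
    return result


# Shortened example list: ports 0 and 1 internal, ports 2 and 3 external.
sample = {
    "MAC_ADDR_INFO": [
        {
            "PCI_ID": "0000:34:00.0",
            "MAC_ADDR_LIST": [
                "ff:ff:ff:ff:ff:ff",
                "ff:ff:ff:ff:ff:ff",
                "68:93:2E:1A:6E:D5",
                "68:93:2E:1A:6E:D6",
            ],
        }
    ]
}
ports = external_ports(sample)
print(ports)
```

The MAC addresses returned this way are the ones that would be expected to appear as NIC_MAC values in gaudinet.json.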
Python Wrapper Options¶
Use the -h argument to view all options.
The table below presents all available options and their descriptions.
Option | Description |
---|---|
-h, --help | Show the help message and exit |
-hf, --host_file | Path to a host file that lists the hosts to test |
-skt, --ssh_key_type | SSH key type |
-skf, --ssh_key_file | SSH private key file path (default: /home/username/.ssh/id_rsa) |
-o, --output | Path in which to save all log files (this flag is required) |
Example:
python3 ./cloud_run.py --host_file ./hostfile --output /tmp/output perftest
Test-specific Options¶
Use the -h argument to view all options.
The tables below present the available options and their descriptions. The first table applies to the ping_pong test and the second to the write_bw test, as shown by the example following each table.
Option | Description |
---|---|
-h, --help | Show the help message and exit |
-tp, --tcp_port | Specify the TCP port range the script will use (default: 1100) |
-s, --size | Size of message to exchange (default: 4096) |
-r, --rx_depth | Number of receives to post at a time (default: 128) |
-n, --iters | Number of exchanges (default: 10) |
-c, --chk | Validate received buffer |
Example:
python3 ./cloud_run.py --host_file ./hostfile --output /tmp/output perftest ping_pong --size 4096 --rx_depth 128 --iters 10 --chk
Option | Description |
---|---|
-h, --help | Show the help message and exit |
-tp, --tcp_port | Specify the TCP port range the script will use (default: 1100) |
-s, --size | Size of message to exchange (default: 1048576) |
-r, --rx_depth | Number of receives to post at a time (default: 128) |
-n, --iters | Number of exchanges (default: 100000) |
-c, --criteria | Pass/fail criteria value for the test threshold in Gbps (no default, optional) |
Example:
python3 ./cloud_run.py --host_file ./hostfile --output /tmp/output perftest write_bw --size 1048576 --rx_depth 128 --iters 100000
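When the tests are driven from another script, the command line can be assembled programmatically. The sketch below only builds and prints the argument list; it uses the flags documented in the tables above and nothing else. The function name build_command and its default values (taken from the write_bw table) are assumptions made for illustration.

```python
import shlex


def build_command(host_file, output_dir, test, size=1048576, rx_depth=128, iters=100000):
    """Build the cloud_run.py argument list for a perftest run.

    Only flags listed in the option tables are used. The defaults here
    mirror the write_bw table.
    """
    return [
        "python3", "./cloud_run.py",
        "--host_file", host_file,
        "--output", output_dir,
        "perftest", test,
        "--size", str(size),
        "--rx_depth", str(rx_depth),
        "--iters", str(iters),
    ]


command = build_command("./hostfile", "/tmp/output", "write_bw")
# shlex.join quotes arguments safely for display or logging.
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) would execute it without going through a shell, which avoids quoting problems in paths.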
Expected output:
* CloudReport_<timestamp>.txt - Tested nodes summary.
* <server_host_name>_<client_host_name>
    └── scaleUpRepor_<timestamp>.txt - Specific server and client pair summary.
    └── perftest
        └── <network_ip>
            └── <server - device (ib_dev)>
                └── <device_port (ib_port)>
                    └── <client - device (ib_dev)>
                        └── <device_port (ib_port)>.txt - Both device application prints.
The output is saved in the output directory in a timestamp-named folder.
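After a run, the summary reports can be located by searching the output directory recursively, which avoids having to know the exact name of the timestamp folder. The following sketch shows one way to do so. The function name find_cloud_reports is hypothetical, and the demonstration uses a temporary directory with a placeholder file rather than real tool output.

```python
import tempfile
from pathlib import Path


def find_cloud_reports(output_dir):
    """Return all CloudReport_*.txt files under output_dir, sorted by path."""
    return sorted(Path(output_dir).rglob("CloudReport_*.txt"))


# Demonstration on a throwaway directory that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "example_timestamp"
    run_dir.mkdir()
    (run_dir / "CloudReport_example.txt").write_text("placeholder\n")
    found = [path.name for path in find_cloud_reports(tmp)]

print(found)
```

For a real run, the same function would be called with the path given to --output, for example find_cloud_reports("/tmp/output").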