Intel Gaudi RDMA PerfTest Tool

This document provides guidelines for installing and running the Intel® Gaudi® RDMA PerfTest tool, habanalabs-perf-test, on Gaudi accelerators. The tool performs low-level, high-performance connectivity testing through ping-pong and bandwidth tests. It uses the Reliable Connection (RC) transport and RDMA Write operations to measure performance.

Note

  • The tool is supported only with Gaudi 3 on Ubuntu 22.04 and Ubuntu 24.04.

  • The tool can only be used in a container and is not supported on a VM.

  • The tool can only be used with Gaudi 3 NICs.

Prerequisites

Make sure you have the following packages installed:

  • habanalabs-thunk

  • habanalabs-dkms

  • habanalabs-rdma-core

For more information about installing these packages, see Custom Driver and Software Installation.

Note

If you have upgraded to the 1.19.0 software version, the above packages are already included, and no additional installation is required.

Installation

  1. Download the package file for your Ubuntu version:

    • For Ubuntu 22.04:

      wget https://vault.habana.ai/artifactory/debian/jammy/pool/main/h/habanalabs-perf-test/habanalabs-perf-test_1.19.0-561_amd64.deb
      
    • For Ubuntu 24.04:

      wget https://vault.habana.ai/artifactory/debian/noble/pool/main/h/habanalabs-perf-test/habanalabs-perf-test_1.19.0-561_amd64.deb
      
  2. Install the package file. This installs both the RDMA PerfTest tool and the Cloud Testing tool.

    sudo apt install ./habanalabs-perf-test_1.19.0-561_amd64.deb
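The two download URLs above differ only in the Ubuntu codename (jammy for 22.04, noble for 24.04). The following Python sketch picks the matching URL from the VERSION_ID field of /etc/os-release. The helper names are hypothetical and not part of the package, and the package version is assumed to remain 1.19.0-561 unless another one is passed in.

```python
from typing import Dict

# Supported Ubuntu releases mapped to the codenames used in the download URLs.
CODENAMES = {"22.04": "jammy", "24.04": "noble"}
BASE_URL = "https://vault.habana.ai/artifactory/debian"


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines (as found in /etc/os-release) into a dict."""
    info = {}
    for line in text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip().strip('"')
    return info


def perf_test_deb_url(version_id: str, pkg_version: str = "1.19.0-561") -> str:
    """Return the habanalabs-perf-test .deb URL for an Ubuntu VERSION_ID."""
    codename = CODENAMES.get(version_id)
    if codename is None:
        raise ValueError(f"Ubuntu {version_id} is not supported by the tool")
    deb = f"habanalabs-perf-test_{pkg_version}_amd64.deb"
    return f"{BASE_URL}/{codename}/pool/main/h/habanalabs-perf-test/{deb}"
```

On the target machine, perf_test_deb_url(parse_os_release(open("/etc/os-release").read())["VERSION_ID"]) would return the URL to pass to wget; any other Ubuntu release raises ValueError, matching the support note at the top of this document.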
    

Options and Usage

The following table lists the available tool options and their usage to help you effectively configure the tool for your specific needs.

Option                     Description
-p, --port=<port>          Listen on/connect to port <port> (default: 18515)
-d, --ib-dev=<dev>         Use IB device <dev> (default: first device found)
-i, --ib-port=<port>       Use port <port> of the IB device (default: 1)
-s, --size=<size>          Size of the message to exchange (default: 4096)
-m, --mtu=<size>           Path MTU (default: 8192)
-r, --rx-depth=<dep>       Number of receives to post at a time (default: 128)
-n, --iters=<iters>        Number of exchanges (default: 1000)
-l, --sl=<sl>              Service level value (default: 0)
-g, --gid-idx=<gid>        Local port GID index (default: 2)
                           Note:
                             • When connecting device to device directly, set -g 0. IPv4 protocol is not required.
                             • When connecting two servers/devices through a switch that supports RoCE, set -g 2. IPv4 protocol is required.
-c, --chk                  Validate the received buffer
-x, --logs                 Print additional log information
-t, --test-type            Test type:
                             • pp - Ping-pong test (default)
                             • bw - Bandwidth test
-h, --help                 Show the help message
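To illustrate how the options combine, the following Python sketch assembles a perf_test argument list. The helper perf_test_args is hypothetical and not shipped with the tool; it only uses options from the table above and derives -g from the note on direct versus switched connections. Following the -p description (listen on/connect to), omitting host gives the listening (server) command, and appending the server's address gives the connecting (client) command.

```python
from typing import List, Optional


def perf_test_args(ib_dev: str, ib_port: int, tcp_port: int = 18515,
                   test_type: str = "pp", iters: int = 1000,
                   size: int = 4096, via_switch: bool = True,
                   check: bool = False,
                   host: Optional[str] = None) -> List[str]:
    """Build a perf_test argument list (hypothetical helper)."""
    if test_type not in ("pp", "bw"):
        raise ValueError("test_type must be 'pp' or 'bw'")
    # GID index 2 through a RoCE switch, 0 for a direct device-to-device link.
    gid_idx = 2 if via_switch else 0
    args = ["./perf_test",
            "-t", test_type,
            "-p", str(tcp_port),
            "-d", ib_dev,
            "-i", str(ib_port),
            "-g", str(gid_idx),
            "-n", str(iters),
            "-s", str(size)]
    if check:
        args.append("-c")  # validate the received buffer
    if host is not None:
        args.append(host)  # connecting side: the server's address goes last
    return args
```

For example, perf_test_args("hbl_7", 20, tcp_port=1100, iters=10) reproduces the example command in this section, with the default -s 4096 spelled out explicitly.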

Run the tool in two separate SSH sessions, one for the server and one for the client. Start the server session first; the client session then connects to the server at <host>:

./perf_test [Opts]           # server session
./perf_test [Opts] <host>    # client session

Example, where <host> is the address of the server node:

./perf_test -t pp -p 1100 -d hbl_7 -i 20 -g 2 -n 10           # server session
./perf_test -t pp -p 1100 -d hbl_7 -i 20 -g 2 -n 10 <host>    # client session

Expected output:

Out: Local address: dev_port 0x7 Device address 0xff7000010000000, QPN 0x002001, GID ::ffff:10.209.11.93

Out: Server started on port = 1100
Out: Ping Pong Test Started!
Out: Finish Test Successfully
Out: Packets Sent: 11
Out: Test PASS
Out: Exiting OK
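The two-session procedure and the expected output above can also be scripted. The following Python sketch is an illustration only: run_pair and summarize_output are hypothetical helpers, not part of habanalabs-perf-test. It assumes that a fixed start-up delay is long enough for the server to begin listening, and that a successful run prints the "Test PASS" line shown in the expected output. Output strings for failed runs are not documented here, so anything without "Test PASS" is treated as a failure.

```python
import re
import subprocess
import time
from typing import List, NamedTuple, Optional, Tuple


def run_pair(server_cmd: List[str], client_cmd: List[str],
             startup_delay: float = 2.0,
             timeout: float = 300.0) -> Tuple[int, str, int, str]:
    """Start the server command, wait, then run the client command.

    Returns (server_rc, server_output, client_rc, client_output).
    """
    # Start the listening side first; merge stderr into stdout.
    server = subprocess.Popen(server_cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True)
    try:
        # Assumption: a fixed delay is long enough for the server to listen.
        time.sleep(startup_delay)
        client = subprocess.run(client_cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True,
                                timeout=timeout)
        server_out, _ = server.communicate(timeout=timeout)
    finally:
        # Never leave a stray server process behind on error or timeout.
        if server.poll() is None:
            server.kill()
            server.communicate()
    return server.returncode, server_out, client.returncode, client.stdout


class PerfTestSummary(NamedTuple):
    passed: bool                 # "Test PASS" was printed
    exited_ok: bool              # "Exiting OK" was printed
    packets_sent: Optional[int]  # value of the "Packets Sent:" line, if any


def summarize_output(text: str) -> PerfTestSummary:
    """Summarize perf_test output printed as "Out: <message>" lines."""
    messages = [line.strip()[len("Out:"):].strip()
                for line in text.splitlines()
                if line.strip().startswith("Out:")]
    packets = None
    for msg in messages:
        match = re.fullmatch(r"Packets Sent:\s*(\d+)", msg)
        if match:
            packets = int(match.group(1))
    return PerfTestSummary(
        passed="Test PASS" in messages,
        exited_ok="Exiting OK" in messages,
        packets_sent=packets,
    )
```

To run the sessions on two different nodes, each command list could be prefixed with ["ssh", "<node>"], for example ["ssh", "server-node", "./perf_test", "-t", "pp", "-p", "1100", "-d", "hbl_7", "-i", "20", "-g", "2", "-n", "10"] for the server and the same list with the server's address appended for the client (server-node is a placeholder hostname).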

Cloud Testing Tool

The Cloud Testing tool is a set of Python scripts that run RDMA PerfTest across an entire data center (scale-out). It supports two modes of operation:

  • run.py - Tests connectivity and performance between two nodes in a single direction.

  • cloud_run.py - Performs testing across multiple nodes, covering all pairwise permutations for thorough evaluation.
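To estimate how many runs cloud_run.py performs, the following sketch enumerates ordered (server, client) pairs with itertools.permutations. It assumes that "all pairwise permutations" means every ordered pair of distinct nodes, so each pair is tested in both directions and n nodes give n * (n - 1) single-direction runs of the kind run.py performs. The helper name is hypothetical.

```python
from itertools import permutations
from typing import List, Tuple


def directed_pairs(nodes: List[str]) -> List[Tuple[str, str]]:
    """Return every ordered (server, client) pair of distinct nodes.

    Each unordered pair appears twice, once per direction.
    """
    if len(set(nodes)) != len(nodes):
        raise ValueError("node list contains duplicates")
    return list(permutations(nodes, 2))
```

Under this assumption, the four-node host file shown in the prerequisites below would lead to 12 single-direction runs.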

The following port connectivity options are supported for testing on multiple nodes:

[Figure: Supported port connectivity topologies for multi-node testing]

Prerequisites

  • Install the Python packages listed in requirements.txt:

    pip install -r requirements.txt
    
  • Make sure key-based SSH access is configured for all tested nodes. The scripts open their SSH sessions to each node using SSH keys.

    Note

    Starting from the 1.18.0 release, SSH host keys are no longer included in the Docker images. To generate them, run /usr/bin/ssh-keygen -A inside the Docker container. If you are running on Kubernetes, make sure the SSH host keys are identical across all Docker containers. To achieve this, either build a new Docker image on top of the Intel Gaudi Docker image by adding a RUN /usr/bin/ssh-keygen -A layer, or mount the SSH host keys externally.

  • Make sure all external NIC ports are active and accessible on each node. For more details, see Disable/Enable NICs.

  • Prepare a host file that lists each tested node with its SSH connection details (IP address or hostname, and port), one node per line, in the format SSH-IP:SSH-PORT. For example:

    kuku-kvm12-lake:22
    kuku-kvm13-lake:22
    kuku-kvm14-lake:22
    kuku-kvm15-lake:22
    
  • Save the LD_LIBRARY_PATH environment variable to ~/.ENV_SCALEUP so that it is available in the remote SSH sessions opened during testing:

    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" | tee ~/.ENV_SCALEUP
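Before starting a multi-node run, it can help to validate the host file. The following Python sketch is a hypothetical pre-check, not part of the Cloud Testing tool, and it does not reflect how cloud_run.py itself reads the file: it parses one SSH-IP:SSH-PORT entry per line, ignores blank lines, and rejects malformed or duplicated entries.

```python
from typing import List, Tuple


def parse_host_file(text: str) -> List[Tuple[str, int]]:
    """Parse SSH-IP:SSH-PORT lines into (host, port) tuples."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last colon so only the final field is the port.
        host, sep, port = line.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"line {lineno}: expected SSH-IP:SSH-PORT, got {line!r}")
        port_num = int(port)
        if not 1 <= port_num <= 65535:
            raise ValueError(f"line {lineno}: port {port_num} is out of range")
        entries.append((host, port_num))
    if len(set(entries)) != len(entries):
        raise ValueError("host file lists the same SSH endpoint more than once")
    return entries
```

Reading the file with open("hostfile").read() and passing the text to parse_host_file returns the node list, or raises ValueError pointing at the first bad line.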
    

Options and Usage

The following tables list the available tool options and their usage to help you effectively configure the tool for your specific needs.

PerfTest Script (perftest)

Option                     Description
-h, --help                 Show the help message and exit
-hf, --host_file           Path to the host file listing the tested nodes
-skt, --ssh_key_type       SSH key type
-skf, --ssh_key_file       SSH private key file path (default: /home/username/.ssh/id_rsa)
-o, --output               Path in which to save all log files (required)

Example:

python3 ./cloud_run.py --host_file ./hostfile --output /tmp/output perftest

Test-specific Scripts

Ping-pong Test (ping_pong)

Option                     Description
-h, --help                 Show the help message and exit
-tp, --tcp_port            TCP port range used by the script (default: 1100)
-s, --size                 Size of the message to exchange (default: 4096)
-r, --rx_depth             Number of receives to post at a time (default: 128)
-n, --iters                Number of exchanges (default: 10)
-c, --chk                  Validate the received buffer

Example:

python3 ./cloud_run.py --host_file ./hostfile --output /tmp/output perftest ping_pong --size 4096 --rx_depth 128 --iters 10 --chk
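The cloud_run.py examples follow one pattern: global options, then the perftest subcommand, then the test name and its options. The following sketch builds such a command line. cloud_run_argv is a hypothetical helper; it assumes each option name maps directly to a --<name> flag and that True values are emitted as bare switches such as --chk.

```python
from typing import Dict, List, Optional

KNOWN_TESTS = ("ping_pong", "write_bw")


def cloud_run_argv(host_file: str, output: str, test: Optional[str] = None,
                   test_opts: Optional[Dict[str, object]] = None,
                   script: str = "./cloud_run.py") -> List[str]:
    """Assemble a cloud_run.py command line (hypothetical helper)."""
    # Global options come before the perftest subcommand.
    argv = ["python3", script, "--host_file", host_file,
            "--output", output, "perftest"]
    if test is None:
        return argv
    if test not in KNOWN_TESTS:
        raise ValueError(f"unknown test {test!r}")
    argv.append(test)
    # Test-specific options come after the test name.
    for name, value in (test_opts or {}).items():
        flag = "--" + name
        if value is True:
            argv.append(flag)           # boolean switch, e.g. --chk
        elif value is False or value is None:
            continue                    # option not set
        else:
            argv += [flag, str(value)]
    return argv
```

The resulting list can be passed to subprocess.run as-is; for example, cloud_run_argv("./hostfile", "/tmp/output", "ping_pong", {"size": 4096, "rx_depth": 128, "iters": 10, "chk": True}) matches the ping_pong example above token for token.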

Write Bandwidth Test (write_bw)

Option                     Description
-h, --help                 Show the help message and exit
-tp, --tcp_port            TCP port range used by the script (default: 1100)
-s, --size                 Size of the message to exchange (default: 1048576)
-r, --rx_depth             Number of receives to post at a time (default: 128)
-n, --iters                Number of exchanges (default: 100000)
-c, --criteria             Optional pass/fail bandwidth threshold, in Gbps (no default)
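The --criteria threshold is expressed in Gbps. As a point of reference, the following sketch computes payload throughput from message size, iteration count, and elapsed time, and applies an optional threshold. How the tool itself measures bandwidth and compares it to the criteria is not documented here; the formula (bits transferred divided by seconds, with 1 Gbps = 10^9 bit/s) and the greater-or-equal comparison are assumptions, and both function names are hypothetical.

```python
from typing import Optional


def throughput_gbps(size_bytes: int, iters: int, elapsed_s: float) -> float:
    """Payload throughput in Gbps (1 Gbps = 1e9 bits per second)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return size_bytes * iters * 8 / elapsed_s / 1e9


def meets_criteria(gbps: float, criteria: Optional[float]) -> bool:
    """Apply an optional pass/fail threshold; no threshold means pass."""
    return criteria is None or gbps >= criteria
```

With the default size (1048576 bytes) and iteration count (100000), a hypothetical run lasting 10 seconds would correspond to about 83.9 Gbps under this formula.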

Example:

python3 ./cloud_run.py --host_file ./hostfile --output /tmp/output perftest write_bw --size 1048576 --rx_depth 128 --iters 100000

Expected output:

* CloudReport_<timestamp>.txt - Tested nodes summary.
* <server_host_name>_<client_host_name>
  ├── scaleUpReport_<timestamp>.txt - Summary for this server and client pair.
  └── perftest
      └── <network_ip>
          └── <server - device (ib_dev)>
              └── <device_port (ib_port)>
                  └── <client - device (ib_dev)>
                      └── <device_port (ib_port)>.txt - Application output from both devices.

The output is saved in a timestamp-named folder inside the output directory.
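To post-process a run, the per-pair logs can be gathered from the output directory. This sketch assumes the layout shown above: a timestamp-named run folder under the --output path, containing one <server_host_name>_<client_host_name> folder per pair, with the per-port .txt files under perftest. The helper names are hypothetical, and the most recent run is picked by modification time rather than by parsing the timestamp, whose format is not documented here.

```python
from pathlib import Path
from typing import Dict, List


def latest_run_dir(output_dir: Path) -> Path:
    """Return the most recently modified run folder under the output path."""
    runs = [p for p in Path(output_dir).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no run folders in {output_dir}")
    return max(runs, key=lambda p: p.stat().st_mtime)


def pair_logs(run_dir: Path) -> Dict[str, List[str]]:
    """Map each <server>_<client> folder to its perftest .txt files.

    Paths are returned relative to the pair's perftest folder, e.g.
    <network_ip>/<server dev>/<server port>/<client dev>/<client port>.txt.
    """
    result: Dict[str, List[str]] = {}
    # Only folders are pairs; CloudReport_<timestamp>.txt is skipped.
    for pair_dir in sorted(p for p in Path(run_dir).iterdir() if p.is_dir()):
        perftest = pair_dir / "perftest"
        logs: List[str] = []
        if perftest.is_dir():
            logs = sorted(str(f.relative_to(perftest))
                          for f in perftest.rglob("*.txt"))
        result[pair_dir.name] = logs
    return result
```

For example, pair_logs(latest_run_dir(Path("/tmp/output"))) would list the device-level logs for every tested pair in the newest run, ready to be fed to a per-file checker.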