IBM Cloud Quick Start Guide

This document provides instructions for setting up the Intel® Gaudi® 3 AI accelerator instance on the IBM Cloud®, installing Intel Gaudi driver and software, and running inference using the Optimum for Intel Gaudi library and the vLLM Inference Server.

Prerequisites

An IBM Cloud account is required. To set one up, follow the steps in the IBM Cloud documentation.

Create a Gaudi 3 Instance

Follow the step-by-step instructions below to launch a Gaudi 3 instance on IBM Cloud.

Set up a Virtual Server

  1. From the IBM Cloud console, go to “Infrastructure > Compute > Virtual server instances”:

    ../_images/Virtural_Services.png
  2. Select the region and click “Create”. For additional information on availability in a region, refer to this page:

    ../_images/Region_Create.png
  3. Confirm your geography, region, and zone. Name the instance and select a “gaudi3” resource group:

    ../_images/Location_and_Details.png
  4. Navigate down to “Server configuration”, click “Change image” and select your desired image. After your image is selected, click “Change profile”:

    ../_images/Server_Configuration.png
  5. In the left-hand navigation, check the “GPU” box. Select the gx3d-160x1792x8gaudi3 GPU profile from the available GPUs, and click “Save”:

    ../_images/Select_Instance_Profile.png
  6. Create an SSH key or select one that was previously created. The boot drive is pre-configured, but you can add data drives by clicking “Create” and specifying the drive size:

    ../_images/SSH_keys.png
  7. Under “Networking”, you can either use the default generated VPC or create your own by clicking “Create VPC”. The “Virtual network interface” is preselected and is recommended:

    ../_images/Networking.png
  8. You can add up to 15 interfaces, with a maximum total server bandwidth of 150 Gbps:

    ../_images/Bandwidth.png
  9. Click “Create virtual server” to begin provisioning. Note that due to the size of the virtual server, it may take up to 20 minutes to start up:

    ../_images/Create_Virtual_Server.png

Create Floating IP

  1. Navigate to “Infrastructure > Network > Floating IPs”:

    ../_images/Floating_IPs.png
  2. To reserve your Floating IP, select the region in which you deployed the virtual server instance, fill in the Floating IP name, and click “Reserve”:

    ../_images/Reserve_Floating_IP.png
  3. Now that you have a Floating IP, bind it to the virtual server instance. Click the name of the reserved IP:

    ../_images/Floating_IP_details.png
  4. From the “Actions” dropdown menu, click “Bind”:

    ../_images/Bind.png
  5. From the dropdown menu, select the Virtual server instance you want to bind the Floating IP to and click “Bind”:

    ../_images/Bind_Floating_IP.png
  6. You now have a fully deployed Gaudi 3 instance with a Floating IP attached:

    ../_images/Binding_completed.png

Connect to your Instance

  1. Using the Floating IP address that you created, ping your instance to make sure that it’s up and running:

    ping <public-ip-address>
    
  2. Because you created your instance with a public SSH key, you can now connect to it directly by using your private key:

    ssh -i <path-to-private-key-file> root@<public-ip-address>
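
Optionally, an entry in your local ~/.ssh/config avoids retyping the key path and IP address on each connection. This is a minimal sketch; the host alias gaudi3-ibm is an arbitrary placeholder, and the bracketed values are the same ones used above:

    # ~/.ssh/config on your local machine (gaudi3-ibm is a placeholder alias)
    Host gaudi3-ibm
        HostName <public-ip-address>
        User root
        IdentityFile <path-to-private-key-file>

You can then connect with ssh gaudi3-ibm.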
    

System Setup

Note

The system is pre-installed with RHEL 9.4 OS. The following instructions are intended for RHEL 9.4 but also apply to RHEL 9.2.
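
You can confirm which release your instance is running by checking the standard release file:

cat /etc/redhat-release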

Check Installed Devices

From your Gaudi instance, verify that all the Gaudi 3 devices are connected:

lspci -d 1da3: -nn

The output should appear as follows:

a3:00.0  Processing accelerators [1200]:  Habana Labs Ltd.  Device [1da3:1060]  (rev 01)
ad:00.0  Processing accelerators [1200]:  Habana Labs Ltd.  Device [1da3:1060]  (rev 01)
b7:00.0  Processing accelerators [1200]:  Habana Labs Ltd.  Device [1da3:1060]  (rev 01)
c1:00.0  Processing accelerators [1200]:  Habana Labs Ltd.  Device [1da3:1060]  (rev 01)
cb:00.0  Processing accelerators [1200]:  Habana Labs Ltd.  Device [1da3:1060]  (rev 01)
d5:00.0  Processing accelerators [1200]:  Habana Labs Ltd.  Device [1da3:1060]  (rev 01)
df:00.0  Processing accelerators [1200]:  Habana Labs Ltd.  Device [1da3:1060]  (rev 01)
e9:00.0  Processing accelerators [1200]:  Habana Labs Ltd.  Device [1da3:1060]  (rev 01)
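
As a quick sanity check, the device count should match the 8-card gx3d-160x1792x8gaudi3 profile selected earlier:

lspci -d 1da3: -nn | wc -l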

Install Intel Gaudi Driver and Software

This section provides two installation options for the Intel Gaudi driver and software. Follow the appropriate installation steps based on your system requirements:

  • Installation on the pre-installed kernel (recommended)

  • Installation on the latest kernel (optional)

Installation on the pre-installed kernel (recommended)

  1. Verify the installed kernel version:

    sudo dnf list installed | grep kernel
    

    The output should appear as follows:

    kernel-core.x86_64            5.14.0-427.50.1.el9_4    @rhel-9-for-x86_64-baseos-eus-rpms
    kernel-modules.x86_64         5.14.0-427.50.1.el9_4    @rhel-9-for-x86_64-baseos-eus-rpms
    kernel-modules-core.x86_64    5.14.0-427.50.1.el9_4    @rhel-9-for-x86_64-baseos-eus-rpms
    kernel-tools.x86_64           5.14.0-427.50.1.el9_4    @rhel-9-for-x86_64-baseos-eus-rpms
    kernel-tools-libs.x86_64      5.14.0-427.50.1.el9_4    @rhel-9-for-x86_64-baseos-eus-rpms
    

    Note

    The output example displays the kernel version installed on RHEL 9.4. The output may vary depending on the RHEL version in use.

  2. Verify the available kernel packages that match the kernel version retrieved in Step 1:

    sudo dnf list --showduplicates | grep 5.14.0-427.50.1.el9_4
    
  3. Install the kernel development and header packages. Ensure they match the currently installed kernel version (a version-matching sketch follows at the end of this list):

    sudo dnf install kernel-devel-5.14.0-427.50.1.el9_4.x86_64 kernel-devel-matched-5.14.0-427.50.1.el9_4.x86_64 kernel-headers-5.14.0-427.50.1.el9_4.x86_64
    
  4. Install the wget utility:

    sudo dnf install -y wget
    
  5. Download the habanalabs-installer.sh script:

    sudo wget -nv https://vault.habana.ai/artifactory/gaudi-installer/1.20.1/habanalabs-installer.sh
    sudo chmod +x habanalabs-installer.sh
    
  6. Install the EPEL repository:

    sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm --nogpgcheck
    

    The EPEL repository provides access to extra packages that are not included in the default RHEL repositories.

  7. Install Intel Gaudi driver and software:

    sudo ./habanalabs-installer.sh install --type base
    
  8. Enter “Y” when you see the Prepare installation prompt to finish the installation; it will not complete until you confirm.
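
The commands in step 3 hard-code the kernel version from the example output. As a minimal sketch that applies to either installation option, the version can instead be derived from the running kernel, assuming the matching packages are available in your enabled repositories:

    # Install development and header packages that match the running kernel
    KVER=$(uname -r)
    sudo dnf install -y kernel-devel-"$KVER" kernel-headers-"$KVER"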

Installation on the latest kernel (optional)

  1. Install the EPEL repository:

    sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm --nogpgcheck
    

    The EPEL repository provides access to extra packages that are not included in the default RHEL repositories.

  2. Verify the installed kernel version:

    sudo dnf list installed | grep kernel
    

    The output should appear as follows:

    kernel-core.x86_64            5.14.0-427.50.1.el9_4    @rhel-9-for-x86_64-baseos-eus-rpms
    kernel-modules.x86_64         5.14.0-427.50.1.el9_4    @rhel-9-for-x86_64-baseos-eus-rpms
    kernel-modules-core.x86_64    5.14.0-427.50.1.el9_4    @rhel-9-for-x86_64-baseos-eus-rpms
    kernel-tools.x86_64           5.14.0-427.50.1.el9_4    @rhel-9-for-x86_64-baseos-eus-rpms
    kernel-tools-libs.x86_64      5.14.0-427.50.1.el9_4    @rhel-9-for-x86_64-baseos-eus-rpms
    

    Note

    The output example displays the kernel version installed on RHEL 9.4. The output may vary depending on the RHEL version in use.

  3. Verify the available kernel packages that match the kernel version retrieved in Step 2:

    sudo dnf list --showduplicates | grep 5.14.0-427.50.1.el9_4
    
  4. Install the kernel development and header packages. Ensure they match the currently installed kernel version:

    sudo dnf install kernel-devel-5.14.0-427.50.1.el9_4.x86_64 kernel-devel-matched-5.14.0-427.50.1.el9_4.x86_64 kernel-headers-5.14.0-427.50.1.el9_4.x86_64
    
  5. Reboot the system:

    sudo reboot
    
  6. Wait about 10 minutes for the reboot to complete, then log back in and use dnf to update the installed packages, including the kernel (a kernel check sketch follows this list):

    sudo dnf update -y
    
  7. Install the wget utility:

    sudo dnf install -y wget
    
  8. Download the habanalabs-installer.sh script:

    sudo wget -nv https://vault.habana.ai/artifactory/gaudi-installer/1.20.1/habanalabs-installer.sh
    sudo chmod +x habanalabs-installer.sh
    
  9. Install Intel Gaudi driver and software:

    sudo ./habanalabs-installer.sh install --type base
    
  10. Enter “Y” when you see the Prepare installation prompt to finish the installation; it will not complete until you confirm.
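
A newly installed kernel takes effect only after a reboot. As noted in step 6, if the update pulled in a kernel newer than the one currently running, reboot once more before installing the driver so that it is built against the loaded kernel. A minimal check:

    # Kernel currently running
    uname -r
    # Newest kernel packages installed by the update
    sudo dnf list installed kernel-core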

To verify that the installation is successful, follow the steps outlined in the System Verifications and Final Tests section.
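
For a quick sanity check beforehand, the hl-smi tool, which should be available after the base installation, can list all eight accelerators, and the habanalabs kernel module should be loaded; exact output varies by software version:

hl-smi
lsmod | grep habanalabs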

Install Docker Engine and Gaudi Docker Container

The following steps are required only once to set up the Docker engine, Gaudi container runtime, and the Gaudi container itself.

Note

The steps below are based on the recommended method from the official Docker website.

  1. Remove the existing Docker packages:

    sudo dnf remove docker \
                  docker-client \
                  docker-client-latest \
                  docker-common \
                  docker-latest \
                  docker-latest-logrotate \
                  docker-logrotate \
                  docker-engine \
                  podman \
                  runc
    
  2. Install the dnf-plugins-core package:

    sudo dnf -y install dnf-plugins-core
    
  3. Add the Docker repository in the config-manager:

    sudo dnf config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
    
  4. Install the Docker packages:

    sudo dnf install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    
  5. Start the Docker engine:

    sudo systemctl enable --now docker
    
  6. Verify that the installation is successful by running the hello-world image:

    sudo docker run hello-world
    
  7. Install the habanalabs-container-runtime package. This package is required for running workloads in containers. Both Docker and Kubernetes are supported:

    sudo dnf install -y habanalabs-container-runtime
    
  8. Register habana-container-runtime by adding the following to /etc/docker/daemon.json as mentioned in the Docker Installation section (a verification sketch follows this list):

    sudo tee /etc/docker/daemon.json <<EOF
    {
        "runtimes": {
            "habana": {
                "path": "/usr/bin/habana-container-runtime",
                "runtimeArgs": []
            }
        }
    }
    EOF
    
  9. Add your user to the Docker group:

    sudo usermod -aG docker $USER
    
  10. Reload the systemd daemon configuration:

    sudo systemctl daemon-reload
    
  11. Restart the Docker service:

    sudo systemctl restart docker
    
  12. Install the latest Gaudi Docker container:

    sudo docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
    -v /run/:/run -v /dev/shm:/dev/shm --cap-add=sys_nice --net=host --ipc=host \
    vault.habana.ai/gaudi-docker/1.20.1/rhel9.4/habanalabs/pytorch-installer-2.6.0:latest
    

    The “no hostkeys available” message can be safely ignored. You should now be inside the container.
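
To confirm the runtime registration from step 8, you can check that “habana” appears among the Docker runtimes on the host, and that the accelerators are visible from the container started in step 12. This is a minimal sanity-check sketch, not part of the official procedure; hl-smi is typically included in the Gaudi PyTorch containers:

    # On the host: "habana" should appear in the Runtimes line
    sudo docker info | grep -i runtimes

    # Inside the container: the eight devices exposed via HABANA_VISIBLE_DEVICES=all
    hl-smi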

Run Inference

After setting up the Docker container in the previous section, follow the instructions below to run inference workloads inside the Docker container terminal. The examples below demonstrate how to run inference on a single card using the Optimum for Intel Gaudi library, and on multiple cards using the vLLM Inference Server with the Llama 3.2-1B and Granite 34B-code-instruct-8K models, respectively. For additional models tested on IBM with Gaudi 3, refer to the IBM FM Benchmarking Framework GitHub repository.