IBM Cloud Quick Start Guide

This document provides instructions for setting up an Intel® Gaudi® 3 AI accelerator instance on IBM Cloud®, installing the Intel Gaudi driver and software, and running inference with the Optimum for Intel Gaudi library and the vLLM Inference Server.

Prerequisites

An IBM Cloud account must be set up before you begin. To create one, follow the steps in the IBM Cloud documentation.

Create a Gaudi 3 Instance

Follow the step-by-step instructions below to launch a Gaudi 3 instance on IBM Cloud.
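If you prefer the command line over the console, the instance can also be created with the IBM Cloud CLI. The following is a minimal sketch assuming the VPC infrastructure plugin; the region, VPC, subnet, SSH key, Gaudi 3 profile name, and image ID are placeholders to replace with values from your own account.

```
# Log in and target a region where Gaudi 3 profiles are offered (placeholder region).
ibmcloud login --sso
ibmcloud target -r us-east

# Install the VPC infrastructure plugin if it is not already present.
ibmcloud plugin install vpc-infrastructure

# Create the instance. Positional arguments: name, VPC, zone, profile, subnet.
# The profile name and image ID are placeholders; look up the exact Gaudi 3
# profile and RHEL 9.4 image available in your account before running this.
ibmcloud is instance-create my-gaudi3-instance my-vpc us-east-1 gx3d-160x1792x8gaudi3 my-subnet \
    --image <rhel-9.4-image-id> \
    --keys my-ssh-key

# Check the instance until its status is "running".
ibmcloud is instance my-gaudi3-instance
```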

System Setup

Note

The system is pre-installed with RHEL 9.4 OS. The following instructions are intended for RHEL 9.4 but also apply to RHEL 9.2.
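As a reference for this section, the commands below sketch how to verify the Intel Gaudi driver and launch the PyTorch container used in the next section. This is an illustrative sketch based on the standard Intel Gaudi container workflow; the image version and PyTorch tag are placeholders, so use the release that matches the installed driver as listed in the Intel Gaudi documentation.

```
# Verify the Intel Gaudi driver and firmware are installed and all eight cards are visible.
hl-smi

# Start the Intel Gaudi PyTorch Docker container. <version> and <pytorch-version>
# are placeholders for the matching Intel Gaudi software release.
docker run -it --runtime=habana \
    -e HABANA_VISIBLE_DEVICES=all \
    --cap-add=sys_nice \
    --net=host --ipc=host \
    vault.habana.ai/gaudi-docker/<version>/rhel9.4/habanalabs/pytorch-installer-<pytorch-version>:latest
```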

Run Inference

After setting up the Docker container in the previous section, run the inference workloads below from a terminal inside the container. The examples show single-card inference with the Llama 3.2-1B model using the Optimum for Intel Gaudi library, and multi-card inference with the Granite 34B-code-instruct-8K model using the vLLM Inference Server. For additional models tested by IBM with Gaudi 3, refer to the IBM FM Benchmarking Framework GitHub repository.
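The commands below are an illustrative sketch of both workloads inside the container. The script flags and Hugging Face model IDs (meta-llama/Llama-3.2-1B and ibm-granite/granite-34b-code-instruct-8k) are assumptions; check the Optimum for Intel Gaudi text-generation examples and the vLLM documentation for the options supported by your installed versions.

```
# Single-card generation with the Optimum for Intel Gaudi (optimum-habana) example script.
pip install optimum-habana
git clone https://github.com/huggingface/optimum-habana.git
cd optimum-habana/examples/text-generation
pip install -r requirements.txt

python run_generation.py \
    --model_name_or_path meta-llama/Llama-3.2-1B \
    --use_hpu_graphs \
    --use_kv_cache \
    --bf16 \
    --max_new_tokens 128 \
    --prompt "Explain what an AI accelerator is."
```

For the multi-card case, the vLLM OpenAI-compatible server can shard the model across the eight Gaudi 3 cards with tensor parallelism and serve requests over HTTP:

```
# Multi-card serving with the vLLM OpenAI-compatible server (model ID is an assumption).
python -m vllm.entrypoints.openai.api_server \
    --model ibm-granite/granite-34b-code-instruct-8k \
    --tensor-parallel-size 8 \
    --dtype bfloat16

# Query the server from another terminal (port 8000 is the vLLM default).
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "ibm-granite/granite-34b-code-instruct-8k", "prompt": "def fibonacci(n):", "max_tokens": 64}'
```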