Intel Tiber AI Cloud Quick Start Guide

This document provides instructions on setting up the Intel® Gaudi® 3 and Intel® Gaudi® 2 AI accelerator instances on the Intel® Tiber™ AI Cloud and running models from the Intel Gaudi Model References repository and the Hugging Face Optimum for Intel Gaudi library.

Please follow along with the video on our Developer Page to walk through the steps below.

Creating an Account and Getting an Instance

Follow the steps below to access the Intel Tiber AI Cloud and launch a Gaudi 3 or Gaudi 2 instance.

  1. Go to https://console.cloud.intel.com and select Get Started to create an account and get SSH access:

../_images/console-home-00.png
  2. Go to “Console Home” and select “Documentation”:

../_images/console-11.png
  3. Select “Guides”:

../_images/console-11-guides.png
  4. Create a Gaudi instance:

    Select “Preview Guide > Preview Catalog” (or go directly to https://console.cloud.intel.com/docs/guides/preview_cat.html) and follow the instructions:

    ../_images/console-13-PreviewCatalogGuide.png

    Select “Get Started” (or go directly to https://console.cloud.intel.com/docs/guides/get_started.html) and follow the instructions:

    ../_images/console-12-getStartedGuide.png
  5. To access tutorials, Jupyter Notebooks, and code samples, select “Tutorials”:

../_images/console-14-tutorials.png

Start Training a PyTorch Model on Gaudi

  1. Run the hl-smi tool to confirm the Intel Gaudi software version installed on your Intel Tiber AI Cloud instance. You will need this version in the docker run and git clone commands below. Use the HL-SMI Version field at the top of the output; in this example, the version is 1.21.1:

       HL-SMI Version:       hl-1.21.1-XXXXXXX
       Driver Version:       1.21.1-XXXXXX
    
  2. Run the Intel Gaudi Docker image:

       docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
    

    Note

    • You may see this error message after running the above docker command:

      docker: permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Head "http://%2Fvar%2Frun%2Fdocker.sock/_ping": dial unix /var/run/docker.sock: connect: permission denied.
      

      To avoid this, add your user to the docker group:

      sudo usermod -a -G docker $USER
      

      Log out of the terminal session and log back in so the group change takes effect. The original docker command should then work properly.

    • When starting the container in interactive mode, the following error message may appear. This error can be ignored when running on a single server.

      * Starting OpenBSD Secure Shell server
      sshd
      sshd: no hostkeys available -- exiting.
                                                  [fail]
      

      For multi-server setups, refer to the multi_node_setup section below for instructions on generating SSH host keys.

  3. Clone the Model References repository inside the container that you have just started:

       cd ~
       git clone -b 1.21.0 https://github.com/HabanaAI/Model-References.git
    
  4. Move to the subdirectory containing the hello_world example:

    cd Model-References/PyTorch/examples/computer_vision/hello_world/
    
  5. Update the environment variables to point to where the Model References repository is located, and set the PYTHON environment variable to the Python executable:

    export PYTHONPATH=$PYTHONPATH:/root/Model-References
    export PYTHON=/usr/bin/python3.10
    

    Note

    The Python version depends on the operating system. Refer to the Support Matrix for a full list of supported operating systems and Python versions.
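The version-matching logic from step 1 can be sketched as a short script. This is an illustrative sketch, not an official Intel tool: the sample line stands in for live hl-smi output, and the image tag is assembled from the naming pattern shown in the docker run command above.

```shell
#!/bin/sh
# Sketch: extract the Gaudi software version and build the matching image tag.
# On a real instance you would capture the line from hl-smi itself, e.g.:
#   HLSMI_LINE=$(hl-smi | grep "HL-SMI Version")
HLSMI_LINE="HL-SMI Version:       hl-1.21.1-XXXXXXX"

# Pull "1.21.1" out of "hl-1.21.1-XXXXXXX"
VERSION=$(printf '%s\n' "$HLSMI_LINE" | sed -n 's/.*hl-\([0-9.]*\)-.*/\1/p')
echo "version: $VERSION"

# Assemble the Docker image tag following the pattern in the docker run step
IMAGE="vault.habana.ai/gaudi-docker/${VERSION}/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest"
echo "image: $IMAGE"
```

Before passing a version to git clone, check the Model-References repository for the release branch closest to it; patch releases of the software do not always have a matching branch.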

Training Examples

Next Steps

To continue, refer to the following resources:

  • To explore more models from the Model References repository, start here.

  • To run more examples using the Hugging Face Optimum for Intel Gaudi library, go here.

  • To migrate other models to Gaudi, refer to PyTorch Model Porting.