Running Workloads on Docker

Before you start, make sure to follow the instructions in the Installation Guide and On-Premise System Update.

Start Training a PyTorch Model on Gaudi

  1. Run the Intel Gaudi Docker image:

    DOCKER_OPTS="-e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host"
    docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all $DOCKER_OPTS vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
    
  2. Clone the Model References repository inside the container you just started:

    git clone https://github.com/HabanaAI/Model-References.git
    
  3. Move to the subdirectory containing the hello_world example, a basic PyTorch code example:

    cd Model-References/PyTorch/examples/computer_vision/hello_world/
    
  4. Set the required environment variables: point GC_KERNEL_PATH to the TPC kernels library, add the Model References repository to PYTHONPATH, and set PYTHON to the Python executable:

    export GC_KERNEL_PATH=/usr/lib/habanalabs/libtpc_kernels.so
    export PYTHONPATH=$PYTHONPATH:Model-References
    export PYTHON=/usr/bin/python3.10
    

    Note

    The Python version depends on the operating system. Refer to the Support Matrix for a full list of supported operating systems and Python versions.
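The exports in the step above can be sanity-checked before launching any training. The snippet below is a minimal sketch; the kernel-library and interpreter paths match step 4 and the 1.18.0 Ubuntu 22.04 image, and may differ on other setups.

```shell
# Sketch: confirm the environment is set as expected before training.
# Paths below are assumptions taken from step 4; adjust if yours differ.
export GC_KERNEL_PATH=/usr/lib/habanalabs/libtpc_kernels.so
export PYTHONPATH=$PYTHONPATH:Model-References
export PYTHON=/usr/bin/python3.10

# PYTHONPATH is a colon-separated list; wrap it in colons so the
# pattern matches the entry whether it sits first, last, or in between.
case ":$PYTHONPATH:" in
  *:Model-References:*) echo "PYTHONPATH includes Model-References" ;;
  *) echo "PYTHONPATH is missing Model-References" ;;
esac
```

If the check reports a missing entry, re-run the exports from step 4 in the same shell session before starting the example.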

Training Examples

Next Steps

Refer to the following for next steps:

  • To explore more models from the Model References, start here.

  • To run more examples using Hugging Face, go here.

  • To migrate other models to Gaudi, refer to PyTorch Model Porting.