Getting Started with Gaudi

If you’re new to Habana and are looking for basic information, this page is a good place to start. Here you will find guidance on the typical activities that many users will want to do with Gaudi:

  • Learn about Gaudi and run existing models

  • Migrate your own model to run on Gaudi

  • Install the SW stack and Frameworks

  • Learn how to use the Gaudi DL1 instance on AWS

  • Learn about debugging, optimization, and profiling

  • Find where to get help from Habana and the community

Learn about Gaudi and Run Models

Habana provides a full GitHub Repository of models, as well as examples in the Tutorials section of our developer website, for reference. Once you have access to a Gaudi instance, running the Models, Tutorials, or simple PyTorch or TensorFlow examples is a great way to get familiar with Gaudi. Habana also provides several webinar training sessions that cover the basics of model migration on PyTorch and TensorFlow.

Migrate your Model to Run on Gaudi

If you have an existing model and would like to migrate it over to Gaudi, several steps should be considered:

  • Model Migration - adding Gaudi module and libraries

  • Mixed Precision - support for BF16 and other data types for performance

  • Ops placement - ensuring that the model is running on Gaudi instead of CPU for best performance

  • Distributed training - setting up your model on a full node (eight Gaudi processors) or scaling out to multiple nodes

The first place to start is the Migration Guide for PyTorch or TensorFlow. These documents show where to add the specific sections of code that allow the framework to recognize the Gaudi processor and run ops on it. Additionally, there are multiple videos on the developer website that walk through the model migration. This is the first step to getting a model functional. From there, you can refer to the PyTorch and TensorFlow guides to add mixed precision, ops placement, and distributed training to your model.
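The migration steps above can be sketched in a few lines. This is a hedged, illustrative sketch, not Habana's reference code: `habana_frameworks` is Habana's PyTorch bridge and `hpu` is the Gaudi device string, as described in the Migration Guide; the helper name `train_step` and the CPU fallback are assumptions made here so the sketch stays runnable on any machine.

```python
"""Minimal sketch of a Gaudi-migrated PyTorch training step."""
import torch

try:
    # Importing the Habana bridge registers the "hpu" device with PyTorch.
    import habana_frameworks.torch.core as htcore
    DEVICE = torch.device("hpu")
except ImportError:
    # No Gaudi stack installed -- fall back to CPU so the sketch runs anywhere.
    htcore = None
    DEVICE = torch.device("cpu")


def train_step(model, optimizer, loss_fn, x, y):
    """One training step; on Gaudi, mark_step() flushes the lazy-mode graph."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    if htcore is not None:
        htcore.mark_step()  # trigger graph execution after backward
    optimizer.step()
    if htcore is not None:
        htcore.mark_step()  # trigger graph execution after the optimizer step
    return loss.item()
```

On a Gaudi machine you would also move the model and batches to the device first (`model.to(DEVICE)`, `x.to(DEVICE)`); mixed precision with BF16 is then layered on top as described in the Mixed Precision guides.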

Install the SW Stack and Frameworks

In most cases, users will access Gaudi in one of two ways:

  • Creating or accessing an instance using a Cloud Service Provider

  • Using a local On-Premise system

Both options may require some setup. In many cases, the environment comes pre-configured and only needs an additional Docker container to run. The Installation Guide shows the steps required for both the Cloud and On-Premise options.
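In the pre-configured case, running the environment usually means pulling and starting one of Habana's containers. As a hedged sketch only: the `--runtime=habana` flag and the `HABANA_VISIBLE_DEVICES` variable follow the pattern in Habana's container documentation, while `<version>`, `<os>`, and `<framework-tag>` are placeholders to take from the Installation Guide.

```shell
# Pull and start a Habana framework container.
# <version>, <os>, and <framework-tag> are placeholders -- take the
# exact image path from the Installation Guide.
docker pull vault.habana.ai/gaudi-docker/<version>/<os>/habanalabs/<framework-tag>

docker run -it --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  --cap-add=sys_nice --net=host --ipc=host \
  vault.habana.ai/gaudi-docker/<version>/<os>/habanalabs/<framework-tag>
```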

How to use the DL1 Instance on AWS

Amazon Web Services’ Elastic Compute Cloud (EC2) offers the DL1 instance for access to the Gaudi processor. For basic setup of the DL1 instance, start with the Quick Start Guide.
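For orientation, launching a DL1 instance from the AWS CLI might look like the following hedged sketch: dl1.24xlarge is the Gaudi-based EC2 instance type, while the AMI ID, key pair, and region are placeholders to replace with your own.

```shell
# Launch a DL1 (Gaudi) instance -- the IDs below are placeholders.
aws ec2 run-instances \
  --instance-type dl1.24xlarge \
  --image-id <habana-deep-learning-ami-id> \
  --key-name <your-key-pair> \
  --region us-east-1
```

The Quick Start Guide covers the equivalent setup through the EC2 console.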

Debug, Optimization, and Profiling

If you are looking for tips on debugging and tuning your model, the PyTorch Optimization and PyTorch Debugging guides and the TensorFlow Optimization and TensorFlow Debugging sections provide important details on debugging and optimization, setting up logging, and key learnings to ensure that your model is performant.
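Logging setup mostly comes down to a few environment variables. As a hedged sketch: the variable names `LOG_LEVEL_ALL`, `HABANA_LOGS`, and `ENABLE_CONSOLE` follow the logging controls in Habana's documentation, but the values chosen here are illustrative, and the debugging guides remain the authoritative reference.

```shell
# Illustrative logging setup -- check the Debugging guides for the
# authoritative variable list and level meanings.
export LOG_LEVEL_ALL=3                # per-component log verbosity
export HABANA_LOGS=/tmp/habana_logs   # directory where log files are written
export ENABLE_CONSOLE=true            # also mirror logs to the console
```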

For profiling, you have two options:

  • Using TensorBoard for model level profiling

  • Using the Habana Labs Trace Viewer (HLTV) for low-level debug at the Gaudi card level

To set up the profilers, refer to the Profiling User Guide.
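As a sketch of the second option: `HABANA_PROFILE` is the enabling variable described in the profiling documentation, and `train.py` is a placeholder standing in for your own training script.

```shell
# Collect a hardware trace for the Habana Labs Trace Viewer (HLTV).
export HABANA_PROFILE=1   # enable trace collection
python train.py           # placeholder for your training script
# The resulting trace files can then be loaded into HLTV.
```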

Where to get help from Habana and the Community

Need help solving a problem or getting support? Go to the Habana User Forum to ask Habana engineers questions, review previous posts, and engage with the development community.