Getting Started with Inference on Intel Gaudi
This guide provides simple steps for preparing a PyTorch model to run inference on the Intel® Gaudi® AI accelerator.
Make sure to install the PyTorch packages provided by Intel Gaudi. To set up the PyTorch environment, refer to the Installation Guide. The supported PyTorch versions are listed in the Support Matrix.
Once you are ready to migrate PyTorch models from a GPU-based architecture to Gaudi, you can use the GPU Migration Toolkit. The toolkit automates the migration process by replacing Python API calls that depend on GPU libraries with Gaudi-specific API calls, allowing you to run your model with minimal modifications.
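For reference, the toolkit is enabled by importing its module before any GPU-dependent code runs. A minimal sketch is shown below; the module name follows the GPU Migration Toolkit documentation, but verify it against your installed Intel Gaudi software version:

```python
# Sketch: enable the GPU Migration Toolkit before any CUDA-dependent code runs.
# Verify the module name against your installed Intel Gaudi software version.
import habana_frameworks.torch.gpu_migration
import torch

# Existing CUDA-style calls can then run with minimal changes, e.g.:
# model = model.to("cuda")  # transparently mapped to the Gaudi (HPU) device
```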
Note
Installing public PyTorch packages is not supported.
Refer to the PyTorch Known Issues and Limitations section for a list of current limitations.
Creating a Simple Inference Example Using model.eval
The following sections provide two inference examples: one using Eager mode with torch.compile and one using Lazy mode.
For further details, refer to PyTorch Gaudi Theory of Operations.
Example with Eager Mode and torch.compile
Follow the steps below to run an inference example:
Download the pre-trained weights for the model:
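The original download command is not reproduced on this page. As a sketch, the MNIST checkpoint used by the Gaudi examples can be fetched with a command along these lines (the URL and filename are assumptions; use the link given in the official guide):

```bash
# Assumed checkpoint location; substitute the URL from the official guide.
wget https://vault.habana.ai/artifactory/misc/inference/mnist/mnist-epoch_20.pth
```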
Create a file named example_inference.py with the code below:
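The original listing is not reproduced on this page. The following is a minimal sketch of such a script, assuming a small fully connected MNIST classifier and the checkpoint downloaded in the previous step (the network architecture, batch size, and file names are illustrative). The line numbers cited below refer to the full listing in the official guide and may not match this condensed sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

# Required: registers the HPU device and the hpu_backend compile backend.
import habana_frameworks.torch.core as htcore


class Net(nn.Module):
    """Small fully connected MNIST classifier (illustrative architecture)."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        out = x.view(-1, 28 * 28)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        return self.fc3(out)


model = Net()
model.load_state_dict(torch.load("mnist-epoch_20.pth"))  # assumed checkpoint filename
model.eval()

model = model.to("hpu")                              # target the Gaudi device
model = torch.compile(model, backend="hpu_backend")  # compile for Gaudi

transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
)
test_set = torchvision.datasets.MNIST(
    "./data", train=False, download=True, transform=transform
)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=32)

correct = 0
with torch.no_grad():
    for data, label in test_loader:
        data = data.to("hpu")    # move inputs to the Gaudi device
        label = label.to("hpu")  # move labels to the Gaudi device
        output = model(data)
        correct += output.argmax(dim=1).eq(label).sum().item()

print(f"Accuracy: {100.0 * correct / len(test_loader.dataset):.2f}%")
```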
The example_inference.py script presents a basic PyTorch code example with torch.compile. The Intel Gaudi-specific lines are explained below and collected in the snippet that follows the list:
Line 9 - Import habana_frameworks.torch.core.
Line 30 - Target the Gaudi device for model execution.
Line 32 - Wrap the model in the torch.compile function and set the backend to hpu_backend.
Lines 45 and 46 - Target the Gaudi device for the dataloader and label.
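Collected together, and keyed to the line numbers above, these statements look as follows (a sketch; variable names follow the example above):

```python
import habana_frameworks.torch.core as htcore        # Line 9

model = model.to("hpu")                              # Line 30
model = torch.compile(model, backend="hpu_backend")  # Line 32

data = data.to("hpu")                                # Line 45
label = label.to("hpu")                              # Line 46
```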
Executing the Example
After creating example_inference.py, perform the following:
Set PYTHON to the Python executable:
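For example (the interpreter path is an assumption; use the Python version listed in the Support Matrix for your operating system):

```bash
export PYTHON=/usr/bin/python3.10  # assumed path; match your supported Python version
```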
Note
The Python version depends on the operating system. Refer to the Support Matrix for a full list of supported operating systems and Python versions.
Execute example_inference.py by running it with the following environment variables. Since Lazy mode is the default, the PT_HPU_LAZY_MODE=0 environment variable must be set to use Eager mode. Refer to Runtime Environment Variables for more information:
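```bash
PT_HPU_LAZY_MODE=0 $PYTHON example_inference.py
```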
Example with Lazy Mode
Follow the steps below to run an inference example:
Download the pre-trained weights for the model:
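The same checkpoint as in the Eager mode example can be used; as before, the URL is an assumption:

```bash
wget https://vault.habana.ai/artifactory/misc/inference/mnist/mnist-epoch_20.pth
```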
Create a file named example_inference_lazy.py with the code below:
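As with the Eager mode example, the original listing is not reproduced on this page. The sketch below assumes the same illustrative MNIST classifier, with htcore.mark_step() added after the inference step; the line numbers cited below refer to the full listing in the official guide:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

import habana_frameworks.torch.core as htcore  # Gaudi core; provides mark_step()


class Net(nn.Module):
    """Same illustrative MNIST classifier as in the Eager mode sketch."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        out = x.view(-1, 28 * 28)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        return self.fc3(out)


model = Net()
model.load_state_dict(torch.load("mnist-epoch_20.pth"))  # assumed checkpoint filename
model.eval()
model = model.to("hpu")  # target the Gaudi device (no torch.compile in Lazy mode)

transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
)
test_set = torchvision.datasets.MNIST(
    "./data", train=False, download=True, transform=transform
)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=32)

correct = 0
with torch.no_grad():
    for data, label in test_loader:
        data = data.to("hpu")  # move inputs to the Gaudi device
        output = model(data)
        htcore.mark_step()     # trigger execution of the accumulated lazy graph
        correct += output.argmax(dim=1).eq(label.to("hpu")).sum().item()

print(f"Accuracy: {100.0 * correct / len(test_loader.dataset):.2f}%")
```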
The example_inference_lazy.py script presents a basic PyTorch code example. The Intel Gaudi-specific lines are explained below and collected in the snippet that follows the list:
Line 5 - Import habana_frameworks.torch.core.
Line 30 - Target the Gaudi device for model execution.
Line 43 - Target the Gaudi device for the dataloader.
Line 45 - In Lazy mode, htcore.mark_step() must be added after the inference step output = model(data).
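Collected together, and keyed to the line numbers above (a sketch; variable names follow the example above):

```python
import habana_frameworks.torch.core as htcore  # Line 5

model = model.to("hpu")                        # Line 30

data = data.to("hpu")                          # Line 43

output = model(data)
htcore.mark_step()                             # Line 45: flush the lazy graph
```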
Executing the Example
After creating example_inference_lazy.py, perform the following:
Set PYTHON to the Python executable:
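As before, for example (the path is an assumption):

```bash
export PYTHON=/usr/bin/python3.10
```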
Note
The Python version depends on the operating system. Refer to the Support Matrix for a full list of supported operating systems and Python versions.
Execute example_inference_lazy.py by running:
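Since Lazy mode is the default, no PT_HPU_LAZY_MODE setting is needed:

```bash
$PYTHON example_inference_lazy.py
```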
Using torch.jit.trace Mode
Load and save models in JIT trace format using torch.jit.trace mode. For further details on the JIT format, refer to the TorchScript page.
Create the model and move it to the HPU:
```python
import torch

device = torch.device('hpu')

in_c, out_c = 3, 64
k_size = 7
stride = 2

conv = torch.nn.Conv2d(in_c, out_c, kernel_size=k_size, stride=stride, bias=True)
bn = torch.nn.BatchNorm2d(out_c, eps=0.001)
relu = torch.nn.ReLU()

model = torch.nn.Sequential(conv, bn, relu)
model.eval()
model = model.to(device)
```
Save the model using torch.jit.trace, load it back using torch.jit.load, and then create the inputs and move them to the HPU to run model inference, as in the sketch below.
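The original snippets for these steps are not reproduced on this page. A minimal sketch, continuing from the model built above (the example input shape and file name are assumptions):

```python
# Save: trace the model with an example input on the HPU and serialize it.
example_input = torch.randn(1, in_c, 224, 224).to(device)  # assumed input shape
traced_model = torch.jit.trace(model, example_input, check_trace=False)
torch.jit.save(traced_model, "traced_model.pt")  # assumed file name

# Load the traced model back, mapping it onto the HPU.
loaded_model = torch.jit.load("traced_model.pt", map_location=torch.device('hpu'))

# Create the inputs, move them to the HPU, and run model inference.
inputs = torch.randn(8, in_c, 224, 224).to(device)
with torch.no_grad():
    output = loaded_model(inputs)
print(output.shape)
```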
Note
The JIT format is functionally correct but not yet optimized; optimized support will be added in a future release.