Getting Started with Inference on Intel Gaudi
This guide walks through the steps for preparing a simple PyTorch model for inference on the Intel® Gaudi® AI accelerator.
Make sure to install the PyTorch packages provided by Intel Gaudi. Installing public PyTorch packages is not supported. To set up the PyTorch environment, refer to the Installation Guide. The supported PyTorch versions are listed in the Support Matrix.
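Before running the examples, it can help to confirm which packages the interpreter actually sees. The following is a minimal sketch, not an official check. It assumes the Gaudi bridge is importable under the top-level name `habana_frameworks` (the module used later in this guide) and that PyTorch is installed under the distribution name `torch`. The helper name `describe_environment` is hypothetical.

```python
import importlib.util
from importlib import metadata


def describe_environment() -> dict:
    """Report whether the Gaudi bridge and PyTorch appear to be installed.

    Nothing is imported here, so the check is safe on machines without Gaudi.
    """
    info = {
        # find_spec returns None when a top-level package cannot be located.
        "habana_frameworks_found": importlib.util.find_spec("habana_frameworks") is not None,
        "torch_version": None,
    }
    try:
        # Assumption: the installed PyTorch distribution is named "torch".
        info["torch_version"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        pass
    return info


if __name__ == "__main__":
    print(describe_environment())
```

The reported version string can then be compared by hand against the Support Matrix; the sketch deliberately does not encode any supported-version list.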
Note
Refer to the PyTorch Known Issues and Limitations section for a list of current limitations.
Once you are ready to migrate PyTorch models that run on GPU-based architecture to run on Gaudi, you can use the GPU Migration Toolkit. The toolkit automates the migration by replacing all Python API calls that depend on GPU libraries with Gaudi-specific API calls, so you can run your model with fewer modifications.
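As a rough sketch of how the toolkit might be switched on from a script: the code below assumes the toolkit is activated as a side effect of importing `habana_frameworks.torch.gpu_migration` before any GPU-style code runs. Verify that module path against the toolkit documentation for your release. The helper name `try_enable_gpu_migration` is hypothetical, and it simply reports failure on machines where the Gaudi packages are absent.

```python
import importlib


def try_enable_gpu_migration() -> bool:
    """Try to activate the GPU Migration Toolkit by importing its module.

    Assumption: importing habana_frameworks.torch.gpu_migration is what
    enables the toolkit. Returns False when the import is not possible.
    """
    try:
        importlib.import_module("habana_frameworks.torch.gpu_migration")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if try_enable_gpu_migration():
        print("GPU Migration Toolkit module imported.")
    else:
        print("Gaudi packages not found; running the script unmodified.")
```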
Creating an Inference Example Using model.eval Mode
The example below shows the Intel Gaudi-specific modifications added to the PyTorch Hello World inference example.
First, download the pre-trained weights for the model:
wget https://vault.habana.ai/artifactory/misc/inference/mnist/mnist-epoch_20.pth
Create a file named example_inference.py with the code below:
import os
import sys
import torch
import time
import habana_frameworks.torch.core as htcore
from torch.utils.data import DataLoader
from torchvision import transforms, datasets
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 10)
    def forward(self, x):
        out = x.view(-1,28*28)  # flatten each 28x28 image
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        out = F.log_softmax(out, dim=1)
        return out

model = Net()
checkpoint = torch.load('mnist-epoch_20.pth')
model.load_state_dict(checkpoint)
model = model.eval()

model = model.to("hpu")

transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))])

data_path = './data'
test_kwargs = {'batch_size': 32}
dataset1 = datasets.MNIST(data_path, train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(dataset1,**test_kwargs)

correct = 0
for batch_idx, (data, label) in enumerate(test_loader):
    data = data.to("hpu")
    output = model(data)
    htcore.mark_step()
    correct += output.max(1)[1].cpu().eq(label).sum().item()  # compare on the host, where the labels are

print('Accuracy: {:.2f}%'.format(100. * correct / len(test_loader.dataset)))
example_inference.py is a basic PyTorch inference script. The Intel Gaudi-specific lines are explained below.
Line 5 - Import habana_frameworks.torch.core:
import habana_frameworks.torch.core as htcore
Line 30 - Target the Gaudi device for the model execution:
model = model.to("hpu")
Line 43 - Move each input batch to the Gaudi device:
data = data.to("hpu")
Line 45 - In Lazy mode, htcore.mark_step() must be added after the inference step output = model(data):
htcore.mark_step()
htcore.mark_step() marks the end of one inference iteration so that graph accumulation stops there. If htcore.mark_step() is not invoked, all inference iterations merge into one graph, increasing memory consumption and preventing host and device time overlap.
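The same loop structure can be sketched in a form that also runs on machines without Gaudi hardware. This is an illustrative sketch under stated assumptions, not the guide's reference code. It falls back to the CPU when `habana_frameworks` cannot be imported, uses a tiny linear layer and random tensors as stand-ins for the MNIST model and data, and the helper names `mark_step` and `count_correct` are hypothetical.

```python
import torch
import torch.nn as nn

try:
    # Only importable when the Intel Gaudi PyTorch packages are installed.
    import habana_frameworks.torch.core as htcore
    DEVICE = torch.device("hpu")
except ImportError:
    htcore = None
    DEVICE = torch.device("cpu")  # assumption: CPU fallback so the sketch runs anywhere


def mark_step() -> None:
    """Call htcore.mark_step() on Gaudi; do nothing on the CPU fallback."""
    if htcore is not None:
        htcore.mark_step()


def count_correct(model: nn.Module, batches) -> int:
    """Run inference batch by batch and return the number of correct predictions."""
    model = model.eval().to(DEVICE)
    correct = 0
    with torch.no_grad():
        for data, label in batches:
            output = model(data.to(DEVICE))
            mark_step()  # close the graph for this iteration
            # Bring predictions back to the host, where the labels already are.
            correct += int(output.argmax(dim=1).cpu().eq(label).sum())
    return correct


if __name__ == "__main__":
    torch.manual_seed(0)
    toy_model = nn.Linear(8, 3)  # stand-in for the MNIST network
    toy_batches = [(torch.randn(4, 8), torch.randint(0, 3, (4,))) for _ in range(5)]
    total = sum(len(label) for _, label in toy_batches)
    print(f"Accuracy: {100.0 * count_correct(toy_model, toy_batches) / total:.2f}%")
```

Dividing by the number of labels actually seen, rather than by the batch count times the batch size, keeps the accuracy correct when the last batch is smaller than the others.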
Using torch.jit.trace Mode
Load and save models in JIT trace format using torch.jit.trace mode. For further details on the JIT format, refer to the TorchScript page.
Create the model and move it to the HPU:
import torch
import habana_frameworks.torch.core as htcore

device = torch.device('hpu')
in_c, out_c = 3, 64
k_size = 7
stride = 2
conv = torch.nn.Conv2d(in_c, out_c, kernel_size=k_size, stride=stride, bias=True)
bn = torch.nn.BatchNorm2d(out_c, eps=0.001)
relu = torch.nn.ReLU()
model = torch.nn.Sequential(conv, bn, relu)
model.eval()
model = model.to(device)
Trace the model with torch.jit.trace and save it:
N, H, W = 256, 224, 224
model_input = torch.randn((N,in_c,H,W), dtype=torch.float).to(device)
with torch.no_grad():
    trace_model = torch.jit.trace(model, (model_input), check_trace=False, strict=False)
    # Save the traced HPU model.
    trace_model.save("trace_model.pt")
Load the model using torch.jit.load:
# Load the model directly to HPU.
model = torch.jit.load("trace_model.pt", map_location=torch.device('hpu'))
Create the inputs and move them to the HPU to run model inference:
# Create inputs and move them to the HPU.
N, H, W = 256, 224, 224
input = torch.randn((N,in_c,H,W),dtype=torch.float)
input_hpu = input.to(device)
# Invoke the model.
output = model(input_hpu)
# In Lazy mode execution, mark_step() must be added after model inference.
htcore.mark_step()
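The trace, save, load, and run steps above can be combined into one round-trip check. The sketch below is an illustration under stated assumptions rather than part of the official flow. It falls back to the CPU when `habana_frameworks` is not importable, uses a smaller network and input than the guide so it finishes quickly, writes to a temporary directory, and the helper name `trace_roundtrip` is hypothetical.

```python
import os
import tempfile

import torch

try:
    import habana_frameworks.torch.core as htcore  # Gaudi bridge, if installed
    DEVICE = torch.device("hpu")
except ImportError:
    htcore = None
    DEVICE = torch.device("cpu")  # assumption: CPU fallback so the sketch runs anywhere


def trace_roundtrip(model: torch.nn.Module, example: torch.Tensor) -> bool:
    """Trace, save, reload, and compare outputs of the original and reloaded model."""
    model = model.eval().to(DEVICE)
    example = example.to(DEVICE)
    with torch.no_grad():
        traced = torch.jit.trace(model, example, check_trace=False, strict=False)
        expected = model(example)
        if htcore is not None:
            htcore.mark_step()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "trace_model.pt")
        traced.save(path)
        reloaded = torch.jit.load(path, map_location=DEVICE)
    with torch.no_grad():
        actual = reloaded(example)
        if htcore is not None:
            htcore.mark_step()
    # Compare on the host with a small tolerance for floating-point differences.
    return torch.allclose(expected.cpu(), actual.cpu(), atol=1e-5)


if __name__ == "__main__":
    net = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, kernel_size=3),
        torch.nn.BatchNorm2d(8, eps=0.001),
        torch.nn.ReLU(),
    )
    print(trace_roundtrip(net, torch.randn(2, 3, 16, 16)))
```

A check like this only confirms that the reloaded module reproduces the original outputs for one input; it says nothing about performance.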
Note
JIT format is functionally correct but not yet optimized. Optimization will be supported in a future release.