# Profiling with PyTorch

Habana gives PyTorch users a near-GPU experience when profiling their models: in most cases, substituting HPU for GPU in your source code is all that is needed to migrate the profiling setup from GPU to HPU.

The easiest way to see what is needed to profile your model during training is to look at a code example:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
import shutil

shutil.rmtree('runs', True)

# hpu specific
from habana_frameworks.torch.utils.library_loader import load_habana_module
load_habana_module()
habana_device = torch.device("hpu")

# general
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(habana_device)
model.train()

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)
train_dataloader = DataLoader(training_data, batch_size=64)

activities = []
activities.append(torch.profiler.ProfilerActivity.CPU)
# hpu specific
activities.append(torch.profiler.ProfilerActivity.HPU)

# general
with torch.profiler.profile(
        activities=activities,
        schedule=torch.profiler.schedule(wait=0, warmup=20, active=5, repeat=1),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./runs/fashion_mnist_experiment_1/'),
        record_shapes=True,
        with_stack=True) as prof:
    for batch, (X, y) in enumerate(train_dataloader):
        X, y = X.to(habana_device), y.to(habana_device)
        pred = model(X)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        prof.step()
```

Note that in the example above, profiling data is collected from step 21 to step 25 (wait=0, warmup=20, active=5, repeat=1). Keep in mind the limited capacity of the buffer that collects this data in the SynapseAI Profiling Subsystem.
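To double-check the schedule arithmetic, here is a small, self-contained sketch (plain Python, no profiler dependency; `profiled_steps` is a hypothetical helper, not part of the PyTorch API) of how the wait → warmup → active cycle maps onto step numbers:

```python
def profiled_steps(wait, warmup, active, repeat, total_steps):
    """Return the 1-indexed steps whose data is recorded, mimicking
    the wait -> warmup -> active cycle of torch.profiler.schedule."""
    steps = []
    cycle = wait + warmup + active
    for step in range(total_steps):
        if repeat and step // cycle >= repeat:
            break  # all requested cycles are done, stop recording
        if step % cycle >= wait + warmup:
            steps.append(step + 1)  # report 1-indexed, as in the text
    return steps

# wait=0, warmup=20, active=5, repeat=1 -> steps 21..25 are recorded
print(profiled_steps(0, 20, 5, 1, 100))  # [21, 22, 23, 24, 25]
```

Increasing `active` records more steps per cycle, which fills the profiling buffer faster.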

1. Start the TensorBoard server in a dedicated terminal window:

$ tensorboard --logdir runs --bind_all --port=5990


In the example above, the log directory matches the one passed to tensorboard_trace_handler, and the listening port is set to 5990.

2. Open a new tab in your browser and view your TensorBoard page:

http://fq_domain_name:5990


TensorBoard presents two kinds of information:

• While your workload is being processed step by step (batch by batch), you can monitor the training process live on the dashboard by tracking your model's cost (loss) and accuracy.

• Right after the last requested profiling step completes, all of the collected profiling data is analyzed by TensorBoard and presented in your browser; there is no need to wait for the end of the training process.

Note

• Carefully consider how many steps you really need to profile, keeping the limited buffer size in mind.

• If the buffer needs to be extended, consult the SynapseAI Profiler User Guide.

• For the vast majority of use cases, the default settings are good enough, and no special internal parameter adjustment is needed.
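If you only need a quick per-operator summary rather than the full TensorBoard visualization, the profiler object can also print a table directly to the console. A minimal sketch, run here on CPU for portability (on Gaudi you would add the HPU activity and move tensors to the hpu device as in the example above):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a small CPU workload.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    a = torch.randn(256, 256)
    b = torch.randn(256, 256)
    (a @ b).sum()

# Print a per-operator summary sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

This is handy for a first look at which operators dominate, before committing to a full multi-step TensorBoard trace.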