# Profiling with PyTorch

Habana gives PyTorch users a near-GPU experience when profiling their models: in most cases, substituting HPU for GPU in your source code is all that is needed to migrate the profiling setup from GPU to HPU.

The easiest way to see what is needed to profile your model during training is to look at a code example:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
import shutil

shutil.rmtree('runs', True)

# hpu specific
from habana_frameworks.torch.utils.library_loader import load_habana_module
load_habana_module()
habana_device = torch.device("hpu")

# general
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(habana_device)
model.train()

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)
train_dataloader = DataLoader(training_data, batch_size=64)

activities = []
activities.append(torch.profiler.ProfilerActivity.CPU)
# hpu specific
activities.append(torch.profiler.ProfilerActivity.HPU)

# general
with torch.profiler.profile(
        activities=activities,
        schedule=torch.profiler.schedule(wait=0, warmup=20, active=5, repeat=1),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./runs/fashion_mnist_experiment_1/'),
        record_shapes=True,
        with_stack=True) as prof:
    for batch, (X, y) in enumerate(train_dataloader):
        X, y = X.to(habana_device), y.to(habana_device)
        pred = model(X)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        prof.step()
```

Note that in the example above, profiling data is collected from step 21 to step 25 (wait=0, warmup=20, active=5, repeat=1). Keep in mind the limited capacity of the buffer that collects this data in the SynapseAI Profiling Subsystem.
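To double-check the schedule arithmetic, here is a small, self-contained sketch (plain Python, no profiler dependency; `profiled_steps` is a hypothetical helper, not part of the PyTorch API) of how the wait → warmup → active cycle maps onto step numbers:

```python
def profiled_steps(wait, warmup, active, repeat, total_steps):
    """Return the 1-indexed steps whose data is recorded, mimicking
    the wait -> warmup -> active cycle of torch.profiler.schedule."""
    steps = []
    cycle = wait + warmup + active
    for step in range(total_steps):
        if repeat and step // cycle >= repeat:
            break  # all requested cycles are done, stop recording
        if step % cycle >= wait + warmup:
            steps.append(step + 1)  # report 1-indexed, as in the text
    return steps

# wait=0, warmup=20, active=5, repeat=1 -> steps 21..25 are recorded
print(profiled_steps(0, 20, 5, 1, 100))  # [21, 22, 23, 24, 25]
```

Increasing `active` records more steps per cycle, which fills the profiling buffer faster.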

1. Start the TensorBoard server in a dedicated terminal window:

$ tensorboard --logdir runs --bind_all --port=5990


In the example above, the log directory matches the one passed to tensorboard_trace_handler, and the listening port is set to 5990.

2. Open a new tab in your browser and view your TensorBoard page:

http://fq_domain_name:5990


TensorBoard presents two kinds of information:

• While your workload is being processed step by step (batch by batch), you can monitor the training process live on the dashboard by tracking your model's cost (loss) and accuracy.

• Right after the last requested profiling step completes, all of the collected profiling data is analyzed by TensorBoard and presented in your browser; there is no need to wait for the end of the training process.

Note

• Carefully consider how many steps you really need to profile, keeping the limited buffer size in mind.

• If the buffer needs to be extended, consult the SynapseAI Profiler User Guide.

• For the vast majority of use cases, the default settings are good enough, and no special internal parameter adjustment is needed.
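If you only need a quick per-operator summary rather than the full TensorBoard visualization, the profiler object can also print a table directly to the console. A minimal sketch, run here on CPU for portability (on Gaudi you would add the HPU activity and move tensors to the hpu device as in the example above):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a small CPU workload.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    a = torch.randn(256, 256)
    b = torch.randn(256, 256)
    (a @ b).sum()

# Print a per-operator summary sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

This is handy for a first look at which operators dominate, before committing to a full multi-step TensorBoard trace.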