Python Package (habana_frameworks.torch)¶
This package provides PyTorch bridge interfaces and modules such as optimizers, mixed precision configuration, and fused kernels for training on HPU.
The modules are organized as shown below:
habana_frameworks.torch
core
distributed
hccl
hpex
hmp
kernels
normalization
optimizers
hpu
utils
The following sections provide a brief description of each module.
core¶
The core module provides Python bindings to PyTorch-Habana bridge interfaces, for example mark_step, which triggers execution of the accumulated graphs in Lazy mode.
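A minimal sketch of Lazy mode execution; the model and tensor shapes are illustrative, and only habana_frameworks.torch.core and mark_step are taken from the description above:

import torch
import habana_frameworks.torch.core as htcore

device = torch.device("hpu")
model = torch.nn.Linear(10, 10).to(device)

out = model(torch.randn(4, 10).to(device))  # ops accumulate into a lazy graph
loss = out.sum()
htcore.mark_step()  # triggers execution of the accumulated graph on HPU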
distributed/hccl¶
The distributed/hccl module registers and adds support for the HCCL communication backend.
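A minimal sketch of initializing the backend; it assumes the usual rendezvous environment variables (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT) are set by the launcher:

import torch.distributed as dist
import habana_frameworks.torch.distributed.hccl  # registers the "hccl" backend

# Assumes RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are provided by the launcher.
dist.init_process_group(backend="hccl")
print("rank", dist.get_rank(), "of", dist.get_world_size())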
hpex/hmp¶
The hpex/hmp module contains the habana_mixed_precision (hmp) tool, which can be used to train a model in mixed precision on HPU. Refer to PyTorch Mixed Precision Training on Gaudi for further details.
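A minimal sketch of enabling hmp with its default BF16/FP32 op lists; hmp.convert() and hmp.disable_casts() are assumptions based on the Mixed Precision guide referenced above, so check the guide for the options your release supports:

import torch
import habana_frameworks.torch.core as htcore
from habana_frameworks.torch.hpex import hmp

hmp.convert()  # enable mixed precision with the default BF16/FP32 op lists (assumed API)

device = torch.device("hpu")
model = torch.nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 10).to(device)).sum()
loss.backward()
with hmp.disable_casts():  # keep the optimizer step in FP32 (assumed API)
    optimizer.step()
htcore.mark_step()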
hpex/kernels¶
The hpex/kernels module contains Python interfaces to Habana-only custom operators, such as the EmbeddingBag and EmbeddingBagPreProc operators.
hpex/normalization¶
The hpex/normalization module contains Python interfaces to the Habana implementation of the common normalize-and-clip operations performed on gradients in some models. Using the Habana-provided implementation can give better performance than the equivalent operators in torch. Refer to Other Custom OPs for further details.
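As an illustration, a hedged sketch of fused gradient clipping; the FusedClipNorm name and its (parameters, max_norm) signature are assumptions drawn from the Other Custom OPs page and may differ between releases:

import torch
import habana_frameworks.torch.core as htcore
from habana_frameworks.torch.hpex.normalization import FusedClipNorm  # assumed location

device = torch.device("hpu")
model = torch.nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
fused_clip = FusedClipNorm(model.parameters(), max_norm=1.0)  # assumed signature

loss = model(torch.randn(4, 10).to(device)).sum()
loss.backward()
fused_clip.clip_norm(model.parameters())  # normalize & clip gradients on HPU
optimizer.step()
htcore.mark_step()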
hpex/optimizers¶
The hpex/optimizers module contains Python interfaces to Habana implementations of some of the common optimizers used in DL models. Using the Habana implementations can give better performance than the corresponding optimizer implementations in torch. Refer to Custom Optimizers for further details.
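For example, a hedged sketch that swaps torch.optim.AdamW for a fused Habana variant; the FusedAdamW name is an assumption based on the Custom Optimizers page, and its constructor is shown with the standard AdamW arguments:

import torch
import habana_frameworks.torch.core as htcore
from habana_frameworks.torch.hpex.optimizers import FusedAdamW  # assumed optimizer name

device = torch.device("hpu")
model = torch.nn.Linear(10, 1).to(device)
optimizer = FusedAdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

loss = model(torch.randn(4, 10).to(device)).sum()
loss.backward()
optimizer.step()
htcore.mark_step()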
hpu APIs¶
Support for HPU tensors is provided with this package. The following APIs provide the same functionality as their CPU counterparts, with HPU used for the underlying implementation. This package can be imported on demand.
import habana_frameworks.torch.hpu as hthpu - Imports the package.
hthpu.is_available() - Returns a boolean indicating if an HPU device is currently available.
hthpu.device_count() - Returns the number of compute-capable devices.
hthpu.get_device_name() - Returns the name of the HPU device.
hthpu.current_device() - Returns the index of the currently selected HPU device.
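A short usage sketch combining the calls listed above:

import habana_frameworks.torch.hpu as hthpu

if hthpu.is_available():
    print("device count  :", hthpu.device_count())
    print("device name   :", hthpu.get_device_name())
    print("current device:", hthpu.current_device())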
utils¶
The utils module contains general Python utilities required for training on HPU.
Memory Stats APIs¶
torch.hpu.max_memory_allocated() - Returns peak HPU memory allocated by tensors (in bytes). reset_peak_memory_stats() can be used to reset the starting point in tracking stats.
torch.hpu.memory_allocated() - Returns the current HPU memory occupied by tensors.
torch.hpu.memory_stats() - Returns a list of HPU memory statistics. The sample memory stats printout contains the following fields:
  Limit - Amount of total memory on the HPU device
  InUse - Amount of allocated memory at any instance; the starting point after reset_peak_memory_stats()
  MaxInUse - Amount of total active memory allocated
  NumAllocs - Number of allocations
  NumFrees - Number of freed chunks
  ActiveAllocs - Number of active allocations
  MaxAllocSize - Maximum allocated size
  TotalSystemAllocs - Total number of system allocations
  TotalSystemFrees - Total number of system frees
  TotalActiveAllocs - Total number of active allocations
torch.hpu.memory_summary() - Returns a human-readable printout of the current memory stats.
torch.hpu.reset_accumulated_memory_stats() - Clears the number-of-allocations and number-of-frees counters.
torch.hpu.reset_peak_memory_stats() - Resets the starting point of memory occupied by tensors.
The following shows a usage example:
import torch
import habana_frameworks.torch as htorch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("hpu")

if __name__ == '__main__':
    hpu = torch.device('hpu')
    cpu = torch.device('cpu')

    # Allocate a tensor on the HPU and take a first memory snapshot
    input1 = torch.randn((64, 28, 28, 20), dtype=torch.float, requires_grad=True)
    input1_hpu = input1.contiguous(memory_format=torch.channels_last).to(hpu)
    mem_summary1 = htorch.hpu.memory_summary()
    print('memory_summary1:')
    print(mem_summary1)

    # Reset the peak tracking point, allocate a second tensor, and take another snapshot
    htorch.hpu.reset_peak_memory_stats()
    input2 = torch.randn((64, 28, 28, 20), dtype=torch.float, requires_grad=True)
    input2_hpu = input2.contiguous(memory_format=torch.channels_last).to(hpu)
    mem_summary2 = htorch.hpu.memory_summary()
    print('memory_summary2:')
    print(mem_summary2)

    # Query current, detailed, and peak memory statistics
    mem_allocated = htorch.hpu.memory_allocated()
    print('memory_allocated: ', mem_allocated)
    mem_stats = htorch.hpu.memory_stats()
    print('memory_stats:')
    print(mem_stats)
    max_mem_allocated = htorch.hpu.max_memory_allocated()
    print('max_memory_allocated: ', max_mem_allocated)
Random Number Generator APIs¶
torch.hpu.random.get_rng_state - Returns the random number generator state of the specified HPU as a ByteTensor.
torch.hpu.random.get_rng_state_all - Returns a list of ByteTensors representing the random number states of all devices.
torch.hpu.random.set_rng_state - Sets the random number generator state of the specified HPU.
torch.hpu.random.set_rng_state_all - Sets the random number generator state of all devices.
torch.hpu.random.manual_seed - Sets the seed for generating random numbers for the current HPU device.
torch.hpu.random.manual_seed_all - Sets the seed for generating random numbers on all HPUs.
torch.hpu.random.seed - Sets the seed for generating random numbers to a random number for the current HPU.
torch.hpu.random.seed_all - Sets the seed for generating random numbers to a random number on all HPUs.
torch.hpu.random.initial_seed - Returns the current random seed of the current HPU.
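A short sketch exercising the calls above; importing the submodule as habana_frameworks.torch.hpu.random is an assumption about how the torch.hpu.random namespace is exposed:

import habana_frameworks.torch.hpu.random as htrandom  # assumed import path

htrandom.manual_seed(42)           # seed the current HPU device
state = htrandom.get_rng_state()   # capture the RNG state as a ByteTensor
print(htrandom.initial_seed())     # should print the seed set above
htrandom.set_rng_state(state)      # restore the captured state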