Habana Media Loader¶
The Habana Media Loader loads and pre-processes input data for deep learning frameworks. It consists of pre-enabled dataloaders for commonly used datasets and building blocks for assembling a generic dataloader. The loader decides internally whether part of the operations can be offloaded to the Habana accelerator. If the offload cannot be executed, the alternative dataloader function passed in is used instead.
Habana Media Loader can operate in different modes. The optimal one is selected based on the underlying hardware:
In Gaudi2, the dataloader uses hardware-based decoders for acceleration, lowering the load on the host CPU.
In first-gen Gaudi, it uses either the framework default dataloader (PyTorch Dataloader) or the Habana Dataloader, depending on the use case. In both cases, data loading runs on the host CPU.
Setting Up the Environment¶
To install Habana Media Loader, run the following command:
pip install hpu_media_loader-1.11.0-587-py3-none-any.whl
Note
If you are using a Habana Docker image, skip this step as Habana Media Loader is pre-installed.
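To verify the installation, you can run a quick import check. This is only a sanity-check suggestion, not an official installation step; it assumes the wheel provides the habana_dataloader module used in the examples below:
python -c "import habana_dataloader; print('Habana Media Loader is available')"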
Using Media Loader with PyTorch¶
Follow the steps below to import and use the Habana Dataloader object. For the full example, refer to the Torchvision model.
Import the Habana Dataloader object.
import habana_dataloader
Create an instance of the Habana Dataloader object.
habana_dataloader.HabanaDataLoader(
    dataset, batch_size=args.batch_size, sampler=train_sampler,
    num_workers=args.workers, pin_memory=True, drop_last=True)
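For context, below is a minimal end-to-end sketch of how the dataloader is typically created and consumed in an ImageNet-style training script. The dataset path, crop size, batch size, and worker count are illustrative assumptions, and the transforms are chosen to stay within the Guidelines for Supported Datasets described later in this section:
import habana_dataloader
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder

# Assumed training transforms; the flip probability and normalization values
# follow the supported-dataset guidelines, the crop size is a placeholder.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumed dataset location; replace with your own ImageNet directory.
dataset = ImageFolder("/data/imagenet/train", transform=train_transforms)

data_loader = habana_dataloader.HabanaDataLoader(
    dataset, batch_size=256, sampler=None,
    num_workers=8, pin_memory=True, drop_last=True)

for images, target in data_loader:
    # Move the batch to the device and run the training step here.
    ...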
The SynapseAI software selects the dataloader based on the underlying hardware and the dataset used:
In Gaudi2, it uses hardware acceleration for ImageNet, COCO and Medical Segmentation Decathlon (BraTS) datasets.
In first-gen Gaudi, it uses software acceleration for ImageNet and COCO datasets.
Fallback¶
In Gaudi2 - When the provided input parameters are not eligible for hardware acceleration (see Guidelines for Supported Datasets), the software-accelerated Habana Dataloader is activated. In such a case, the following message will be printed:
Failed to initialize Habana media Dataloader, error: {error message}
Fallback to aeon dataloader
In first-gen Gaudi - When the provided input parameters are not eligible for the Habana Dataloader (see Guidelines for Supported Datasets), the framework default dataloader (PyTorch Dataloader) is initialized and used. In such a case, the following message will be printed:
Failed to initialize Habana Dataloader, error: {error message}
Running with PyTorch Dataloader
Guidelines for Supported Datasets¶
The following lists the restrictions for the supported datasets when using Gaudi2:
ImageNet:
Acceleration takes place only with the following parameters:
prefetch_factor=3
drop_last=False
dataset is torchvision.datasets.ImageFolder
The dataset should contain only .jpg or .jpeg files.
Acceleration can take place only with the following dataset torchvision transforms, packed as transforms.Compose in Torchvision script/train.py (see the sketch after this list):
RandomResizedCrop
CenterCrop
Resize
ToTensor
RandomHorizontalFlip, only with p=0.5
Normalize, only with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
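For illustration, an evaluation-side transforms.Compose that stays within these restrictions might look like the sketch below (a training-side pipeline appears in the sketch under Using Media Loader with PyTorch). The 256/224 sizes are assumptions; the normalization values are the required ones listed above:
import torchvision.transforms as transforms

eval_transforms = transforms.Compose([
    transforms.Resize(256),        # assumed resize target
    transforms.CenterCrop(224),    # assumed crop size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])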
COCO:
Acceleration takes place only with the following parameters:
dataset is an instance of COCODetection. See SSD script/train.py.
drop_last=False
prefetch_factor=3
The dataset should be taken from the COCO Dataset webpage.
Acceleration can take place only with the following dataset transforms, packed as transforms.Compose in SSD script/utils.py:
SSDCropping. See SSD script/train.py.
Resize
ColorJitter, only with brightness=0.125, contrast=0.5, saturation=0.5, hue=0.05
ToTensor
RandomHorizontalFlip, only with p=0.5
Normalize, only with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
Encoder. See SSD script/train.py.
Medical Segmentation Decathlon (BraTS):
Acceleration takes place only with the following parameters:
num_workers is 3 for Unet2D and 5 for Unet3D
Unet2D - Only nvol 1 is supported
Unet3D - Only batchsize 2 is supported
val_dataloader and test_dataloader are not supported by Habana Media Loader
The dataset should be preprocessed as per script/preprocess.py.
Acceleration can take place only with the following dataset transforms:
Crop
Flip
Noise - Supported standard deviation range: (0, 0.33)
Blur - Supported sigma range: (0.5, 1.5)
Brightness - Supported brightness scale range: (0.7, 1.3)
Contrast - Supported contrast scale range: (0.65, 1.5)
Zoom transform is not supported
The following lists the restrictions for the supported datasets when using first-gen Gaudi:
ImageNet:
Acceleration takes place only with the following parameters (see the sketch after this list):
batch_sampler=None
num_workers=8. See Torchvision script/train.py.
collate_fn=None
pin_memory=True. See Torchvision script/train.py.
timeout=0
shuffle=False. See Torchvision script/train.py.
worker_init_fn=None
multiprocessing_context=None
generator=None
prefetch_factor=2
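As an illustration only, a HabanaDataLoader call that keeps all of these parameters at the listed values could look like the following sketch. It assumes the loader mirrors the torch.utils.data.DataLoader-style keyword arguments named above; dataset is assumed to be defined as in the earlier example, and the batch size is a placeholder:
import habana_dataloader

# dataset is assumed to be a torchvision.datasets.ImageFolder instance,
# created as in the earlier sketch; batch_size=256 is an assumed value.
data_loader = habana_dataloader.HabanaDataLoader(
    dataset, batch_size=256, shuffle=False, batch_sampler=None,
    num_workers=8, collate_fn=None, pin_memory=True, timeout=0,
    worker_init_fn=None, multiprocessing_context=None, generator=None,
    prefetch_factor=2)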
COCO:
Acceleration takes place only with the following parameters:
batch_sampler=None
num_workers=12. See SSD script/train.py.
pin_memory=True. See SSD script/train.py.
timeout=0
worker_init_fn=None
drop_last=True
prefetch_factor=2
Acceleration can take place only with the following dataset transforms, packed as transforms.Compose in SSD script/utils.py:
SSDCropping. See SSD script/train.py.
Resize
ColorJitter, only with brightness=0.125, contrast=0.5, saturation=0.5, hue=0.05
ToTensor
RandomHorizontalFlip, only with p=0.5
Normalize, only with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
Encoder. See SSD script/train.py.
Model Examples¶
The following are full examples of models using Habana Media Loader with PyTorch on the datasets mentioned above: