Intel Gaudi Media Loader¶
The habana_media_loader is used to load and pre-process inputs for deep learning frameworks.
It consists of pre-enabled dataloaders for commonly used datasets and building blocks to assemble a generic dataloader.
The loader decides internally whether part of the operations can be offloaded to the Intel® Gaudi® AI accelerator. If the offload cannot be executed, it falls back to an alternative dataloader path.
habana_media_loader can operate in different modes. The optimal one is selected based on the underlying hardware:
On the Intel® Gaudi® 2 AI accelerator, the dataloader uses hardware-based decoders for acceleration, lowering the load on the host CPU.
On the first-gen Intel® Gaudi® AI accelerator, it uses either the default PyTorch DataLoader or habana_dataloader, depending on the use case. Both run on the host CPU.
Setting Up the Environment¶
To install habana_media_loader, run the following command:
pip install habana_media_loader-1.20.0-543-py3-none-any.whl
Note
The above step is not required when running the Intel Gaudi Docker image, as habana_media_loader is already installed by default.
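As a quick sanity check after installation, the import below should succeed. This is a minimal sketch; it assumes the wheel exposes the habana_dataloader module that is imported in the next section.
import habana_dataloader  # raises ImportError if the package is not installed
print(habana_dataloader.__name__)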
Using Media Loader with PyTorch¶
Follow the steps below to import and use the habana_dataloader object. For the full example, refer to the Torchvision model.
1. Import the habana_dataloader module:
import habana_dataloader
2. Create an instance of the habana_dataloader object:
habana_dataloader.HabanaDataLoader(
    dataset,
    batch_size=args.batch_size,
    sampler=train_sampler,
    num_workers=args.workers,
    pin_memory=True,
    drop_last=True)
The Intel Gaudi software selects the dataloader based on the underlying hardware and the dataset used:
In Gaudi 2, it uses hardware acceleration for ImageNet, COCO and Medical Segmentation Decathlon (BraTS) datasets.
In first-gen Gaudi, it uses software acceleration for ImageNet and COCO datasets.
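For illustration, the following is a minimal end-to-end sketch for an ImageNet-style torchvision.datasets.ImageFolder dataset. It is not the reference implementation: the dataset path, crop size, and batch size are placeholders, and it assumes HabanaDataLoader accepts the same keyword arguments as torch.utils.data.DataLoader. The transform set and the drop_last/prefetch_factor values follow the Gaudi 2 restrictions listed under Guidelines for Supported Datasets.
import torchvision
import torchvision.transforms as transforms
import habana_dataloader

# Training transforms restricted to the supported set (see Guidelines below).
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),   # 224 is a placeholder crop size
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "/data/imagenet/train" is a placeholder path to an ImageFolder-style dataset
# containing only .jpg/.jpeg files.
dataset = torchvision.datasets.ImageFolder("/data/imagenet/train", train_transforms)

loader = habana_dataloader.HabanaDataLoader(
    dataset,
    batch_size=256,        # placeholder batch size
    num_workers=8,
    pin_memory=True,
    drop_last=False,       # required for hardware acceleration on Gaudi 2
    prefetch_factor=3)     # required for hardware acceleration on Gaudi 2

for images, labels in loader:
    pass  # training step goes here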
Fallback¶
In Gaudi 2 - When the provided input parameters are not eligible for hardware acceleration (see Guidelines for Supported Datasets), the software-accelerated habana_dataloader is activated. In such a case, the following message will be printed:
Failed to initialize Habana media Dataloader, error: {error message} Fallback to aeon dataloader
In first-gen Gaudi - When the provided input parameters are not eligible for habana_dataloader (see Guidelines for Supported Datasets), the default PyTorch DataLoader is initialized and used. In such a case, the following message will be printed:
Failed to initialize Habana Dataloader, error: {error message} Running with PyTorch Dataloader
Guidelines for Supported Datasets¶
Note
Starting from v1.20.0, support for the SSD model has been deprecated.
The following lists the restrictions for the supported datasets using Gaudi 2:
ImageNet:
Acceleration takes place only with the following parameters:
prefetch_factor=3
drop_last=False
dataset is torchvision.datasets.ImageFolder
The dataset should contain only .jpg or .jpeg files.
Acceleration can take place only with the following torchvision dataset transforms packed as transforms.Compose in Torchvision script/train.py:
RandomResizedCrop
CenterCrop
Resize
ToTensor
RandomHorizontalFlip, only with p=0.5
Normalize, only with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
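To complement the training pipeline sketched earlier under Using Media Loader with PyTorch, the following is a minimal evaluation pipeline built only from the supported transforms above; the resize and crop sizes (256/224) are common ImageNet values used here as placeholders, not values mandated by this guide.
import torchvision.transforms as transforms

# Evaluation transforms drawn only from the supported set; Normalize uses the
# only accepted mean/std values.
eval_transforms = transforms.Compose([
    transforms.Resize(256),       # placeholder size
    transforms.CenterCrop(224),   # placeholder size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])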
COCO:
Acceleration takes place only with the following parameters:
dataset is an instance of COCODetection. See SSD script/train.py.
drop_last=False
prefetch_factor=3
The dataset should be taken from the COCO Dataset webpage.
Acceleration can take place only with the following dataset transforms packed as transforms.Compose in SSD script/utils.py:
SSDCropping. See SSD script/train.py.
Resize
ColorJitter, only with brightness=0.125, contrast=0.5, saturation=0.5, hue=0.05
ToTensor
RandomHorizontalFlip, only with p=0.5
Normalize, only with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
Encoder. See SSD script/train.py.
Medical Segmentation Decathlon (BraTS):
Acceleration takes place only with the following parameters:
num_workers is 3 for Unet2D and 5 for Unet3D
Unet2D - Only nvol 1 is supported
Unet3D - Only batch size 2 is supported
val_dataloader and test_dataloader are not supported by habana_media_loader
The dataset should be preprocessed as per script/preprocess.py.
Acceleration can take place only with the following dataset transforms:
Crop
Flip
Noise - Standard deviation range supported (0, 0.33)
Blur - Sigma’s range supported (0.5, 1.5)
Brightness - Brightness scale supported (0.7, 1.3)
Contrast - Contrast scale supported (0.65, 1.5)
Zoom transform is not supported
The following lists the restrictions for the supported datasets using first-gen Gaudi:
ImageNet:
Acceleration takes place only with the following parameters:
batch_sampler=None
num_workers=8. See Torchvision script/train.py.
collate_fn=None
pin_memory=True. See Torchvision script/train.py.
timeout=0
shuffle=False. See Torchvision script/train.py.
worker_init_fn=None
multiprocessing_context=None
generator=None
prefetch_factor=2
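As an illustrative sketch only, the call below keeps each parameter listed above at its eligible value; it assumes HabanaDataLoader accepts the same keyword arguments as torch.utils.data.DataLoader, and the dataset path, transforms, and batch size are placeholders.
import torchvision
import torchvision.transforms as transforms
import habana_dataloader

# Placeholder ImageFolder dataset; path and transforms are illustrative only.
dataset = torchvision.datasets.ImageFolder(
    "/data/imagenet/train",
    transforms.Compose([transforms.RandomResizedCrop(224),
                        transforms.RandomHorizontalFlip(p=0.5),
                        transforms.ToTensor(),
                        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                             std=[0.229, 0.224, 0.225])]))

# Each keyword below matches the eligible value listed above; other values
# trigger the fallback to the default PyTorch DataLoader.
loader = habana_dataloader.HabanaDataLoader(
    dataset,
    batch_size=256,                 # placeholder batch size
    shuffle=False,
    batch_sampler=None,
    num_workers=8,
    collate_fn=None,
    pin_memory=True,
    timeout=0,
    worker_init_fn=None,
    multiprocessing_context=None,
    generator=None,
    prefetch_factor=2)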
COCO:
Acceleration takes place only with the following parameters:
batch_sampler=None
num_workers=12. See SSD script/train.py.
pin_memory=True. See SSD script/train.py.
timeout=0
worker_init_fn=None
drop_last=True
prefetch_factor=2
Acceleration can take place only with the following dataset transforms packed as transforms.Compose in SSD script/utils.py:
SSDCropping. See SSD script/train.py.
Resize
ColorJitter, only with brightness=0.125, contrast=0.5, saturation=0.5, hue=0.05
ToTensor
RandomHorizontalFlip, only with p=0.5
Normalize, only with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
Encoder. See SSD script/train.py.
Model Examples¶
The following are full examples of models using Intel Gaudi Media Loader with PyTorch on the datasets mentioned above: