Habana Media Loader

The Habana Media Loader is used to load and pre-process inputs for deep learning frameworks. It consists of pre-enabled dataloaders for commonly used datasets and building blocks for assembling a generic dataloader. The loader decides internally whether part of the operations can be offloaded to the Habana accelerator. If the offload cannot be executed, the loader falls back to the provided alternative dataloader function.

Habana Media Loader can operate in different modes. The optimal one is selected based on the underlying hardware:

  • In Gaudi2, the dataloader uses hardware-based decoders for acceleration, lowering the load on the host CPU.

  • In first-gen Gaudi, it uses either the framework default dataloader (PyTorch Dataloader) or the Habana Dataloader, depending on the use case. Both run on the host CPU.

Setting Up the Environment

To install Habana Media Loader, run the following command:

pip install hpu_media_loader-1.11.0-587-py3-none-any.whl

Note

If you are using a Habana Docker image, skip this step, as Habana Media Loader is pre-installed.
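
To confirm the installation, the import below should succeed (a minimal check; the print line is illustrative, and the import name matches the one used in the next section):

import habana_dataloader
print("habana_dataloader imported successfully")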

Using Media Loader with PyTorch

Follow the steps below to import and instantiate the Habana Dataloader object. For a full example, refer to the Torchvision model.

  1. Import the Habana Dataloader object:

import habana_dataloader

  2. Create an instance of the Habana Dataloader object:

habana_dataloader.HabanaDataLoader(
    dataset, batch_size=args.batch_size, sampler=train_sampler,
    num_workers=args.workers, pin_memory=True, drop_last=True)
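
For context, the following is a minimal sketch that puts the two steps together with an ImageNet-style torchvision.datasets.ImageFolder. The dataset path, batch size, worker count, and sampler choice are illustrative assumptions, not values required by the loader; see Guidelines for Supported Datasets below for the parameters eligible for Gaudi2 hardware acceleration.

import habana_dataloader
import torchvision.datasets
import torchvision.transforms as transforms

# Illustrative ImageNet-style dataset; the path is an assumption.
dataset = torchvision.datasets.ImageFolder(
    "/data/imagenet/train",
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ]))

# batch_size, num_workers, and sampler=None are placeholders; a
# distributed run would pass a DistributedSampler here instead.
loader = habana_dataloader.HabanaDataLoader(
    dataset, batch_size=256, sampler=None,
    num_workers=8, pin_memory=True, drop_last=True)

# The loader iterates like a standard PyTorch DataLoader.
for images, labels in loader:
    pass  # training step goes here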

The SynapseAI software selects the dataloader based on the underlying hardware and the dataset used:

  • In Gaudi2, it uses hardware acceleration for ImageNet, COCO and Medical Segmentation Decathlon (BraTS) datasets.

  • In first-gen Gaudi, it uses software acceleration for ImageNet and COCO datasets.

Fallback

  • In Gaudi2 - When the provided input parameters are not eligible for hardware acceleration (see Guidelines for Supported Datasets), the software-accelerated Habana Dataloader is activated. In such a case, the following message will be printed:

Failed to initialize Habana media Dataloader, error: {error message}
Fallback to aeon dataloader
  • In first-gen Gaudi - When the provided input parameters are not eligible for Habana Dataloader (see Guidelines for Supported Datasets), the framework default dataloader (PyTorch Dataloader) is initialized and used. In such a case, the following message will be printed:

Failed to initialize Habana Dataloader, error: {error message}
Running with PyTorch Dataloader

Guidelines for Supported Datasets

The following lists the restrictions for the supported datasets using Gaudi2:

ImageNet

  • Acceleration takes place only with the following parameters:

    • prefetch_factor=3

    • drop_last=False

    • dataset is torchvision.datasets.ImageFolder

  • The dataset should contain only .jpg or .jpeg files.

  • Acceleration can take place only with the following torchvision dataset transforms, packed as transforms.Compose in Torchvision script/train.py (a sketch follows this list):

    • RandomResizedCrop

    • CenterCrop

    • Resize

    • ToTensor

    • RandomHorizontalFlip, only with p=0.5

    • Normalize, only with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
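
As an illustration, an evaluation pipeline that stays within the eligible transform set above could look like the following sketch; the 256/224 sizes are conventional ImageNet values and are an assumption, not a requirement stated here.

import torchvision.transforms as transforms

# Only transforms from the eligible list above; the Normalize values are
# the only ones accepted for acceleration.
eval_transforms = transforms.Compose([
    transforms.Resize(256),      # 256 is a conventional size (assumption)
    transforms.CenterCrop(224),  # 224 is a conventional size (assumption)
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])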

COCO

  • Acceleration takes place only with the following parameters:

    • dataset is an instance of COCODetection. See SSD script/train.py.

    • drop_last=False

    • prefetch_factor=3

  • The dataset should be taken from the COCO Dataset webpage.

  • Acceleration can take place only with the following dataset transforms packed as transforms.Compose in SSD script/utils.py (a sketch follows this list):

    • SSDCropping. See SSD script/train.py.

    • Resize

    • ColorJitter, only with brightness=0.125, contrast=0.5, saturation=0.5, hue=0.05

    • ToTensor

    • RandomHorizontalFlip, only with p=0.5

    • Normalize, only with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]

    • Encoder. See SSD script/train.py.
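
The torchvision portion of an eligible SSD pipeline can be sketched as follows. SSDCropping and Encoder are classes from the SSD reference script rather than torchvision, so they are only noted in comments, and the 300x300 input size is a conventional SSD value assumed here.

import torchvision.transforms as transforms

# SSDCropping (from the SSD reference script) would run before these
# transforms, and Encoder (also from the script) after Normalize; both
# are omitted because they are not torchvision classes. Note that in the
# actual detection pipeline the horizontal flip must also adjust the
# bounding boxes, which the reference script handles.
ssd_torchvision_part = transforms.Compose([
    transforms.Resize((300, 300)),  # conventional SSD size (assumption)
    transforms.ColorJitter(brightness=0.125, contrast=0.5,
                           saturation=0.5, hue=0.05),
    transforms.ToTensor(),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])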

Medical Segmentation Decathlon (BraTS)

  • Acceleration takes place only with the following parameters (see the sketch after this list):

    • num_workers is 3 for Unet2D and 5 for Unet3D

    • Unet2D - Only nvol 1 is supported

    • Unet3D - Only batch size 2 is supported

    • val_dataloader, test_dataloader are not supported by Habana Media Loader

  • The dataset should be preprocessed as per script/preprocess.py.

  • Acceleration can take place only with the following dataset transforms:

    • Crop

    • Flip

    • Noise - Standard deviation range supported (0, 0.33)

    • Blur - Sigma’s range supported (0.5, 1.5)

    • Brightness - Brightness scale supported (0.7, 1.3)

    • Contrast - Contrast scale supported (0.65, 1.5)

    • Zoom transform is not supported
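
As a sketch of how the constraints above shape loader construction, the helper below builds a Unet2D train loader. The function name, batch size argument, and the sampler/pin_memory/drop_last choices are illustrative assumptions mirroring the earlier example; only num_workers=3 and the train-only restriction come from the list above.

import habana_dataloader

def make_unet2d_train_loader(train_dataset, batch_size):
    # num_workers must be 3 for Unet2D (5 for Unet3D), per the list above.
    # Only the train dataloader is supported by Habana Media Loader;
    # val/test dataloaders must use a different loader.
    return habana_dataloader.HabanaDataLoader(
        train_dataset, batch_size=batch_size, sampler=None,
        num_workers=3, pin_memory=True, drop_last=True)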

The following lists the restrictions for the supported datasets using first-gen Gaudi:

  • Acceleration takes place only with the following parameters:

  • Acceleration can take place only with the following dataset transforms packed as transforms.Compose in SSD script/utils.py:

    • SSDCropping. See SSD script/train.py.

    • Resize

    • ColorJitter, only with brightness=0.125, contrast=0.5, saturation=0.5, hue=0.05

    • ToTensor

    • RandomHorizontalFlip, only with p=0.5

    • Normalize, only with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]

    • Encoder. See SSD script/train.py.

Model Examples

The following are full examples of models using Habana Media Loader with PyTorch on the datasets mentioned above: