habana_frameworks.mediapipe.fn.ReadNumpyDatasetFromDir

Class:
  • habana_frameworks.mediapipe.fn.ReadNumpyDatasetFromDir(**kwargs)

Define graph call:
  • __call__()

Parameter:
  • None

Description:

This reader reads numpy data files and numpy label files from either a given directory or a file list, and returns batches of numpy images and labels.

Supported backend:
  • CPU

Keyword Arguments:

kwargs

Description

dir

Input image directory path for reading images and labels.

  • Type: str

  • Default: None

  • Optional: yes (Either provide dir or provide file_list)

file_list

Instead of providing dir (the input image directory path), the user can provide a list of files to the reader.

  • Type: list of str

  • Default: None

  • Optional: yes

  • Note: file_list must contain the full path of every file. Example [“/path/to/4d/xyz_000_x.npy”, “/path/to/4d/xyz_000_y.npy”, …]
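As an illustration, a file_list of full paths can be built with Python’s glob module. The directory layout and file names below are hypothetical stand-ins for a real dataset:

```python
import glob
import os
import tempfile

# Hypothetical dataset layout: create a few empty .npy files for demonstration.
root = tempfile.mkdtemp()
for name in ["xyz_000_x.npy", "xyz_001_x.npy", "xyz_002_x.npy"]:
    open(os.path.join(root, name), "w").close()

# file_list entries must be full paths; sorting keeps the order deterministic.
file_list = sorted(glob.glob(os.path.join(root, "*_x.npy")))
print(file_list)
```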

seed

Seed for randomization; if not provided, it is generated internally. It is used for shuffling the dataset, and also for randomly selecting the images that pad the last batch when both drop_remainder and pad_remainder are False.

  • Type: int

  • Optional: yes
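The role of the seed can be sketched in plain Python. This mirrors the deterministic-shuffle behavior described above, not the reader’s internal implementation:

```python
import random

files = [f"xyz_{i:03d}_x.npy" for i in range(6)]

# The same seed yields the same shuffle order on every run.
order_a = files[:]
random.Random(42).shuffle(order_a)
order_b = files[:]
random.Random(42).shuffle(order_b)
print(order_a == order_b)  # True: identical seeds give identical orders
```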

pattern

Glob pattern used to match file names (name and extension).

  • Type: str

  • Default: “xyz_*.npz”

  • Optional: no
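The pattern is a shell-style glob; its matching behavior can be checked with Python’s fnmatch module (illustrative file names only):

```python
import fnmatch

names = ["xyz_000_x.npy", "xyz_000_y.npy", "abc_000_x.npy"]

# "xyz_*.npy" matches both xyz_ files; abc_000_x.npy does not match.
matches = [n for n in names if fnmatch.fnmatch(n, "xyz_*.npy")]
print(matches)
```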

shuffle

If shuffle_across_dataset is set to True, this argument is ignored. If shuffle_across_dataset is False and shuffle is True, the reader shuffles the dataset within the slice.

  • Type: bool

  • Default: True

  • Optional: yes

max_file

Full path of the largest input file. It is used for pre-allocating buffers. If not provided, the reader will find it.

  • Type: str

  • Default: None

  • Optional: yes

  • Note: This option saves reader initialization time, especially for large datasets.

num_readers

Number of parallel reader threads to be used.

  • Type: int

  • Default: 1

  • Optional: yes

  • Note: The maximum number of readers is limited to 8.

drop_remainder

If True, the reader drops the partial batch. If False, the padding mechanism is controlled by pad_remainder.

  • Type: bool

  • Default: False

  • Optional: yes

pad_remainder

If True, the reader replicates the last image of the partial batch. If False, the partial batch is filled with randomly selected images.

  • Type: bool

  • Default: False

  • Optional: yes
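How drop_remainder affects the number of batches can be sketched with simple arithmetic (an illustration of the semantics above, not the reader’s actual code):

```python
def num_batches(num_files, batch_size, drop_remainder):
    # drop_remainder=True discards the partial batch; otherwise it is
    # padded (pad_remainder=True replicates the last image,
    # pad_remainder=False fills with randomly selected images).
    if drop_remainder:
        return num_files // batch_size
    return -(-num_files // batch_size)  # ceiling division

print(num_batches(10, 4, True))   # 2: the 2-image remainder is dropped
print(num_batches(10, 4, False))  # 3: the remainder is padded to a full batch
```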

num_slices

Indicates the number of cards in multi-card training. Before the first epoch, the input data is divided into num_slices, i.e. one slice per card. For the entire training, the same slice is used for that particular card to create batches in every epoch. The default value is 1, which indicates single-card training.

  • Type: int

  • Default: 1

  • Optional: yes

slice_index

In multi-card training, indicates the index of the card.

  • Type: int

  • Default: 0

  • Optional: yes (if num_slices=1; otherwise the user must provide it)

  • Note: The default value is 0 for single-card training; for multi-card training it must be between 0 and num_slices - 1.
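One way to picture num_slices and slice_index is a contiguous split of the dataset, one disjoint slice per card. This is a sketch of the slicing idea, not necessarily the reader’s exact partitioning scheme:

```python
files = [f"xyz_{i:03d}_x.npy" for i in range(8)]
num_slices = 2  # e.g. two cards

# Hypothetical contiguous split: each card sees a disjoint slice of the data.
per_slice = len(files) // num_slices
slices = [files[i * per_slice:(i + 1) * per_slice] for i in range(num_slices)]
print(slices[0])  # what the card with slice_index=0 would read
print(slices[1])  # what the card with slice_index=1 would read
```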

dense

Should be used only when all numpy files in a dataset have the same shape. If set to True, the reader outputs a dense tensor (a tensor containing a batch of equal-size images).

  • Type: bool

  • Default: True

  • Optional: yes

shuffle_across_dataset

When shuffle_across_dataset is set to True, the data is shuffled and sliced anew for each epoch.

  • Type: bool

  • Default: False

  • Optional: yes

Example1: Use ReadNumpyDatasetFromDir by providing input directory

The following code snippet shows the numpy reader using a directory input and a pattern for file selection. All “xyz_*_x.npy” files have the same shape, and the same is true for “xyz_*_y.npy”. Refer to the habana_frameworks.mediapipe.fn.Crop example for variable-shape input.

from habana_frameworks.mediapipe import fn
from habana_frameworks.mediapipe.mediapipe import MediaPipe
from habana_frameworks.mediapipe.media_types import dtype as dt
import os


class myMediaPipe(MediaPipe):
    def __init__(self, device, queue_depth, batch_size, num_threads, op_device, dir, pattern):
        super(
            myMediaPipe,
            self).__init__(
            device,
            queue_depth,
            batch_size,
            num_threads,
            self.__class__.__name__)

        self.inputxy = fn.ReadNumpyDatasetFromDir(num_outputs=1,
                                                shuffle=False,
                                                dir=dir,
                                                pattern=pattern,
                                                dtype=dt.FLOAT32,
                                                device=op_device)

        self.memcopy_op = fn.MemCpy(dtype=dt.FLOAT32,
                                    device="hpu")

    def definegraph(self):
        img = self.inputxy()
        img = self.memcopy_op(img)
        return img


def run(device, op_device):
    batch_size = 2
    queue_depth = 1
    num_threads = 1
    base_dir = os.environ['DATASET_DIR']
    dir = base_dir+"/npy_data/fp32/"
    pattern = "*x*.npy"

    # Create MediaPipe object
    pipe = myMediaPipe(device, queue_depth, batch_size, num_threads,
                    op_device, dir, pattern)
    pipe.build()
    pipe.iter_init()
    for i in range(1):
        images = pipe.run()
        images = images.as_cpu().as_nparray()
        print('image shape: ', images.shape)
        print(images)
    del pipe


if __name__ == "__main__":
    dev_opdev = {'mixed': ['cpu'],
                'legacy': ['cpu']}
    for dev in dev_opdev.keys():
        for op_dev in dev_opdev[dev]:
            out = run(dev, op_dev)

The following is the output of the numpy reader using an input directory:

image shape:  (2, 3, 2, 3)
[[[[182. 227. 113.]
  [175. 128. 253.]]

  [[ 58. 140. 136.]
  [ 86.  80. 111.]]

  [[175. 196. 178.]
  [ 20. 163. 108.]]]


[[[186. 254.  96.]
  [180.  64. 132.]]

  [[149.  50. 117.]
  [213.   6. 111.]]

  [[ 77.  11. 160.]
  [129. 102. 154.]]]]

Example2: Use ReadNumpyDatasetFromDir by providing file_list

The following code snippet shows the numpy reader using a file list input:

from habana_frameworks.mediapipe import fn
from habana_frameworks.mediapipe.mediapipe import MediaPipe
from habana_frameworks.mediapipe.media_types import dtype as dt
import os
import glob

class myMediaPipe(MediaPipe):
    def __init__(self, device, queue_depth, batch_size, num_threads, op_device, npy_x, npy_y):
        super(
            myMediaPipe,
            self).__init__(
            device,
            queue_depth,
            batch_size,
            num_threads,
            self.__class__.__name__)

        self.inputx = fn.ReadNumpyDatasetFromDir(num_outputs=1,
                                                shuffle=False,
                                                file_list=npy_x,
                                                dtype=dt.FLOAT32,
                                                device=op_device)

        self.inputy = fn.ReadNumpyDatasetFromDir(num_outputs=1,
                                                shuffle=False,
                                                file_list=npy_y,
                                                dtype=dt.UINT8,
                                                device=op_device)

        self.memcopy_op = fn.MemCpy(dtype=dt.FLOAT32,
                                    device="hpu")

    def definegraph(self):
        img = self.inputx()
        lbl = self.inputy()
        img = self.memcopy_op(img)
        return img, lbl


def run(device, op_device):
    batch_size = 2
    queue_depth = 1
    num_threads = 1
    base_dir = os.environ['DATASET_DIR']
    dir = base_dir+"/npy_data"
    pattern_x = "*x*.npy"
    npy_x = sorted(glob.glob(dir + "/fp32/{}".format(pattern_x)))
    pattern_y = "*y*.npy"
    npy_y = sorted(glob.glob(dir + "/u8/{}".format(pattern_y)))

    # Create MediaPipe object
    pipe = myMediaPipe(device, queue_depth, batch_size, num_threads,
                      op_device, npy_x, npy_y)
    pipe.build()
    pipe.iter_init()

    images, labels = pipe.run()

    def as_cpu(tensor):
        # Mixed pipelines may return device tensors; copy to host if needed.
        if callable(getattr(tensor, "as_cpu", None)):
            tensor = tensor.as_cpu()
        return tensor

    images = as_cpu(images).as_nparray()
    labels = as_cpu(labels).as_nparray()
    print('image shape: ', images.shape)
    print(images)
    print('label shape: ', labels.shape)
    print(labels)
    del pipe


if __name__ == "__main__":
    dev_opdev = {'mixed': ['cpu'],
                'legacy': ['cpu']}
    for dev in dev_opdev.keys():
        for op_dev in dev_opdev[dev]:
            out = run(dev, op_dev)

The following is the output of the numpy reader using a file list:

image shape:  (2, 3, 2, 3)
[[[[182. 227. 113.]
  [175. 128. 253.]]

  [[ 58. 140. 136.]
  [ 86.  80. 111.]]

  [[175. 196. 178.]
  [ 20. 163. 108.]]]


[[[186. 254.  96.]
  [180.  64. 132.]]

  [[149.  50. 117.]
  [213.   6. 111.]]

  [[ 77.  11. 160.]
  [129. 102. 154.]]]]

label shape:  (2, 3, 2, 3)
[[[[149 187 232]
  [160 201 202]]

  [[ 80 147 153]
  [199 174 158]]

  [[200 124 139]
  [  3 161 216]]]


[[[106  93  83]
  [ 57 253  52]]

  [[222 189  26]
  [174  60 118]]

  [[218  84  43]
  [251  75  73]]]]