habana_frameworks.mediapipe.fn.ReadVideoDatasetFromDirGen

Class:

habana_frameworks.mediapipe.fn.ReadVideoDatasetFromDirGen(
    dir="/path/to/dataset/",
    format="mp4",
    seed=0,
    label_dtype=dt.UINT32,
    num_slices=1,
    slice_index=0,
    file_list=[],
    class_list=[],
    file_classes=[],
    frames_per_clip=1,
    stride=1,
    clips_per_video=1,
    step_between_clips=1,
    start_frame_index=0,
    sampler=cs.CONTIGUOUS_SAMPLER,
    last_batch_strategy=lbs.CYCLIC,
    slice_once=True,
    is_modulo_slice=True
)
Define graph call:
  • __call__()

Parameter:
  • None

Description:

This reader is designed for video classification tasks. There are two ways to provide input to ReadVideoDatasetFromDirGen:

  • By specifying the input directory path. The name of each subdirectory is considered the class label (class_label) for all the videos it contains.

  • By providing file_list, class_list and file_classes to the reader (see the sketch below).

The reader returns batches of video paths, ground truth labels, and a resample list. The output ground truth labels are a list of integers representing the class labels of the videos.
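
A minimal sketch of the two input modes (the paths and class names below are placeholders, not part of the library):

from habana_frameworks.mediapipe import fn

# Mode 1: point the reader at a directory; sub-directory names become class labels.
reader = fn.ReadVideoDatasetFromDirGen(dir="/path/to/dataset/", format="mp4")

# Mode 2: pass the lists explicitly; class_list acts as the label look-up table
# and file_classes gives the class name of every entry in file_list.
reader = fn.ReadVideoDatasetFromDirGen(
    file_list=["/path/to/dataset/class_1/vid_c1_0.mp4",
               "/path/to/dataset/class_2/vid_c2_0.mp4"],
    class_list=["class_1", "class_2"],
    file_classes=["class_1", "class_2"],
    format="mp4")

In either mode, the reader is called inside a MediaPipe graph definition, as shown in Example #1 below.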

Supported backend:

  • CPU

Keyword Arguments:

dir

Input video directory path. Reads videos from all subdirectories within the specified directory. The name of each subdirectory is treated as the class_label for all the videos included.

  • Type: str

  • Default: None

  • Optional: yes (either provide dir or provide file_list)

  • Note: Arrange input files as dir_path/class_label/<video files>. For example, if the training pipeline input videos are /user/home/videos/train/<class_label>/<video_name>.mp4, set dir="/user/home/videos/train/". There may be multiple subdirectories in /user/home/videos/train/, one for each class_label.

format

Format (extension) of the video files. The reader lists all videos with this extension in the subdirectories of “dir”.

  • Type: str

  • Default: “mp4”

  • Optional: no

  • Note: The supported video file extensions are “mp4”, “h264”, “hevc”.

seed

Seed for randomization. If not provided, it will be generated internally.

  • Type: int

  • Default: None

  • Optional: yes

label_dtype

Required data type of the output ground truth labels. The reader returns a batch of video file paths and ground truth labels. The output ground truth labels are a list of integers, each of which is the index of the respective video’s class label in the sorted list of all class labels. label_dtype specifies the data type of these integers.

  • Type: habana_frameworks.mediapipe.media_types.dtype

  • Default: UINT32

  • Optional: yes

num_slices

Indicates the number of cards in multi-card training. Clips are divided into num_slices slices, i.e. one slice per card. The default value of 1 indicates single-card training.

  • Type: int

  • Default: 1

  • Optional: yes

slice_index

In multi-card training, indicates the index of the card.

  • Type: int

  • Default: 0

  • Optional: yes (if num_slices=1; otherwise the user must provide it)

  • Note: The default value is zero for single-card training. For multi-card training it must be between 0 and num_slices - 1.
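
A hedged sketch of how a multi-card setup might set these two arguments (the rank and world-size values are placeholders for whatever the launcher provides):

from habana_frameworks.mediapipe import fn

world_size = 8   # number of cards (placeholder)
rank = 3         # index of this card, in the range 0 .. world_size - 1 (placeholder)

reader = fn.ReadVideoDatasetFromDirGen(dir="/path/to/dataset/",
                                       format="mp4",
                                       num_slices=world_size,
                                       slice_index=rank)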

file_list

Instead of providing dir (the input video directory path), the user can provide a list of files to the reader.

  • Type: list

  • Default: None

  • Optional: yes

  • Note: file_list must be ordered by class, i.e. list all the files of class_1, followed by all the files of class_2, and so on.

  • Note: file_list should contain the full path of every file. Example: [“/path/to/dataset/class_1/vid_c1_0.mp4”, “/path/to/dataset/class_1/vid_c1_1.mp4”, “/path/to/dataset/class_2/vid_c2_0.mp4”, “/path/to/dataset/class_2/vid_c2_1.mp4”, …..]

class_list

List of unique class labels; must be provided along with file_list. It is used as a look-up table to generate the output ground truth labels.

  • Type: list

  • Default: None

  • Optional: yes (If file_list is provided, class_list is not optional)

  • Note: The output ground truth label of each video is the index of that video’s class_label in this class_list, i.e. class_list acts as a look-up table for generating the output ground truth labels.

file_classes

List of class names, one for every file in file_list. It is used to generate the output ground truth labels. If not provided, the last subdirectory name of each path in file_list is used to generate file_classes.

  • Type: list

  • Default: None

  • Optional: yes

  • Note: For example, if file_list is provided but file_classes is not, with file_list=[“/path/to/dataset/class_1/vid_c1_0.mp4”, “/path/to/dataset/class_1/vid_c1_1.mp4”, “/path/to/dataset/class_2/vid_c2_0.mp4”, “/path/to/dataset/class_2/vid_c2_1.mp4”, …..]

    the reader generates file_classes = [“class_1”, “class_1”, “class_2”, “class_2”, …] from the last subdirectory name of every video in file_list.
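
The look-up described above can be pictured with a short, hypothetical sketch in plain Python (this illustrates the documented behavior, not the reader internals):

import os

file_list = ["/path/to/dataset/class_1/vid_c1_0.mp4",
             "/path/to/dataset/class_1/vid_c1_1.mp4",
             "/path/to/dataset/class_2/vid_c2_0.mp4"]
class_list = ["class_1", "class_2"]

# When file_classes is omitted, the last sub-directory name of each path is used.
file_classes = [os.path.basename(os.path.dirname(p)) for p in file_list]
# ["class_1", "class_1", "class_2"]

# The ground truth label of each video is the index of its class name in class_list.
labels = [class_list.index(c) for c in file_classes]
# [0, 0, 1]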

frames_per_clip

Number of frames in a clip.

  • Type: int

  • Default: 1

  • Optional: no

stride

Stride, in frames, between consecutive frames of a clip (stride=1 selects consecutive frames).

  • Type: int

  • Default: 1

  • Optional: yes

clips_per_video

Number of clips to generate from each video file.

  • Type: int

  • Default: 1

  • Optional: yes

step_between_clips

Step size between clips in terms of frames. It defines how many frames to skip between consecutive clips.

  • Type: int

  • Default: 1

  • Optional: yes

start_frame_index

Each video is used to generate clips starting from frame index start_frame_index.

  • Type: int

  • Default: 0

  • Optional: yes

sampler

Type of sampler to use. It determines the strategy for sampling clips from each video.

  • Type: habana_frameworks.mediapipe.media_types.clipSampler

  • Default: CONTIGUOUS_SAMPLER

  • Optional: yes

  • Note: The supported sampler types are RANDOM_SAMPLER, UNIFORM_SAMPLER, CONTIGUOUS_SAMPLER, and CONTIGUOUS_RANDOM_SAMPLER.

  • RANDOM_SAMPLER : Sample (at most) clips_per_video clips from each video at random.

  • UNIFORM_SAMPLER : Sample clips_per_video equally spaced clips from each video.

  • CONTIGUOUS_SAMPLER : Sample (at most) clips_per_video consecutive clips from each video.

  • CONTIGUOUS_RANDOM_SAMPLER: Sample (at most) clips_per_video consecutive clips from each video, and then shuffle them.
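
As a rough illustration of how frames_per_clip, stride, step_between_clips and start_frame_index combine under a contiguous sampler, the following plain-Python sketch derives candidate frame indices. It is a reading of the parameter definitions above, not the reader's actual implementation:

def clip_frame_indices(start_frame_index, clips_per_video,
                       frames_per_clip, stride, step_between_clips):
    # Clip c starts step_between_clips frames after clip c - 1; within a clip,
    # consecutive frames are `stride` frames apart.
    clips = []
    for c in range(clips_per_video):
        first = start_frame_index + c * step_between_clips
        clips.append([first + f * stride for f in range(frames_per_clip)])
    return clips

# With frames_per_clip=2, stride=3, step_between_clips=2, clips_per_video=2
# (the values used in Example #1 below):
print(clip_frame_indices(0, 2, 2, 3, 2))   # [[0, 3], [2, 5]]

Note that a clip of frames_per_clip frames at the given stride spans (frames_per_clip - 1) * stride + 1 decoded frames, which is what get_dec_max_frame_gen computes in Example #1 below.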

last_batch_strategy

Strategy for handling a partial last batch.

  • Type: habana_frameworks.mediapipe.media_types.lastBatchStrategy

  • Default: CYCLIC

  • Optional: yes

  • Note: The supported last batch strategies are DROP, PAD, and CYCLIC.

  • DROP : Drop the clips of the partial batch.

  • PAD : Repeat the last clip of the partial batch to fill it.

  • CYCLIC: Repeat the clips starting from the first clip of the partial batch.
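
As a hypothetical illustration in plain Python (one literal reading of the strategies above, not the reader internals; the actual fill order may differ), consider 6 clips with batch_size=4, which leaves a partial last batch of 2 clips:

clips = ["c0", "c1", "c2", "c3", "c4", "c5"]
batch_size = 4
partial = clips[4:]                                            # ["c4", "c5"]

# DROP: discard the partial batch entirely.
drop = []
# PAD: repeat the last clip of the partial batch until the batch is full.
pad = partial + [partial[-1]] * (batch_size - len(partial))    # ["c4", "c5", "c5", "c5"]
# CYCLIC: repeat the clips cyclically, starting from the first clip of the partial batch.
cyclic = (partial * batch_size)[:batch_size]                   # ["c4", "c5", "c4", "c5"]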

slice_once

Whether the reader on each device should slice clips once (True) or re-slice in every epoch (False).

  • Type: bool

  • Default: True

  • Optional: yes

is_modulo_slice

Whether the reader should slice clips in a modulo pattern (True) or slice-wise (False).

  • Type: bool

  • Default: True

  • Optional: yes
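
A hypothetical sketch of the two slicing patterns for num_slices=2 over 8 clip indices (plain Python, not the reader internals):

clip_indices = list(range(8))       # [0, 1, 2, 3, 4, 5, 6, 7]
num_slices, slice_index = 2, 0

# Modulo pattern (is_modulo_slice=True): card k takes indices k, k + num_slices, ...
modulo_slice = clip_indices[slice_index::num_slices]                      # [0, 2, 4, 6]

# Slice-wise pattern (is_modulo_slice=False): card k takes a contiguous chunk.
chunk = len(clip_indices) // num_slices
slice_wise = clip_indices[slice_index * chunk:(slice_index + 1) * chunk]  # [0, 1, 2, 3]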

Example #1: Use ReadVideoDatasetFromDirGen by Providing Input Directory

The following code snippet shows the use of ReadVideoDatasetFromDirGen by providing the input video directory path. The input mp4 videos are present in subdirectories of “/path/to/dataset/”. For example:

  • “/path/to/dataset/class_1/vid_c1_0.mp4”

  • “/path/to/dataset/class_1/vid_c1_1.mp4”

  • “/path/to/dataset/class_2/vid_c2_0.mp4”

  • “/path/to/dataset/class_3/vid_c3_1.mp4”

fn.ReadVideoDatasetFromDirGen(dir="/path/to/dataset/", format="mp4")

Since format="mp4", the reader processes all “mp4” files in the subdirectories of dir. The name of each subdirectory is considered the class_label for all the videos in it. The reader internally creates class_list, a sorted list of unique class_labels (i.e. a sorted list of unique subdirectory names), which is used as a look-up table to generate the output ground truth labels. The output ground truth label of every video is the index, in class_list, of the subdirectory name the video belongs to. In the example below, the reader returns the ground truth label of every video, which is displayed as the title of that video's frames.

import os
import matplotlib.pyplot as plt
from habana_frameworks.mediapipe import fn
from habana_frameworks.mediapipe.mediapipe import MediaPipe
from habana_frameworks.mediapipe.media_types import imgtype as it
from habana_frameworks.mediapipe.media_types import dtype as dt
from habana_frameworks.mediapipe.media_types import clipSampler as cs
from habana_frameworks.mediapipe.media_types import lastBatchStrategy as lbs

g_stride = 3
g_step_between_clips = 2
g_clips_per_video = 2


def get_dec_max_frame_gen(frame_per_clip, stride):
    # A clip of frame_per_clip frames at the given stride spans
    # (frame_per_clip - 1) * stride + 1 decoded frames, which is the
    # maximum number of frames the video decoder must handle per clip.
    dec_max_frame = ((frame_per_clip - 1) * stride) + 1
    return dec_max_frame


class myMediaPipe(MediaPipe):

    def __init__(self,
                device,
                queue_depth,
                batch_size,
                num_threads,
                dir,
                resize_w,
                resize_h,
                frame_per_clip):

        super(myMediaPipe, self).__init__(device,
                                          queue_depth,
                                          batch_size,
                                          num_threads,
                                          self.__class__.__name__)

        self.input = fn.ReadVideoDatasetFromDirGen(dir=dir,
                                                  format="mp4",
                                                  frames_per_clip=frame_per_clip,
                                                  clips_per_video=g_clips_per_video,
                                                  stride=g_stride,
                                                  step_between_clips=g_step_between_clips,
                                                  last_batch_strategy=lbs.CYCLIC,
                                                  sampler=cs.CONTIGUOUS_SAMPLER)

        dec_max_frame = get_dec_max_frame_gen(frame_per_clip, g_stride)
        print("VideoDecoder max_frame_vid: {} resize: w {} h {}".format(dec_max_frame,
                                                                        resize_w,
                                                                        resize_h))

        self.decode = fn.VideoDecoder(device="hpu",
                                      output_format=it.RGB_I,
                                      resize=[resize_w, resize_h],
                                      frames_per_clip=frame_per_clip,
                                      max_frame_vid=dec_max_frame)

    def definegraph(self):
        # The reader outputs video file paths, ground truth labels and a resample
        # list; the decoder consumes the paths together with the resample list.
        videos, labels, resample = self.input()
        videos = self.decode(videos, resample)
        return videos, labels


def display_videos(videos, labels, batch_size, frame_per_clip, cols):
    rows = (batch_size * frame_per_clip) // cols
    plt.figure(figsize=(10, 10))
    frame_index = 0
    for i in range(batch_size):
        frm_num = 0
        for j in range(frame_per_clip):
            frm_num += 1
            ax = plt.subplot(rows, cols, frame_index + 1)
            plt.imshow(videos[i][j])
            plt.title("Label: " + str(labels[i]) + " Frame: " + str(frm_num))
            plt.axis("off")
            frame_index += 1
    plt.show()


def main():
    batch_size = 4
    img_width = 200
    img_height = 200
    queue_depth = 3
    frame_per_clip = 2
    num_threads = 1
    base_dir = os.environ['DATASET_DIR']
    dir = base_dir + "/vid_data/"

    pipe = myMediaPipe("legacy",
                      queue_depth,
                      batch_size,
                      num_threads,
                      dir,
                      img_width,
                      img_height,
                      frame_per_clip)
    pipe.build()
    pipe.iter_init()

    bcnt = 0
    while bcnt < 2:
        try:
            videos, labels = pipe.run()
        except StopIteration:
            break
        videos = videos.as_cpu().as_nparray()
        labels = labels.as_cpu().as_nparray()

        display_videos(videos, labels, batch_size, frame_per_clip, cols=4)
        bcnt = bcnt + 1


if __name__ == "__main__":
    main()

Example #1: Output Videos

Displaying 2 decoded frames of each video in a batch. Batch size here is 4.

(Output figure: a grid of decoded frames, each titled with its clip’s ground truth label and frame number, e.g. “Label: 0 Frame: 1”.)

Licensed under a CC BY SA 4.0 license. The videos used here are generated using images from https://data.caltech.edu/records/mzrjq-6wc02.