habana_frameworks.mediapipe.fn.ReadVideoDatasetFromDir
Class:
habana_frameworks.mediapipe.fn.ReadVideoDatasetFromDir(dir="/path/to/dataset/", format="mp4", seed=0, drop_remainder=False, pad_remainder=False, label_dtype=dt.UINT64, num_slices=1, slice_index=0, file_list=[], class_list=[], file_classes=[], frames_per_clip=1, clips_per_video=1, target_frame_rate=0, step_between_clips=1, sampler=rs.RANDOM_SAMPLER, fixed_clip_mode=False, start_frame_index=0, shuffle=True)
- Define graph call:
__call__()
- Parameter:
None
Description:
This reader is designed for video classification tasks. There are two ways to provide input to ReadVideoDatasetFromDir:
- By specifying the input directory path. Each sub-directory name is treated as the class_label for all the videos inside it.
- By providing file_list, class_list, and file_classes to the reader.
The reader returns batches of video paths, ground truth labels, a resample list, and a video offset list. The output ground truth labels are a list of integers representing the class labels of the videos.
- Supported backend:
CPU
Keyword Arguments:
kwargs | Description
---|---
dir | Input video directory path. The reader reads videos from all sub-directories of dir. The name of each sub-directory is treated as the class label for all the videos inside it.
format | Format (extension) of the video files. Used to list all the videos in the sub-directories of dir.
seed | Seed for randomization. If not provided, it is generated internally. It is used for shuffling the dataset, and also for randomly selecting the videos that pad the last partial batch when pad_remainder is False.
drop_remainder | If True, the reader drops the last partial batch of the epoch. If False, the padding mechanism is controlled by pad_remainder.
pad_remainder | If True, the reader replicates the last video of the partial batch. If False, the partial batch is filled with videos randomly selected from the dataset.
label_dtype | Data type of the output ground truth labels. The reader returns a batch of video file paths and ground truth labels. Output ground truth labels are a list of integers, each specifying the index of the respective video's class label in the sorted list of all class labels.
num_slices | Number of cards in multi-card training. Before the first epoch, the input data is divided into num_slices slices, one per card.
slice_index | In multi-card training, the index of the card; must be in the range [0, num_slices - 1].
file_list | Instead of providing dir (input video directory path), the user can provide a list of files to the reader.
class_list | List of unique class labels. Must be provided along with file_list.
file_classes | List of class names, one for every file in file_list.
frames_per_clip | Number of frames in a clip.
clips_per_video | Number of clips to generate from each video file.
target_frame_rate | If specified (i.e. not 0), resample the videos so that they all have the frame rate target_frame_rate.
step_between_clips | Step size between consecutive clips, in frames. It defines how many frames to skip between consecutive clips.
sampler | Type of sampler to use for selecting clips. It determines the strategy for sampling clips from each video.
fixed_clip_mode | If True, clips are extracted starting from a fixed frame index (start_frame_index).
start_frame_index | Frame index from which each video's clip is generated; applicable when fixed_clip_mode is True.
shuffle | If set to True, the reader shuffles the entire dataset before each epoch. In multi-card training, it first creates random slices/chunks of files for every card and then shuffles.
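The num_slices/slice_index pair shards the dataset across cards. The following is a minimal sketch of per-card reader construction; the WORLD_SIZE and RANK environment variables are hypothetical launcher-provided values, not part of this API, and in practice the node would be created inside a MediaPipe subclass's __init__, as in the examples below:

from habana_frameworks.mediapipe import fn
import os

# Hypothetical launcher-provided values; adapt to your setup
world_size = int(os.environ.get("WORLD_SIZE", "1"))  # total number of cards
rank = int(os.environ.get("RANK", "0"))              # this card's index

reader = fn.ReadVideoDatasetFromDir(dir="/path/to/dataset/",
                                    format="mp4",
                                    num_slices=world_size,  # one dataset slice per card
                                    slice_index=rank,       # read only this card's slice
                                    shuffle=True)           # shuffle within the slice each epoch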
Example #1: Use ReadVideoDatasetFromDir by Providing Input Directory
The following code snippet shows the use of ReadVideoDatasetFromDir by providing the input video directory path. The input mp4 videos are present in sub-directories of “/path/to/dataset/”. For example:
“/path/to/dataset/class_1/vid_c1_0.mp4”
“/path/to/dataset/class_1/vid_c1_1.mp4”
“/path/to/dataset/class_2/vid_c2_0.mp4”
“/path/to/dataset/class_3/vid_c3_1.mp4”
fn.ReadVideoDatasetFromDir(shuffle=False, dir="/path/to/dataset/", format="mp4")
Since format="mp4", the reader will process all “mp4” files in the sub-directories of dir.
The names of the sub-directories are considered as the class_label for all the videos in them.
The reader internally creates class_list, which is a sorted list of unique class_labels (i.e. a sorted list of unique sub-directory names).
class_list is used as a dictionary to generate the output ground truth labels. The output ground truth label of every video is the index of the video's class_label in class_list (i.e. the index, in class_list, of the sub-directory in which that video is present).
In the example below, the reader returns a ground truth label for every video, which is displayed as the title of that video.
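A minimal pure-Python sketch of this label mapping (an illustration of the behavior described above, not the reader's actual implementation):

import os

dataset_dir = "/path/to/dataset/"
# class_list is the sorted list of unique sub-directory names,
# e.g. ["class_1", "class_2", "class_3"]
class_list = sorted(e.name for e in os.scandir(dataset_dir) if e.is_dir())
# The ground truth label of a video is the index of its sub-directory in class_list
label = class_list.index("class_2")  # -> 1 for any video under /path/to/dataset/class_2/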
from habana_frameworks.mediapipe import fn
from habana_frameworks.mediapipe.mediapipe import MediaPipe
from habana_frameworks.mediapipe.media_types import imgtype as it
from habana_frameworks.mediapipe.media_types import dtype as dt
import matplotlib.pyplot as plt
import os

class myMediaPipe(MediaPipe):
    def __init__(self, device, queue_depth, batch_size, num_threads,
                 channel, dir, width, height, frame_per_clip):
        super(
            myMediaPipe,
            self).__init__(
            device,
            queue_depth,
            batch_size,
            num_threads,
            self.__class__.__name__)

        # Read mp4 videos from all sub-directories of "dir";
        # each sub-directory name becomes the class label
        self.input = fn.ReadVideoDatasetFromDir(shuffle=False,
                                                dir=dir,
                                                format="mp4",
                                                frames_per_clip=frame_per_clip,
                                                fixed_clip_mode=True)

        # Decode the clips on HPU and resize every frame
        self.decode = fn.VideoDecoder(device="hpu",
                                      output_format=it.RGB_I,
                                      resize=[width, height],
                                      dtype=dt.UINT8,
                                      frames_per_clip=frame_per_clip,
                                      max_frame_vid=frame_per_clip)

    def definegraph(self):
        # Reader outputs: video paths, labels, resample list, video offsets
        videos, labels, resample, video_offset = self.input()
        videos = self.decode(videos, video_offset)
        return videos, labels

def display_videos(videos, labels, batch_size, frame_per_clip, cols):
    # Show every decoded frame of every clip, one subplot per frame
    rows = (batch_size * frame_per_clip) // cols
    plt.figure(figsize=(10, 10))
    for i in range(batch_size):
        for j in range(frame_per_clip):
            ax = plt.subplot(rows, cols, i * frame_per_clip + j + 1)
            plt.imshow(videos[i][j])
            plt.title("label:" + str(labels[i]))
            plt.axis("off")
    plt.show()

def main():
    batch_size = 4
    img_width = 200
    img_height = 200
    frame_per_clip = 2
    channels = 3
    queue_depth = 3
    num_threads = 1
    base_dir = os.environ['DATASET_DIR']
    dir = base_dir + "/vid_data/"

    # Create, build and initialize the media pipe
    pipe = myMediaPipe('legacy', queue_depth, batch_size, num_threads,
                       channels, dir, img_width, img_height, frame_per_clip)
    pipe.build()
    pipe.iter_init()

    bcnt = 0
    while bcnt < 1:
        try:
            videos, labels = pipe.run()
        except StopIteration:
            break
        # Copy device tensors to host numpy arrays for display
        videos = videos.as_cpu().as_nparray()
        labels = labels.as_cpu().as_nparray()
        display_videos(videos, labels, batch_size, frame_per_clip, cols=2)
        bcnt = bcnt + 1

if __name__ == "__main__":
    main()
Example #1: Output Videos [1]
Displaying 2 decoded frames of each video in a batch. Batch size here is 4.
[1] Licensed under a CC BY SA 4.0 license. The videos used here are generated using images from https://data.caltech.edu/records/mzrjq-6wc02.
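Example #1 sets fixed_clip_mode=True, so every clip starts at the same frame index. As a rough pure-Python illustration of the clip-selection parameters described in the table above (assumed logic; the actual selection among candidates is governed by sampler and is not shown here):

def clip_start_frames(num_frames, frames_per_clip, clips_per_video,
                      step_between_clips, fixed_clip_mode, start_frame_index):
    # Illustrative sketch only: approximates the parameter semantics
    # described in the keyword-argument table above.
    if fixed_clip_mode:
        # Every clip begins at the same fixed frame index
        return [start_frame_index] * clips_per_video
    # Otherwise, candidate clip starts are spaced step_between_clips frames apart
    last_start = num_frames - frames_per_clip
    candidates = list(range(0, last_start + 1, step_between_clips))
    return candidates[:clips_per_video]

# e.g. a 10-frame video, 2-frame clips, 3 clips, stride 2:
print(clip_start_frames(10, 2, 3, 2, False, 0))  # -> [0, 2, 4]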
Example #2: Use ReadVideoDatasetFromDir by Providing file_list
The following code snippet shows an alternative way of using ReadVideoDatasetFromDir, by providing file_list and class_list:
- file_list - Specifies the list of files to process.
- class_list - List of unique class labels. Must be provided along with file_list.
- file_classes - List of class labels for every video in file_list.
The output ground truth labels are the indices of each video's class label in class_list, shown as the title of the output videos.
import matplotlib.pyplot as plt
from habana_frameworks.mediapipe import fn
from habana_frameworks.mediapipe.mediapipe import MediaPipe
from habana_frameworks.mediapipe.media_types import imgtype as it
from habana_frameworks.mediapipe.media_types import dtype as dt
import os

class myMediaPipe(MediaPipe):
    def __init__(self, device, queue_depth, batch_size, num_threads,
                 channel, width, height, frame_per_clip):
        super(
            myMediaPipe,
            self).__init__(
            device,
            queue_depth,
            batch_size,
            num_threads,
            self.__class__.__name__)

        base_dir = os.environ['DATASET_DIR']
        dir = base_dir + "/vid_data/"

        # Explicit file list instead of a directory scan
        file_list = [dir + "/butterfly/vid_01.mp4",
                     dir + "/butterfly/vid_02.mp4",
                     dir + "/dolphin/vid_03.mp4",
                     dir + "/sunflower/vid_04.mp4",
                     ]
        # Unique class labels, plus one class name per file
        class_list = ["butterfly", "dolphin", "sunflower"]
        file_classes = ["butterfly", "butterfly", "dolphin", "sunflower"]

        self.input = fn.ReadVideoDatasetFromDir(shuffle=False,
                                                file_list=file_list,
                                                class_list=class_list,
                                                file_classes=file_classes,
                                                format="mp4",
                                                frames_per_clip=frame_per_clip,
                                                fixed_clip_mode=True)

        # Decode the clips on HPU and resize every frame
        self.decode = fn.VideoDecoder(device="hpu",
                                      output_format=it.RGB_I,
                                      resize=[width, height],
                                      dtype=dt.UINT8,
                                      frames_per_clip=frame_per_clip,
                                      max_frame_vid=frame_per_clip)

    def definegraph(self):
        # Reader outputs: video paths, labels, resample list, video offsets
        videos, labels, resample, video_offset = self.input()
        videos = self.decode(videos, video_offset)
        return videos, labels

def display_videos(videos, labels, batch_size, frame_per_clip, cols):
    # Show every decoded frame of every clip, one subplot per frame
    rows = (batch_size * frame_per_clip) // cols
    plt.figure(figsize=(10, 10))
    for i in range(batch_size):
        for j in range(frame_per_clip):
            ax = plt.subplot(rows, cols, i * frame_per_clip + j + 1)
            plt.imshow(videos[i][j])
            plt.title("label:" + str(labels[i]))
            plt.axis("off")
    plt.show()

def main():
    batch_size = 4
    img_width = 200
    img_height = 200
    channels = 3
    queue_depth = 3
    num_threads = 1
    frames_per_clip = 2

    # Create, build and initialize the media pipe
    pipe = myMediaPipe('legacy', queue_depth, batch_size, num_threads,
                       channels, img_width, img_height, frames_per_clip)
    pipe.build()
    pipe.iter_init()

    bcnt = 0
    while bcnt < 1:
        try:
            videos, labels = pipe.run()
        except StopIteration:
            break
        # Copy device tensors to host numpy arrays for display
        videos = videos.as_cpu().as_nparray()
        labels = labels.as_cpu().as_nparray()
        display_videos(videos, labels, batch_size, frames_per_clip, cols=2)
        bcnt = bcnt + 1

if __name__ == "__main__":
    main()
Example #2: Output Videos [2]
[2] Licensed under a CC BY SA 4.0 license. The videos used here are generated using images from https://data.caltech.edu/records/mzrjq-6wc02.