habana_frameworks.mediapipe.fn.ReadVideoDatasetFromDirGen¶
Class:
habana_frameworks.mediapipe.fn.ReadVideoDatasetFromDirGen( dir="/path/to/dataset/", format="mp4", seed=0, label_dtype=dt.UINT32, num_slices=1, slice_index=0, file_list=[], class_list=[], file_classes=[], frames_per_clip=1, stride=1, clips_per_video=1, step_between_clips=1, start_frame_index=0, sampler=cs.CONTIGUOUS_SAMPLER, last_batch_strategy=lbs.CYCLIC, slice_once=True, is_modulo_slice=True )
- Define graph call:
__call__()
- Parameter:
None
Description:
This reader is designed for video classification tasks. There are two ways to provide input to ReadVideoDatasetFromDirGen, as sketched below:
- By specifying the input directory path; the names of its subdirectories are treated as class labels (class_label) for all the videos they contain.
- By providing file_list, class_list and file_classes to the reader.
The reader returns batches of video paths, ground truth labels, and a resample list. The output ground truth labels are a list of integers representing the class labels of the videos.
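As a minimal sketch of the two input modes (the file and class names here are hypothetical, and in a real pipeline the node is created inside a MediaPipe subclass, as in the full example below):

```python
from habana_frameworks.mediapipe import fn
from habana_frameworks.mediapipe.media_types import dtype as dt

# Mode 1: directory mode - subdirectory names become class labels.
reader = fn.ReadVideoDatasetFromDirGen(dir="/path/to/dataset/",
                                       format="mp4",
                                       label_dtype=dt.UINT32)

# Mode 2: explicit lists - hypothetical file paths and class names.
reader = fn.ReadVideoDatasetFromDirGen(
    file_list=["/path/to/dataset/class_1/vid_c1_0.mp4",
               "/path/to/dataset/class_2/vid_c2_0.mp4"],
    class_list=["class_1", "class_2"],
    file_classes=["class_1", "class_2"])
```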
Supported backend:
CPU
Keyword Arguments:
| kwargs | Description |
|---|---|
| dir | Input video directory path. Videos are read from all subdirectories within the specified directory; the name of each subdirectory is treated as the class_label of the videos it contains. |
| format | Format (extension) of the video file names. All videos with this extension in the subdirectories of dir are listed. |
| seed | Seed for randomization. If not provided, it is generated internally. |
| label_dtype | Data type of the output ground truth labels. The reader returns a batch of video file paths and ground truth labels; each label is an integer specifying the index of the video's class_label in the sorted list of all class labels. |
| num_slices | Number of cards in multi-card training. Clips are divided into num_slices shards, one per card. |
| slice_index | Index of the card in multi-card training. |
| file_list | Instead of providing dir (the input video directory path), the user can provide a list of files to the reader. |
| class_list | List of unique class labels. Must be provided along with file_list. |
| file_classes | Class name of every file in file_list. |
| frames_per_clip | Number of frames in a clip. |
| stride | Number of frames between consecutive frames of a clip. |
| clips_per_video | Number of clips to generate from each video. |
| step_between_clips | Step size between clips, in frames; defines how many frames to skip between consecutive clips. |
| start_frame_index | Frame index at which clip generation starts in each video. |
| sampler | Type of sampler used to select clips; determines the strategy for sampling clips from the video. |
| last_batch_strategy | Strategy for handling a partial last batch. |
| slice_once | Whether each device's reader slices clips once (True) or re-slices in every epoch (False). |
| is_modulo_slice | Whether the reader slices clips in a modulo pattern (True) or slice-wise (False). |
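To make the clip-sampling and slicing keywords concrete, the following plain-Python sketch enumerates the frame indices each clip would cover under a contiguous sampler, and illustrates a modulo slicing pattern. This illustrates the documented parameter semantics under stated assumptions, not the reader's actual implementation; all values are hypothetical.

```python
def clip_frame_indices(frames_per_clip, stride, step_between_clips,
                       start_frame_index, clips_per_video):
    # Illustration only: which frame indices each clip would cover.
    clips = []
    for c in range(clips_per_video):
        first = start_frame_index + c * step_between_clips
        clips.append([first + f * stride for f in range(frames_per_clip)])
    return clips

# frames_per_clip=2, stride=3, step_between_clips=2, starting at frame 0:
print(clip_frame_indices(2, 3, 2, 0, 2))  # [[0, 3], [2, 5]]

# Assumed modulo slicing across 4 cards: card k keeps the clips whose
# global index satisfies index % num_slices == slice_index.
num_slices, slice_index = 4, 1
shard = [i for i in range(10) if i % num_slices == slice_index]  # [1, 5, 9]
```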
Example #1: Use ReadVideoDatasetFromDirGen by Providing Input Directory¶
The following code snippet shows the use of ReadVideoDatasetFromDirGen with an input video directory path. The input mp4 videos are present in subdirectories of "/path/to/dataset/". For example:
“/path/to/dataset/class_1/vid_c1_0.mp4”
“/path/to/dataset/class_1/vid_c1_1.mp4”
“/path/to/dataset/class_2/vid_c2_0.mp4”
“/path/to/dataset/class_3/vid_c3_1.mp4”
```python
fn.ReadVideoDatasetFromDirGen(dir="/path/to/dataset/", format="mp4")
```
Since format="mp4", the reader processes all "mp4" files in the subdirectories of dir. The name of each subdirectory is treated as the class_label of all the videos it contains. Internally, the reader builds class_list, a sorted list of the unique class_labels (i.e. a sorted list of the unique subdirectory names). class_list is then used as a lookup to generate the output ground truth labels: the label of each video is the index of its class_label in class_list (i.e. the index, within class_list, of the subdirectory that contains the video).
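A minimal sketch of this mapping in plain Python (using the hypothetical directory layout above; this mirrors the behavior described here rather than the reader's internals):

```python
import os

dataset_dir = "/path/to/dataset/"             # hypothetical dataset root
class_list = sorted(os.listdir(dataset_dir))  # ["class_1", "class_2", "class_3"]

# A video stored under "class_2" receives that subdirectory's index.
label = class_list.index("class_2")           # -> 1
```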
In the example below, the reader returns a ground truth label for every video, which is displayed in the title of each decoded frame.
```python
import os
import matplotlib.pyplot as plt
from habana_frameworks.mediapipe import fn
from habana_frameworks.mediapipe.mediapipe import MediaPipe
from habana_frameworks.mediapipe.media_types import imgtype as it
from habana_frameworks.mediapipe.media_types import dtype as dt
from habana_frameworks.mediapipe.media_types import clipSampler as cs
from habana_frameworks.mediapipe.media_types import lastBatchStrategy as lbs

g_stride = 3
g_step_between_clips = 2
g_clips_per_video = 2


def get_dec_max_frame_gen(frame_per_clip, stride):
    # Number of frames spanned by one clip: (frame_per_clip - 1)
    # strides plus the first frame.
    dec_max_frame = ((frame_per_clip - 1) * stride) + 1
    return dec_max_frame


class myMediaPipe(MediaPipe):
    def __init__(self, device, queue_depth, batch_size, num_threads,
                 dir, resize_w, resize_h, frame_per_clip):
        super(myMediaPipe, self).__init__(device,
                                          queue_depth,
                                          batch_size,
                                          num_threads,
                                          self.__class__.__name__)
        # Reader: subdirectory names of 'dir' become the class labels.
        self.input = fn.ReadVideoDatasetFromDirGen(dir=dir,
                                                   format="mp4",
                                                   frames_per_clip=frame_per_clip,
                                                   clips_per_video=g_clips_per_video,
                                                   stride=g_stride,
                                                   step_between_clips=g_step_between_clips,
                                                   last_batch_strategy=lbs.CYCLIC,
                                                   sampler=cs.CONTIGUOUS_SAMPLER)
        dec_max_frame = get_dec_max_frame_gen(frame_per_clip, g_stride)
        print("VideoDecoder max_frame_vid: {} resize: w {} h {}".format(dec_max_frame,
                                                                        resize_w,
                                                                        resize_h))
        # Decoder runs on the HPU and resizes every decoded frame.
        self.decode = fn.VideoDecoder(device="hpu",
                                      output_format=it.RGB_I,
                                      resize=[resize_w, resize_h],
                                      frames_per_clip=frame_per_clip,
                                      max_frame_vid=dec_max_frame)

    def definegraph(self):
        videos, labels, resample = self.input()
        videos = self.decode(videos, resample)
        return videos, labels


def display_videos(videos, labels, batch_size, frame_per_clip, cols):
    rows = (batch_size * frame_per_clip) // cols
    plt.figure(figsize=(10, 10))
    frame_index = 0
    for i in range(batch_size):
        frm_num = 0
        for j in range(frame_per_clip):
            frm_num += 1
            plt.subplot(rows, cols, frame_index + 1)
            plt.imshow(videos[i][j])
            plt.title("Label: " + str(labels[i]) + " Frame: " + str(frm_num))
            plt.axis("off")
            frame_index += 1
    plt.show()


def main():
    batch_size = 4
    img_width = 200
    img_height = 200
    queue_depth = 3
    frame_per_clip = 2
    num_threads = 1
    base_dir = os.environ['DATASET_DIR']
    dir = base_dir + "/vid_data/"

    pipe = myMediaPipe("legacy",
                       queue_depth,
                       batch_size,
                       num_threads,
                       dir,
                       img_width,
                       img_height,
                       frame_per_clip)
    pipe.build()
    pipe.iter_init()

    bcnt = 0
    while bcnt < 2:
        try:
            videos, labels = pipe.run()
        except StopIteration:
            break
        # Move device tensors to host memory as numpy arrays.
        videos = videos.as_cpu().as_nparray()
        labels = labels.as_cpu().as_nparray()
        display_videos(videos, labels, batch_size, frame_per_clip, cols=4)
        bcnt += 1


if __name__ == "__main__":
    main()
```
Example #1: Output Videos 1
Displaying 2 decoded frames of each video in a batch. Batch size here is 4.
- 1 Licensed under a CC BY SA 4.0 license. The videos used here were generated using images from https://data.caltech.edu/records/mzrjq-6wc02.