habana_frameworks.mediapipe.fn.CropMirrorNorm
- Class:
habana_frameworks.mediapipe.fn.CropMirrorNorm(**kwargs)
- Define graph call:
__call__(input, mean, inv_std)
Parameters:
input - Input tensor to operator. Supported dimensions: minimum = 4, maximum = 4. Supported data types: INT8, UINT8, UINT16.
mean - Mean tensor for image normalization. Supported dimensions: minimum = 4, maximum = 4. Supported data types: FLOAT32.
inv_std - Inverse standard deviation tensor for image normalization. Supported dimensions: minimum = 4, maximum = 4. Supported data types: FLOAT32.
Description:
This operator performs fused cropping, mirroring, normalization, and type casting. It crops images using the specified crop window dimensions and position. Normalization computes output = (input - mean) * inv_std. For an RGB24 input with normalized mean = [0.485, 0.456, 0.406], the mean tensor is [(0.485 * 255), (0.456 * 255), (0.406 * 255)]. Similarly, for an RGB24 input with normalized standard deviation = [0.229, 0.224, 0.225], the inverse standard deviation tensor is [1 / (0.229 * 255), 1 / (0.224 * 255), 1 / (0.225 * 255)].
- Supported backend:
HPU
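The relationship between the normalized statistics and the mean and inv_std tensors given in the description can be checked on the host with NumPy. The following is a minimal sketch, not the operator's implementation: the normalize helper and the sample image are hypothetical, and a channel-last layout is used for readability rather than the 4D tensors the operator expects.

```python
import numpy as np

# Normalized (0..1) per-channel statistics quoted in the description.
norm_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
norm_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# Rescale to the 0..255 range of an RGB24 input.
mean = norm_mean * 255.0
inv_std = 1.0 / (norm_std * 255.0)


def normalize(pixels, mean, inv_std):
    """Apply output = (input - mean) * inv_std to a channel-last array."""
    return (pixels.astype(np.float32) - mean) * inv_std


# Hypothetical 2x2 RGB image with every pixel value set to 128.
image = np.full((2, 2, 3), 128, dtype=np.uint8)
out = normalize(image, mean, inv_std)

# Reference: scale pixels to 0..1 first, then use the normalized statistics.
reference = (image / 255.0 - norm_mean) / norm_std
```

Both paths give the same values up to floating-point error, which is why the mean is multiplied by 255 and the standard deviation is both multiplied by 255 and inverted before being passed to the operator.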
Keyword Arguments:
kwargs | Description
---|---
mirror | Flag for horizontal flip. 0 means do not perform horizontal flip. 1 means perform horizontal flip.
crop_w | Width of the crop window in pixels. crop_w should be a nonzero value less than or equal to the input tensor width.
crop_h | Height of the crop window in pixels. crop_h should be a nonzero value less than or equal to the input tensor height.
crop_d | Depth of the crop window in pixels. Cropping along the depth axis is optional: crop_d defaults to 0 and should be left at 0 when there is no cropping along the depth axis. Only for volumetric data should crop_d be a nonzero value less than or equal to the input tensor depth.
crop_pos_x | Normalized (0.0 - 1.0) position of the cropping window along width. Actual position is calculated as crop_x = crop_pos_x * (w - crop_w), where crop_pos_x is the normalized position, w is the width of the input tensor, and crop_w is the width of the cropping window.
crop_pos_y | Normalized (0.0 - 1.0) position of the cropping window along height. Actual position is calculated as crop_y = crop_pos_y * (h - crop_h), where crop_pos_y is the normalized position, h is the height of the input tensor, and crop_h is the height of the cropping window.
crop_pos_z | Only for volumetric data: normalized (0.0 - 1.0) position of the cropping window along depth. Actual position is calculated as crop_z = crop_pos_z * (d - crop_d), where crop_pos_z is the normalized position, d is the depth of the input tensor, and crop_d is the depth of the cropping window.
dtype | Output data type.
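The crop position formulas in the table can be illustrated with a small host-side NumPy sketch. The helper names crop_offset and crop_and_mirror are hypothetical, and two details are assumptions of this sketch rather than documented behavior: the offset is truncated to an integer, and the flip is applied after the crop.

```python
import numpy as np


def crop_offset(pos, size, crop_size):
    """Map a normalized position (0.0 - 1.0) to a pixel offset.

    Implements offset = pos * (size - crop_size). Truncating the result
    to an integer is an assumption of this sketch.
    """
    if not 0.0 <= pos <= 1.0:
        raise ValueError("normalized position must be within [0.0, 1.0]")
    if not 0 < crop_size <= size:
        raise ValueError("crop size must be nonzero and fit inside the input")
    return int(pos * (size - crop_size))


def crop_and_mirror(image, crop_w, crop_h, crop_pos_x, crop_pos_y, mirror):
    """Crop an (H, W, C) array, then optionally flip it horizontally."""
    h, w = image.shape[:2]
    x = crop_offset(crop_pos_x, w, crop_w)
    y = crop_offset(crop_pos_y, h, crop_h)
    window = image[y:y + crop_h, x:x + crop_w]
    return window[:, ::-1] if mirror else window


# Hypothetical 200x200 RGB input cropped to 125x125 around the center.
image = np.arange(200 * 200 * 3, dtype=np.uint32).reshape(200, 200, 3)
centered = crop_and_mirror(image, 125, 125, 0.5, 0.5, mirror=0)
flipped = crop_and_mirror(image, 125, 125, 0.5, 0.5, mirror=1)
```

A position of 0.0 places the window at the start of the axis, and 1.0 places it flush against the far edge, since the offset then equals size - crop_size.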
Example: CropMirrorNorm Operator
The following code snippet shows the use of the CropMirrorNorm operator with mean and inverse standard deviation tensors pre-computed from normalized mean = [0.485, 0.456, 0.406] and normalized std = [0.229, 0.224, 0.225]. The decoder resizes its output images to img_w x img_h (200x200 in this example), and CropMirrorNorm crops them to crop_w=125, crop_h=125. To quantize data from FLOAT32 to UINT8, output_scale=0.03125 and output_zerop=128 are used.
from habana_frameworks.mediapipe import fn
from habana_frameworks.mediapipe.mediapipe import MediaPipe
from habana_frameworks.mediapipe.media_types import imgtype as it
from habana_frameworks.mediapipe.media_types import dtype as dt
import matplotlib.pyplot as plt
import numpy as np
import os

# Display timeout in seconds. Environment variables are strings, so convert to float.
g_display_timeout = float(os.getenv("DISPLAY_TIMEOUT") or 5)

class myMediaPipe(MediaPipe):
    def __init__(self, device, queue_depth, batch_size, num_threads, op_device, dir, img_h, img_w):
        super().__init__(device,
                         queue_depth,
                         batch_size,
                         num_threads,
                         self.__class__.__name__)

        self.input = fn.ReadImageDatasetFromDir(shuffle=False,
                                                dir=dir,
                                                format="jpg",
                                                device="cpu")

        # Mean and inverse standard deviation scaled to the 0..255 pixel range
        mean_data = np.array([(0.485 * 255), (0.456 * 255), (0.406 * 255)],
                             dtype=dt.FLOAT32)
        std_data = np.array([1 / (0.229 * 255), 1 / (0.224 * 255), 1 / (0.225 * 255)],
                            dtype=dt.FLOAT32)

        # Batch broadcast is true, the shape will be 4D
        self.std_node = fn.MediaConst(data=std_data,
                                      shape=[1, 1, 3],
                                      dtype=dt.FLOAT32,
                                      device="cpu")
        self.mean_node = fn.MediaConst(data=mean_data,
                                       shape=[1, 1, 3],
                                       dtype=dt.FLOAT32,
                                       device="cpu")

        # WHCN
        self.decode = fn.ImageDecoder(device="hpu",
                                      output_format=it.RGB_P,
                                      resize=[img_w, img_h])

        self.cmn = fn.CropMirrorNorm(crop_w=125,
                                     crop_h=125,
                                     output_scale=0.03125,
                                     output_zerop=128,
                                     device=op_device)

        # WHCN -> CWHN
        self.transpose = fn.Transpose(permutation=[2, 0, 1, 3],
                                      tensorDim=4,
                                      dtype=dt.UINT8,
                                      device=op_device)

    def definegraph(self):
        images, labels = self.input()
        std = self.std_node()  # holds the inverse standard deviation
        mean = self.mean_node()
        images = self.decode(images)
        inp = self.transpose(images)
        images = self.cmn(images, mean, std)
        images = self.transpose(images)
        return inp, images, labels

def display_images(images, batch_size, cols):
    # Ceiling division so every image gets a grid cell
    rows = (batch_size + cols - 1) // cols
    plt.figure(figsize=(10, 10))
    for i in range(batch_size):
        ax = plt.subplot(rows, cols, i + 1)
        plt.imshow(images[i])
        plt.axis("off")
    plt.show(block=False)
    plt.pause(g_display_timeout)
    plt.close()

def run(device, op_device):
    batch_size = 6
    queue_depth = 2
    num_threads = 1
    img_width = 200
    img_height = 200
    base_dir = os.environ['DATASET_DIR']
    dir = base_dir + "/img_data/"
    columns = 3

    # Create MediaPipe object
    pipe = myMediaPipe(device, queue_depth, batch_size,
                       num_threads, op_device, dir,
                       img_height, img_width)

    # Build MediaPipe
    pipe.build()

    # Initialize MediaPipe iterator
    pipe.iter_init()

    # Run MediaPipe
    inp, images, labels = pipe.run()

    def as_cpu(tensor):
        if (callable(getattr(tensor, "as_cpu", None))):
            tensor = tensor.as_cpu()
        return tensor

    # Copy data to host from device as numpy array
    inp = as_cpu(inp).as_nparray()
    images = as_cpu(images).as_nparray()
    labels = as_cpu(labels).as_nparray()

    del pipe

    # Display images
    display_images(images, batch_size, columns)

if __name__ == "__main__":
    dev_opdev = {'mixed': ['hpu'],
                 'legacy': ['hpu']}

    for dev in dev_opdev.keys():
        for op_dev in dev_opdev[dev]:
            run(dev, op_dev)
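The example passes output_scale=0.03125 and output_zerop=128 to quantize the normalized FLOAT32 values to UINT8, but this page does not state the exact formula. The sketch below assumes the common affine quantization convention q = round(x / scale) + zero_point, clipped to the UINT8 range. The quantize_to_uint8 helper is hypothetical and is shown only to illustrate why these values keep the normalized range inside 0..255.

```python
import numpy as np


def quantize_to_uint8(values, scale, zero_point):
    """Affine quantization: q = round(values / scale) + zero_point.

    This convention is an assumption of the sketch, not a documented
    description of how CropMirrorNorm quantizes its output.
    """
    q = np.rint(values / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)


# Normalized values for the first channel (mean 0.485, std 0.229):
# the darkest pixel (0.0), a value of exactly zero, and the brightest pixel (1.0).
mean0, std0 = 0.485, 0.229
normalized = np.array([(0.0 - mean0) / std0, 0.0, (1.0 - mean0) / std0],
                      dtype=np.float32)

quantized = quantize_to_uint8(normalized, scale=0.03125, zero_point=128)
```

Under this assumption, a normalized value of zero maps to 128, and the extremes of the first channel land well inside the UINT8 range without clipping.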
Output Images from CropMirrorNorm Operation 1
1. Licensed under a CC BY-SA 4.0 license. The images used here are taken from https://data.caltech.edu/records/mzrjq-6wc02.