BaseTransform

class paddle.vision.transforms.BaseTransform(keys=None)

Base class of all transforms used in computer vision.

Calling logic:

if keys is None:
    _get_params -> _apply_image()
else:
    _get_params -> _apply_*() for * in keys
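The calling logic above can be sketched in plain Python. This is a minimal stand-in under stated assumptions, not Paddle's actual implementation: SketchTransform and Reverse are hypothetical names, keys=None is assumed to mean a single image input, and a missing _apply_*() method is assumed to raise NotImplementedError.

```python
class SketchTransform:
    """Hypothetical stand-in that mimics the documented calling logic."""

    def __init__(self, keys=None):
        # Assumption: keys=None is treated as a single image input.
        self.keys = ("image",) if keys is None else tuple(keys)
        self.params = None

    def _get_params(self, inputs):
        # Subclasses may compute shared (e.g. random) parameters here.
        return None

    def __call__(self, inputs):
        # Only a tuple counts as "several inputs"; anything else is one input.
        single = not isinstance(inputs, tuple)
        items = [inputs] if single else list(inputs)
        # Step 1: compute the parameters once for the whole call.
        self.params = self._get_params(items)
        # Step 2: dispatch each input to the matching _apply_<key>() method.
        outputs = []
        for key, item in zip(self.keys, items):
            apply_func = getattr(self, "_apply_" + key, None)
            if apply_func is None:
                raise NotImplementedError("no _apply_%s() method" % key)
            outputs.append(apply_func(item))
        return outputs[0] if single else tuple(outputs)


class Reverse(SketchTransform):
    """Toy transform: reverses each row of an 'image' given as nested lists."""

    def _apply_image(self, image):
        return [row[::-1] for row in image]

    def _apply_coords(self, coords):
        return coords  # left unchanged in this toy example


image = [[1, 2, 3], [4, 5, 6]]

# keys is None: only _apply_image() is called.
print(Reverse()(image))  # prints [[3, 2, 1], [6, 5, 4]]

# keys given: one _apply_*() call per key, in order.
print(Reverse(keys=("image", "coords"))((image, [(0, 0)])))

# A key without a matching _apply_*() method fails in this sketch.
try:
    Reverse(keys=("image", "mask"))((image, image))
except NotImplementedError as exc:
    print("error:", exc)
```

Computing the parameters once per call is what lets a random transform treat an image and its boxes or mask consistently.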

To implement a custom transform, override the relevant _apply_*() methods in a subclass (for images, _apply_image()).

Parameters

keys (list[str]|tuple[str], optional) –

Input type. The input is a tuple containing different structures, and keys specifies the type of each element. For example, if the input is a single image, keys can be None or ("image",). If the input is (image, image), keys should be ("image", "image"). If the input is (image, boxes), keys should be ("image", "boxes").

The currently available key strings and their data types are described below:

  • "image": input image, with shape (H, W, C)

  • "coords": coordinates, with shape (N, 2)

  • "boxes": bounding boxes, with shape (N, 4), in "xyxy" format: the first "xy" is the top-left corner of a box and the second "xy" is the bottom-right corner.

  • "mask": map used for segmentation, with shape (H, W, 1)

You can also define custom data types, provided you implement the corresponding _apply_*() methods; otherwise, NotImplementedError will be raised.
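To make the "xyxy" box layout listed above concrete, the following pure-Python sketch mirrors boxes horizontally inside an image of a given width. The helper hflip_boxes_xyxy is a hypothetical name used only for illustration and is not part of Paddle; it assumes the same mirroring rule as the example in this section, where x maps to width - x.

```python
def hflip_boxes_xyxy(boxes, image_width):
    """Mirror "xyxy" boxes horizontally inside an image of the given width."""
    flipped = []
    for x1, y1, x2, y2 in boxes:
        # Mirroring maps x to image_width - x, which swaps left and right,
        # so the new left edge comes from the old right edge.
        new_x1 = image_width - x2
        new_x2 = image_width - x1
        flipped.append([new_x1, y1, new_x2, y2])
    return flipped


boxes = [[2, 3, 200, 300], [50, 60, 80, 100]]
print(hflip_boxes_xyxy(boxes, image_width=500))
# prints [[300, 3, 498, 300], [420, 60, 450, 100]]
```

Swapping the edges keeps the first "xy" pair at the top-left corner, which is the same invariant a min/max over the transformed corners restores.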

Examples

>>> import numpy as np
>>> from PIL import Image
>>> import paddle.vision.transforms.functional as F
>>> from paddle.vision.transforms import BaseTransform

>>> def _get_image_size(img):
...     if F._is_pil_image(img):
...         return img.size
...     elif F._is_numpy_image(img):
...         return img.shape[:2][::-1]
...     else:
...         raise TypeError("Unexpected type {}".format(type(img)))
...
>>> class CustomRandomFlip(BaseTransform):
...     def __init__(self, prob=0.5, keys=None):
...         super().__init__(keys)
...         self.prob = prob
...
...     def _get_params(self, inputs):
...         image = inputs[self.keys.index('image')]
...         params = {}
...         params['flip'] = np.random.random() < self.prob
...         params['size'] = _get_image_size(image)
...         return params
...
...     def _apply_image(self, image):
...         if self.params['flip']:
...             return F.hflip(image)
...         return image
...
...     # Optional: not needed if you only transform images.
...     def _apply_coords(self, coords):
...         if self.params['flip']:
...             w = self.params['size'][0]
...             coords[:, 0] = w - coords[:, 0]
...         return coords
...
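...     # Optional: not needed if you only transform images.
...     def _apply_boxes(self, boxes):
...         # Expand each box into its four corners, transform them as coords,
...         # then rebuild axis-aligned boxes from the min/max corners.
...         idxs = np.array([(0, 1), (2, 1), (0, 3), (2, 3)]).flatten()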
...         coords = np.asarray(boxes).reshape(-1, 4)[:, idxs].reshape(-1, 2)
...         coords = self._apply_coords(coords).reshape((-1, 4, 2))
...         minxy = coords.min(axis=1)
...         maxxy = coords.max(axis=1)
...         trans_boxes = np.concatenate((minxy, maxxy), axis=1)
...         return trans_boxes
...
...     # Optional: not needed if you only transform images.
...     def _apply_mask(self, mask):
...         if self.params['flip']:
...             return F.hflip(mask)
...         return mask
...
>>> # create fake inputs
>>> fake_img = Image.fromarray((np.random.rand(400, 500, 3) * 255.).astype('uint8'))
>>> fake_boxes = np.array([[2, 3, 200, 300], [50, 60, 80, 100]])
>>> fake_mask = fake_img.convert('L')
>>> # transform the image only
>>> flip_transform = CustomRandomFlip(1.0)
>>> converted_img = flip_transform(fake_img)
>>> # transform the image, boxes and mask together
>>> flip_transform = CustomRandomFlip(1.0, keys=('image', 'boxes', 'mask'))
>>> (converted_img, converted_boxes, converted_mask) = flip_transform((fake_img, fake_boxes, fake_mask))
>>> converted_boxes
array([[300,   3, 498, 300],
       [420,  60, 450, 100]])