mmedit.datasets.transforms

Package Contents

Classes

GenerateSeg

Generate segmentation mask from alpha matte.

GenerateSoftSeg

Generate soft segmentation mask from input segmentation mask.

MirrorSequence

Extend short sequences (e.g. Vimeo-90K) by mirroring the sequences.

TemporalReverse

Reverse frame lists for temporal augmentation.

BinarizeImage

Binarize image.

Clip

Clip the pixels.

ColorJitter

An interface for torch color jitter so that it can be invoked in mmediting pipeline.

RandomAffine

Apply random affine to input images.

RandomMaskDilation

Randomly dilate binary masks.

UnsharpMasking

Apply unsharp masking to an image or a sequence of images.

Flip

Flip the input data with a probability.

NumpyPad

Numpy Padding.

RandomRotation

Rotate the image by a randomly-chosen angle, measured in degrees.

RandomTransposeHW

Randomly transpose images in H and W dimensions with a probability.

Resize

Resize data to a specific size for training or resize the images to fit the network input regulation for testing.

CenterCropLongEdge

Center crop the given image by the long edge.

Crop

Crop data to specific size for training.

CropAroundCenter

Randomly crop the images around unknown area in the center 1/4 images.

CropAroundFg

Crop around the whole foreground in the segmentation mask.

CropAroundUnknown

Crop around unknown area with a randomly selected scale.

CropLike

Crop/pad the image in the target_key according to the size of image in the reference_key.

FixedCrop

Crop paired data (at a specific position) to specific size for training.

InstanceCrop

Use maskrcnn to detect instances on image.

ModCrop

Mod crop images, used during testing.

PairedRandomCrop

Paired random crop.

RandomCropLongEdge

Random crop the given image by the long edge.

RandomResizedCrop

Crop data to random size and aspect ratio.

CompositeFg

Composite foreground with a random foreground.

MergeFgAndBg

Composite foreground image and background image with alpha.

PerturbBg

Randomly add gaussian noise or gamma change to background image.

RandomJitter

Randomly jitter the foreground in hsv space.

RandomLoadResizeBg

Randomly load a background image and resize it.

PackEditInputs

Pack the inputs data for SR, VFI, matting and inpainting.

ToTensor

Convert some values in results dict to torch.Tensor type in data

GenerateCoordinateAndCell

Generate coordinate and cell. Generate coordinate from the desired size

GenerateFacialHeatmap

Generate heatmap from keypoint.

GenerateFrameIndices

Generate frame index for REDS datasets. It also performs temporal

GenerateFrameIndiceswithPadding

Generate frame index with padding for REDS dataset and Vid4 dataset

GenerateSegmentIndices

Generate frame indices for a segment. It also performs temporal

GetMaskedImage

Get masked image.

GetSpatialDiscountMask

Get spatial discounting mask constant.

LoadImageFromFile

Load a single image or image frames from corresponding paths. Required

LoadMask

Load Mask for multiple types.

LoadPairedImageFromFile

Load a pair of images from file.

MATLABLikeResize

Resize the input image using MATLAB-like downsampling.

Normalize

Normalize images with the given mean and std value.

RescaleToZeroOne

Transform the images into a range between 0 and 1.

DegradationsWithShuffle

Apply random degradations to input, with degradations being shuffled.

RandomBlur

Apply random blur to the input.

RandomJPEGCompression

Apply random JPEG compression to the input.

RandomNoise

Apply random noise to the input.

RandomResize

Randomly resize the input.

RandomVideoCompression

Apply random video compression to the input.

RandomDownSampling

Generate LQ image from GT (and crop), which will randomly pick a scale.

FormatTrimap

Convert trimap (tensor) to one-hot representation.

GenerateTrimap

Using random erode/dilate to generate trimap from alpha matte.

GenerateTrimapWithDistTransform

Generate trimap with distance transform function.

TransformTrimap

Transform trimap into two-channel and six-channel.

CopyValues

Copy the value of source keys to destination keys.

SetValues

Set value to destination keys.

class mmedit.datasets.transforms.GenerateSeg(kernel_size=5, erode_iter_range=(10, 20), dilate_iter_range=(15, 30), num_holes_range=(0, 3), hole_sizes=[(15, 15), (25, 25), (35, 35), (45, 45)], blur_ksizes=[(21, 21), (31, 31), (41, 41)])[source]

Bases: mmcv.transforms.BaseTransform

Generate segmentation mask from alpha matte.

Parameters
  • kernel_size (int, optional) – Kernel size for both erosion and dilation. The kernel will have the same height and width. Defaults to 5.

  • erode_iter_range (tuple, optional) – Iteration of erosion. Defaults to (10, 20).

  • dilate_iter_range (tuple, optional) – Iteration of dilation. Defaults to (15, 30).

  • num_holes_range (tuple, optional) – Range of number of holes to randomly select from. Defaults to (0, 3).

  • hole_sizes (list, optional) – List of (h, w) to be selected as the size of the rectangle hole. Defaults to [(15, 15), (25, 25), (35, 35), (45, 45)].

  • blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].

static _crop_hole(img, start_point, hole_size)

Create an all-zero rectangular hole in the image.

Parameters
  • img (np.ndarray) – Source image.

  • start_point (tuple[int]) – The top-left point of the rectangle.

  • hole_size (tuple[int]) – The height and width of the rectangle hole.

Returns

The cropped image.

Return type

np.ndarray
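
The effect of such a helper can be illustrated with a minimal NumPy sketch. The function name crop_hole_sketch is hypothetical, and reading start_point as (top, left) is an assumption; the description above only says it is the top-left point of the rectangle.

```python
import numpy as np


def crop_hole_sketch(img, start_point, hole_size):
    """Zero out a rectangle; a hypothetical stand-in, not the library code."""
    top, left = start_point  # assumed (top, left) order
    hole_h, hole_w = hole_size  # documented as height and width
    out = img.copy()  # leave the caller's array untouched
    out[top:top + hole_h, left:left + hole_w] = 0
    return out


mask = np.ones((6, 8), dtype=np.uint8)
holed = crop_hole_sketch(mask, start_point=(1, 2), hole_size=(2, 3))
```

Here a 2 x 3 block of the 6 x 8 mask is set to zero while the rest of the mask is unchanged.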

transform(results: dict) → dict

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.GenerateSoftSeg(fg_thr=0.2, border_width=25, erode_ksize=3, dilate_ksize=5, erode_iter_range=(10, 20), dilate_iter_range=(3, 7), blur_ksizes=[(21, 21), (31, 31), (41, 41)])[source]

Bases: mmcv.transforms.BaseTransform

Generate soft segmentation mask from input segmentation mask.

Required key is “seg”, added key is “soft_seg”.

Parameters
  • fg_thr (float, optional) – Threshold of the foreground in the normalized input segmentation mask. Defaults to 0.2.

  • border_width (int, optional) – Width of border to be padded to the bottom of the mask. Defaults to 25.

  • erode_ksize (int, optional) – Fixed kernel size of the erosion. Defaults to 3.

  • dilate_ksize (int, optional) – Fixed kernel size of the dilation. Defaults to 5.

  • erode_iter_range (tuple, optional) – Iteration of erosion. Defaults to (10, 20).

  • dilate_iter_range (tuple, optional) – Iteration of dilation. Defaults to (3, 7).

  • blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].

transform(results: dict) → dict

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.MirrorSequence(keys)[source]

Bases: mmcv.transforms.BaseTransform

Extend short sequences (e.g. Vimeo-90K) by mirroring the sequences.

Given a sequence with N frames (x1, …, xN), extend the sequence to (x1, …, xN, xN, …, x1).

Required Keys:

  • [KEYS]

Modified Keys:

  • [KEYS]

Parameters

keys (list[str]) – The frame lists to be extended.
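
The mirroring rule described above can be sketched in plain Python. The helper name mirror_sequence and the sample key 'lq' are hypothetical and only serve the illustration; frames are represented by strings instead of images.

```python
def mirror_sequence(frames):
    """Return (x1, ..., xN, xN, ..., x1) for an input list (x1, ..., xN)."""
    return frames + frames[::-1]


results = {'lq': ['f1', 'f2', 'f3']}
for key in ['lq']:  # plays the role of the documented `keys` argument
    results[key] = mirror_sequence(results[key])
```

A sequence of N frames therefore becomes a sequence of 2N frames, with the last frame repeated at the turning point.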

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.TemporalReverse(keys, reverse_ratio=0.5)[source]

Bases: mmcv.transforms.BaseTransform

Reverse frame lists for temporal augmentation.

Required keys are the keys in attributes “lq” and “gt”, added or modified keys are “lq”, “gt” and “reverse”.

Parameters
  • keys (list[str]) – The frame lists to be reversed.

  • reverse_ratio (float) – The probability to reverse the frame lists. Default: 0.5.
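
A minimal sketch of the behaviour described above, in plain Python. The function name temporal_reverse is hypothetical, and drawing a single random number shared by all keys is an assumption made so that paired frame lists stay aligned.

```python
import random


def temporal_reverse(results, keys, reverse_ratio=0.5, rng=random):
    """Reverse every listed frame list together with probability reverse_ratio."""
    reverse = rng.random() < reverse_ratio  # one draw shared by all keys (assumed)
    if reverse:
        for key in keys:
            results[key] = results[key][::-1]
    results['reverse'] = reverse  # record whether the reversal happened
    return results


always = temporal_reverse({'lq': [1, 2, 3], 'gt': [4, 5, 6]}, ['lq', 'gt'], reverse_ratio=1.0)
never = temporal_reverse({'lq': [1, 2, 3], 'gt': [4, 5, 6]}, ['lq', 'gt'], reverse_ratio=0.0)
```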

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.BinarizeImage(keys, binary_thr, a_min=0, a_max=1, dtype=np.uint8)[source]

Bases: mmcv.transforms.BaseTransform

Binarize image.

Parameters
  • keys (Sequence[str]) – The images to be binarized.

  • binary_thr (float) – Threshold for binarization.

  • a_min (int) – Lower limit of pixel value.

  • a_max (int) – Upper limit of pixel value.

  • dtype (np.dtype) – Set the data type of the output. Default: np.uint8

_binarize(img)

Binarize image.

Parameters

img (np.ndarray) – Input image.

Returns

Output image.

Return type

img (np.ndarray)

transform(results)

The transform function of BinarizeImage.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.Clip(keys, a_min=0, a_max=255)[source]

Bases: mmcv.transforms.BaseTransform

Clip the pixels.

Modified keys are the attributes specified in “keys”.

Parameters
  • keys (list[str]) – The keys whose values are clipped.

  • a_min (int) – Lower limit of pixel value.

  • a_max (int) – Upper limit of pixel value.

_clip(input_)
transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict in which the values of the specified keys are rounded and clipped.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.ColorJitter(keys, channel_order='rgb', **kwargs)[source]

Bases: mmcv.transforms.BaseTransform

An interface for torch color jitter so that it can be invoked in mmediting pipeline.

Randomly change the brightness, contrast and saturation of an image. Modified keys are the attributes specified in “keys”.

Required Keys:

  • [KEYS]

Modified Keys:

  • [KEYS]

Parameters
  • keys (list[str]) – The images to be color-jittered.

  • channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘rgb’.

Notes

**kwargs follows the argument list of torchvision.transforms.ColorJitter:

  • brightness (float or tuple of float (min, max)) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative numbers.

  • contrast (float or tuple of float (min, max)) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers.

  • saturation (float or tuple of float (min, max)) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative numbers.

  • hue (float or tuple of float (min, max)) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
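
The sampling intervals for these arguments can be worked out with a short plain-Python sketch. The helper names jitter_range and hue_range are hypothetical; they only restate the interval rules for the torchvision arguments and do not call torchvision.

```python
def jitter_range(value):
    """Sampling interval for brightness, contrast or saturation."""
    if isinstance(value, (tuple, list)):
        low, high = value  # an explicit (min, max) is used as given
        return float(low), float(high)
    # a single number means [max(0, 1 - value), 1 + value]
    return max(0.0, 1.0 - value), 1.0 + value


def hue_range(value):
    """Sampling interval for hue: [-hue, hue] or the given (min, max)."""
    if isinstance(value, (tuple, list)):
        low, high = value
        return float(low), float(high)
    return -value, value


moderate = jitter_range(0.5)
clamped = jitter_range(2.0)  # lower bound is clamped at 0
explicit = jitter_range((0.8, 1.2))
hue = hue_range(0.25)
```

With these rules, brightness=0.5 samples a factor from [0.5, 1.5], while brightness=2.0 samples from [0, 3] because the lower bound cannot go below zero.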

_color_jitter(image, this_seed)

Color Jitter Function.

Parameters
  • image (np.ndarray) – Image.

  • this_seed (int) – Seed of torch.

Returns

The output image.

Return type

image (np.ndarray)

transform(results: Dict) → Dict

The transform function of ColorJitter.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomAffine(keys, degrees, translate=None, scale=None, shear=None, flip_ratio=None)[source]

Bases: mmcv.transforms.BaseTransform

Apply random affine to input images.

This class is adapted from https://github.com/pytorch/vision/blob/v0.5.0/torchvision/transforms/transforms.py#L1015. Note that random flip is added in https://github.com/Yaoyi-Li/GCA-Matting/blob/master/dataloader/data_generator.py#L70. See the explanation of flip_ratio below.

Required keys are the keys in attribute “keys”; modified keys are the keys in attribute “keys”.

Parameters
  • keys (Sequence[str]) – The images to be affined.

  • degrees (float | tuple[float]) – Range of degrees to select from. If it is a float instead of a tuple like (min, max), the range of degrees will be (-degrees, +degrees). Set to 0 to deactivate rotations.

  • translate (tuple, optional) – Tuple of maximum absolute fraction for horizontal and vertical translations. For example translate=(a, b), then horizontal shift is randomly sampled in the range -img_width * a < dx < img_width * a and vertical shift is randomly sampled in the range -img_height * b < dy < img_height * b. Default: None.

  • scale (tuple, optional) – Scaling factor interval, e.g (a, b), then scale is randomly sampled from the range a <= scale <= b. Default: None.

  • shear (float | tuple[float], optional) – Range of shear degrees to select from. If shear is a float, a shear parallel to the x axis and a shear parallel to the y axis in the range (-shear, +shear) will be applied. Else if shear is a tuple of 2 values, a x-axis shear and a y-axis shear in (shear[0], shear[1]) will be applied. Default: None.

  • flip_ratio (float, optional) – Probability of the image being flipped. The flips in horizontal direction and vertical direction are independent. The image may be flipped in both directions. Default: None.

static _get_params(degrees, translate, scale_ranges, shears, flip_ratio, img_size)

Get parameters for affine transformation.

Returns

Params to be passed to the affine transformation.

Return type

paras (tuple)

static _get_inverse_affine_matrix(center, angle, translate, scale, shear, flip)

Helper method to compute inverse matrix for affine transformation.

As explained in PIL.Image.rotate, we need to compute the INVERSE of the affine transformation matrix M = T * C * RSS * C^-1, where T is the translation matrix:

[1, 0, tx | 0, 1, ty | 0, 0, 1];

C is translation matrix to keep center:

[1, 0, cx | 0, 1, cy | 0, 0, 1];

RSS is rotation with scale and shear matrix.

It differs from the original function in torchvision in two ways:

  1. The order is changed to flip -> scale -> rotation -> shear.

  2. x and y have different scale factors.

RSS(shear, a, scale, f) =

[cos(a + shear)*scale_x*f, -sin(a + shear)*scale_y, 0 | sin(a)*scale_x*f, cos(a)*scale_y, 0 | 0, 0, 1]

Thus, the inverse is M^-1 = C * RSS^-1 * C^-1 * T^-1.
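
The relation M^-1 = C * RSS^-1 * C^-1 * T^-1 can be checked numerically with a small plain-Python sketch. All helper names are hypothetical, angles are taken in radians and the flip factor f is treated as a single scalar; these are assumptions for illustration and may differ from the library's internal conventions.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def rss(angle, shear, scale_x, scale_y, flip):
    """RSS matrix as written in the description above (radians assumed)."""
    return [
        [math.cos(angle + shear) * scale_x * flip, -math.sin(angle + shear) * scale_y, 0.0],
        [math.sin(angle) * scale_x * flip, math.cos(angle) * scale_y, 0.0],
        [0.0, 0.0, 1.0],
    ]


def rss_inverse(m):
    """Invert an RSS matrix through its upper-left 2x2 block."""
    a, b, c, d = m[0][0], m[0][1], m[1][0], m[1][1]
    det = a * d - b * c
    return [[d / det, -b / det, 0.0], [-c / det, a / det, 0.0], [0.0, 0.0, 1.0]]


cx, cy = 64.0, 48.0  # image center
tx, ty = 5.0, -3.0   # translation
R = rss(math.radians(30), math.radians(10), 1.2, 0.8, -1.0)

T, C, C_inv = translation(tx, ty), translation(cx, cy), translation(-cx, -cy)
T_inv = translation(-tx, -ty)

forward = matmul(matmul(matmul(T, C), R), C_inv)                # M
inverse = matmul(matmul(matmul(C, rss_inverse(R)), C_inv), T_inv)  # M^-1
product = matmul(forward, inverse)
identity_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                     for i in range(3) for j in range(3))
```

The product of the forward and inverse matrices agrees with the identity up to floating-point error, which confirms the ordering of the factors in the inverse.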

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomMaskDilation(keys, binary_thr=0.0, kernel_min=9, kernel_max=49)[source]

Bases: mmcv.transforms.BaseTransform

Randomly dilate binary masks.

Parameters
  • keys (Sequence[str]) – The masks to be dilated.

  • binary_thr (float) – Threshold for obtaining binary mask. Default: 0.

  • kernel_min (int) – Min size of dilation kernel. Default: 9.

  • kernel_max (int) – Max size of dilation kernel. Default: 49.

_random_dilate(img)
transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.UnsharpMasking(kernel_size, sigma, weight, threshold, keys)[source]

Bases: mmcv.transforms.BaseTransform

Apply unsharp masking to an image or a sequence of images.

Parameters
  • kernel_size (int) – The kernel_size of the Gaussian kernel.

  • sigma (float) – The standard deviation of the Gaussian.

  • weight (float) – The weight of the “details” in the final output.

  • threshold (float) – Pixel differences larger than this value are regarded as “details”.

  • keys (list[str]) – The keys whose values are processed.

Added keys are “xxx_unsharp”, where “xxx” are the attributes specified in “keys”.
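
As a hedged illustration, one pipeline step for this transform could be written in the dict-config style commonly used in OpenMMLab projects. The numeric values and the key 'gt' are illustrative assumptions, not recommended settings.

```python
# One pipeline step in dict-config style; values are illustrative assumptions.
unsharp_step = dict(
    type='UnsharpMasking',
    kernel_size=51,
    sigma=0,
    weight=0.5,
    threshold=10,
    keys=['gt'],
)

# Per the description above, each processed key "xxx" gains "xxx_unsharp".
added_keys = [f'{key}_unsharp' for key in unsharp_step['keys']]
```

With keys=['gt'], the sharpened result would be stored under 'gt_unsharp' while the original 'gt' entry is kept.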

_unsharp_masking(imgs)

Unsharp masking function.

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.Flip(keys, flip_ratio=0.5, direction='horizontal')[source]

Bases: mmcv.transforms.BaseTransform

Flip the input data with a probability.

Reverse the order of elements in the given data with a specific direction. The shape of the data is preserved, but the elements are reordered. Required keys are the keys in attributes “keys”, added or modified keys are “flip”, “flip_direction” and the keys in attributes “keys”. It also supports flipping a list of images with the same flip.

Required Keys:

  • [KEYS]

Modified Keys:

  • [KEYS]

Parameters
  • keys (Union[str, List[str]]) – The images to be flipped.

  • flip_ratio (float) – The probability to flip the images. Default: 0.5.

  • direction (str) – Flip images horizontally or vertically. Options are “horizontal” | “vertical”. Default: “horizontal”.
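
The two directions can be illustrated on a tiny image stored as a list of rows. This is a plain-Python sketch with a hypothetical helper name; the transform itself operates on the images stored under the given keys.

```python
def flip(image, direction='horizontal'):
    """Flip a 2-D image stored as a list of rows."""
    if direction == 'horizontal':
        return [row[::-1] for row in image]  # reverse the column order
    if direction == 'vertical':
        return [list(row) for row in image[::-1]]  # reverse the row order
    raise ValueError(f'Unsupported direction: {direction}')


image = [[1, 2, 3],
         [4, 5, 6]]
h_flipped = flip(image, 'horizontal')
v_flipped = flip(image, 'vertical')
```

In both cases the shape is preserved and only the order of the elements changes.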

_directions = ['horizontal', 'vertical']
transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.NumpyPad(keys, padding, **kwargs)[source]

Bases: mmcv.transforms.BaseTransform

Numpy Padding.

In this augmentation, numpy padding is adopted to customize padding augmentation. Please carefully read the numpy manual in: https://numpy.org/doc/stable/reference/generated/numpy.pad.html

To pad only a single dimension, set padding like this:

padding = ((2, 2), (0, 0), (0, 0))

In this case, for an input with three dimensions, only the first dimension will be padded.

Parameters
  • keys (Union[str, List[str]]) – The images to be padded.

  • padding (int | tuple(int)) – Please refer to the args pad_width in numpy.pad.
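
The padding tuple shown above can be tried directly with numpy.pad. This sketch calls NumPy only; that the transform forwards extra keyword arguments such as mode to numpy.pad is an assumption based on the description.

```python
import numpy as np

img = np.ones((4, 5, 3), dtype=np.uint8)

# Pad 2 entries before and after the first axis only.
padding = ((2, 2), (0, 0), (0, 0))
padded = np.pad(img, padding)  # default mode is 'constant', filling with zeros

# Other numpy.pad modes can be selected through keyword arguments.
reflected = np.pad(img, padding, mode='reflect')
```

Only the first dimension grows, from 4 to 8, while the remaining dimensions keep their size.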

transform(results)

Call function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__() → str

Return repr(self).

class mmedit.datasets.transforms.RandomRotation(keys, degrees)[source]

Bases: mmcv.transforms.BaseTransform

Rotate the image by a randomly-chosen angle, measured in degrees.

Parameters
  • keys (list[str]) – The images to be rotated.

  • degrees (tuple[float] | tuple[int] | float | int) – If it is a tuple, it represents a range (min, max). If it is a float or int, the range is constructed as (-degrees, degrees).
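
How the degrees argument turns into a sampling range can be sketched as follows. The helper name degree_range is hypothetical, and uniform sampling within the range is an assumption; the description only states that the angle is randomly chosen.

```python
import random


def degree_range(degrees):
    """Normalise the `degrees` argument to a (min, max) tuple."""
    if isinstance(degrees, (int, float)):
        return (-degrees, degrees)  # a single number gives a symmetric range
    low, high = degrees
    return (low, high)


rng = random.Random(0)  # seeded only to make the sketch reproducible
low, high = degree_range(30)
angle = rng.uniform(low, high)  # uniform sampling is an assumption
```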

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomTransposeHW(keys, transpose_ratio=0.5)[source]

Bases: mmcv.transforms.BaseTransform

Randomly transpose images in H and W dimensions with a probability.

(TransposeHW = horizontal flip + anti-clockwise rotation by 90 degrees) When used with horizontal/vertical flips, it serves as a way of rotation augmentation. It also supports randomly transposing a list of images.

Required keys are the keys in attributes “keys”, added or modified keys are “transpose” and the keys in attributes “keys”.

Parameters
  • keys (list[str]) – The images to be transposed.

  • transpose_ratio (float) – The probability to transpose the images. Default: 0.5.
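
The identity TransposeHW = horizontal flip + anti-clockwise rotation by 90 degrees can be verified on a small example. This is a plain-Python sketch with hypothetical helper names, using a list of rows in place of an image array.

```python
def transpose_hw(image):
    """Swap the H and W axes of a 2-D image stored as a list of rows."""
    return [list(col) for col in zip(*image)]


def hflip(image):
    """Reverse the column order of every row."""
    return [row[::-1] for row in image]


def rot90_ccw(image):
    """Rotate 90 degrees anti-clockwise: the last column becomes the first row."""
    return [list(col) for col in zip(*image)][::-1]


image = [[1, 2, 3],
         [4, 5, 6]]
transposed = transpose_hw(image)
composed = rot90_ccw(hflip(image))
```

Both routes turn the 2 x 3 input into the same 3 x 2 output.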

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.Resize(keys: Union[str, List[str]] = 'img', scale=None, keep_ratio=False, size_factor=None, max_size=None, interpolation='bilinear', backend=None, output_keys=None)[source]

Bases: mmcv.transforms.BaseTransform

Resize data to a specific size for training or resize the images to fit the network input regulation for testing.

When resizing images to fit the network input regulation, the typical case is a network with several downsampling and upsampling operations, where the input height and width must be divisible by the downsample factor of the network. For example, if the network downsamples the input 5 times with stride 2, the downsample factor is 2^5 = 32, so the height and width should be divisible by 32.

Required keys are the keys in attribute “keys”, added or modified keys are “keep_ratio”, “scale_factor”, “interpolation” and the keys in attribute “keys”.

Required Keys:

  • Required keys are the keys in attribute “keys”

Modified Keys:

  • Modified the keys in attribute “keys” or save as new key ([OUT_KEY])

Added Keys:

  • [OUT_KEY]_shape

  • keep_ratio

  • scale_factor

  • interpolation

All keys in “keys” should have the same shape. “test_trans” is used to record the test transformation to align the input’s shape.

Parameters
  • keys (str | list[str]) – The image(s) to be resized.

  • scale (float | tuple[int]) – If scale is tuple[int], it is the target spatial size (h, w). Otherwise, the target spatial size is the input size scaled by this factor. Note that when it is used, size_factor and max_size have no effect. Default: None

  • keep_ratio (bool) – If set to True, images will be resized without changing the aspect ratio. Otherwise, it will resize images to a given size. Default: False. Note that it is used together with scale.

  • size_factor (int) – Let the output shape be a multiple of size_factor. Default:None. Note that when it is used, scale should be set to None and keep_ratio should be set to False.

  • max_size (int) – The maximum size of the longest side of the output. Default:None. Note that it is used together with size_factor.

  • interpolation (str) – Algorithm used for interpolation: “nearest” | “bilinear” | “bicubic” | “area” | “lanczos”. Default: “bilinear”.

  • backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.

  • output_keys (list[str] | None) – The resized images. Default: None Note that if it is not None, its length should be equal to keys.
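
The interplay of size_factor and max_size can be sketched with simple arithmetic. The helper name size_factor_shape is hypothetical, and rounding down to the nearest multiple is an assumption; the description only requires the output shape to be a multiple of size_factor.

```python
def size_factor_shape(height, width, size_factor, max_size=None):
    """Largest (h, w) not exceeding the input that is a multiple of size_factor."""
    new_h = height - height % size_factor
    new_w = width - width % size_factor
    if max_size is not None:
        # cap each side at the largest multiple of size_factor within max_size
        cap = max_size - max_size % size_factor
        new_h, new_w = min(new_h, cap), min(new_w, cap)
    return new_h, new_w


plain = size_factor_shape(250, 333, size_factor=32)
capped = size_factor_shape(250, 333, size_factor=32, max_size=256)
```

For a network with downsample factor 32, a 250 x 333 input would be resized to 224 x 320 under this assumption, or to 224 x 256 when max_size is 256.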

_resize(img)

Resize function.

Parameters

img (np.ndarray) – Image.

Returns

Resized image.

Return type

img (np.ndarray)

transform(results: Dict) → Dict

Transform function to resize images.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.CenterCropLongEdge(keys='img')[source]

Bases: mmcv.transforms.BaseTransform

Center crop the given image by the long edge.

Parameters

keys (list[str]) – The images to be cropped.

transform(results)

Call function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.Crop(keys, crop_size, random_crop=True, is_pad_zeros=False)[source]

Bases: mmcv.transforms.BaseTransform

Crop data to specific size for training.

Parameters
  • keys (Sequence[str]) – The images to be cropped.

  • crop_size (Tuple[int]) – Target spatial size (h, w).

  • random_crop (bool) – If set to True, the image is cropped at a random position. Otherwise, it is center-cropped. Default: True.

  • is_pad_zeros (bool, optional) – Whether to pad the image with 0 if crop_size is greater than image size. Default: False.

_crop(data)
transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.CropAroundCenter(crop_size)[source]

Bases: mmcv.transforms.BaseTransform

Randomly crop the images around unknown area in the center 1/4 images.

This cropping strategy is adopted in GCA matting. The unknown area is the same as semi-transparent area. https://arxiv.org/pdf/2001.04069.pdf

It retains the center 1/4 images and resizes the images to ‘crop_size’. Required keys are “fg”, “bg”, “trimap” and “alpha”, added or modified keys are “crop_bbox”, “fg”, “bg”, “trimap” and “alpha”.

Parameters

crop_size (int | tuple) – Desired output size. If int, square crop is applied.

transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.CropAroundFg(keys, bd_ratio_range=(0.1, 0.4), test_mode=False)[source]

Bases: mmcv.transforms.BaseTransform

Crop around the whole foreground in the segmentation mask.

Required keys are “seg” and the keys in argument keys. Meanwhile, “seg” must be in argument keys. Added or modified keys are “crop_bbox” and the keys in argument keys.

Parameters
  • keys (Sequence[str]) – The images to be cropped. It must contain ‘seg’.

  • bd_ratio_range (tuple, optional) – The range of the boundary (bd) ratio to select from. The boundary ratio is the ratio of the boundary to the minimal bbox that contains the whole foreground given by segmentation. Default to (0.1, 0.4).

  • test_mode (bool) – Whether to use test mode. In test mode, the tight crop area of the foreground will be extended to a square. Default to False.

transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

class mmedit.datasets.transforms.CropAroundUnknown(keys, crop_sizes, unknown_source='alpha', interpolations='bilinear')[source]

Bases: mmcv.transforms.BaseTransform

Crop around unknown area with a randomly selected scale.

Randomly select the w and h from a list of (w, h). Required keys are the keys in argument keys, added or modified keys are “crop_bbox” and the keys in argument keys. This class assumes value of “alpha” ranges from 0 to 255.

Parameters
  • keys (Sequence[str]) – The images to be cropped. It must contain ‘alpha’. If unknown_source is set to ‘trimap’, then it must also contain ‘trimap’.

  • crop_sizes (list[int | tuple[int]]) – List of (w, h) to be selected.

  • unknown_source (str, optional) – Unknown area to select from. It must be ‘alpha’ or ‘trimap’. Default to ‘alpha’.

  • interpolations (str | list[str], optional) – Interpolation method of mmcv.imresize. The interpolation operation will be applied when image size is smaller than the crop_size. If given as a list of str, it should have the same length as keys. Or if given as a str all the keys will be resized with the same method. Default to ‘bilinear’.

transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.CropLike(target_key, reference_key=None)[source]

Bases: mmcv.transforms.BaseTransform

Crop/pad the image in the target_key according to the size of image in the reference_key .

Parameters
  • target_key (str) – The key of the image to be cropped.

  • reference_key (str | None) – The key of the reference image, whose size is used. Default: None.

transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. Require self.target_key and self.reference_key.

Returns

A dict containing the processed data and information.

Modify self.target_key.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.FixedCrop(keys, crop_size, crop_pos=None)[source]

Bases: mmcv.transforms.BaseTransform

Crop paired data (at a specific position) to specific size for training.

Parameters
  • keys (Sequence[str]) – The images to be cropped.

  • crop_size (Tuple[int]) – Target spatial size (h, w).

  • crop_pos (Tuple[int]) – Specific position (x, y). If set to None, the position used to crop the paired data batch is chosen randomly. Default: None.

_crop(data, x_offset, y_offset, crop_w, crop_h)
transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.InstanceCrop(config_file, key='img', box_num_upbound=-1, finesize=256)[source]

Bases: mmcv.transforms.BaseTransform

Use maskrcnn to detect instances on image.

Mask R-CNN is used to detect the instances on the image, and pred_bbox is used to segment the instances on the image.

Parameters
  • config_file (str) – config file name relative to detectron2’s “configs/”

  • key (str) – Unused

  • box_num_upbound (int) – The upper limit on the number of instances in the figure

transform(results: dict) → dict

The transform function of InstanceCrop.

Parameters

results (dict) – A dict containing the necessary information and data for Conversion

Returns

A dict containing the processed data and information.

Return type

results (dict)

predict_bbox(image)
class mmedit.datasets.transforms.ModCrop(key='gt')[source]

Bases: mmcv.transforms.BaseTransform

Mod crop images, used during testing.

Required keys are “scale” and “KEY”, added or modified keys are “KEY”.

Parameters

key (str) – The key of image. Default: ‘gt’
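
In super-resolution code, mod crop usually means trimming the image so that its height and width are divisible by the scale. The sketch below assumes that meaning; the helper name mod_crop is hypothetical, and the image is a list of rows rather than an array.

```python
def mod_crop(image, scale):
    """Trim rows and columns so both sides are divisible by scale (assumed rule)."""
    height, width = len(image), len(image[0])
    new_h = height - height % scale
    new_w = width - width % scale
    return [row[:new_w] for row in image[:new_h]]


image = [[0] * 10 for _ in range(7)]  # a 7 x 10 dummy image
cropped = mod_crop(image, scale=4)
cropped_shape = (len(cropped), len(cropped[0]))
```

A 7 x 10 image with scale 4 is trimmed to 4 x 8 under this rule.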

transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.PairedRandomCrop(gt_patch_size, lq_key='img', gt_key='gt')[source]

Bases: mmcv.transforms.BaseTransform

Paired random crop.

It crops a pair of img and gt images with corresponding locations. It also supports accepting img list and gt list. Required keys are “scale”, “lq_key”, and “gt_key”, added or modified keys are “lq_key” and “gt_key”.

Parameters
  • gt_patch_size (int) – cropped gt patch size.

  • lq_key (str) – Key of LQ img. Default: ‘img’.

  • gt_key (str) – Key of GT img. Default: ‘gt’.
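
How corresponding locations might be derived for the two images can be sketched as follows. The helper name paired_crop_boxes is hypothetical, and the rule that the LQ patch is gt_patch_size // scale with GT coordinates obtained by multiplying by scale is an assumption based on the required “scale” key.

```python
import random


def paired_crop_boxes(lq_h, lq_w, gt_patch_size, scale, rng):
    """Return (top, left, size) for the LQ crop and the matching GT crop."""
    lq_patch_size = gt_patch_size // scale  # assumed relation between patch sizes
    if lq_h < lq_patch_size or lq_w < lq_patch_size:
        raise ValueError('LQ image is smaller than the requested patch.')
    top = rng.randint(0, lq_h - lq_patch_size)
    left = rng.randint(0, lq_w - lq_patch_size)
    lq_box = (top, left, lq_patch_size)
    gt_box = (top * scale, left * scale, gt_patch_size)  # same region in GT
    return lq_box, gt_box


rng = random.Random(0)  # seeded only to make the sketch reproducible
lq_box, gt_box = paired_crop_boxes(64, 64, gt_patch_size=128, scale=4, rng=rng)
```

Sampling the position once in LQ coordinates and scaling it keeps both patches aligned on the same image region.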

transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomCropLongEdge(keys='img')[source]

Bases: mmcv.transforms.BaseTransform

Random crop the given image by the long edge.

Parameters

keys (list[str]) – The images to be cropped.

transform(results)

Call function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomResizedCrop(keys, crop_size, scale=(0.08, 1.0), ratio=(3.0 / 4.0, 4.0 / 3.0), interpolation='bilinear')[source]

Bases: mmcv.transforms.BaseTransform

Crop data to random size and aspect ratio.

A crop of a random proportion of the original image and a random aspect ratio of the original aspect ratio is made. The cropped image is finally resized to a given size specified by ‘crop_size’. Modified keys are the attributes specified in “keys”.

This code is partially adopted from torchvision.transforms.RandomResizedCrop: [https://pytorch.org/vision/stable/_modules/torchvision/transforms/transforms.html#RandomResizedCrop].

Parameters
  • keys (list[str]) – The images to be resized and random-cropped.

  • crop_size (int | tuple[int]) – Target spatial size (h, w).

  • scale (tuple[float], optional) – Range of the proportion of the original image to be cropped. Default: (0.08, 1.0).

  • ratio (tuple[float], optional) – Range of aspect ratio of the crop. Default: (3. / 4., 4. / 3.).

  • interpolation (str, optional) – Algorithm used for interpolation. It can be only either one of the following: “nearest” | “bilinear” | “bicubic” | “area” | “lanczos”. Default: “bilinear”.

get_params(data)

Get parameters for a random sized crop.

Parameters

data (np.ndarray) – Image of type numpy array to be cropped.

Returns

A tuple containing the coordinates of the top left corner and the chosen crop size.
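
The sampling logic can be sketched as below. This follows the general approach of torchvision's RandomResizedCrop (sample an area fraction and a log-uniform aspect ratio, retry a few times, then fall back to a center crop); the function name `get_crop_params`, the retry count and the `rng` argument are assumptions made for illustration.

```python
import math
import random


def get_crop_params(height, width, scale=(0.08, 1.0),
                    ratio=(3.0 / 4.0, 4.0 / 3.0), rng=random, attempts=10):
    """Return (top, left, crop_h, crop_w) for a random sized crop."""
    area = height * width
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        target_area = rng.uniform(scale[0], scale[1]) * area
        aspect = math.exp(rng.uniform(log_ratio[0], log_ratio[1]))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # Fallback: center crop with the aspect ratio clamped into range.
    in_ratio = width / height
    if in_ratio < min(ratio):
        crop_w = width
        crop_h = int(round(crop_w / min(ratio)))
    elif in_ratio > max(ratio):
        crop_h = height
        crop_w = int(round(crop_h * max(ratio)))
    else:
        crop_h, crop_w = height, width
    return (height - crop_h) // 2, (width - crop_w) // 2, crop_h, crop_w


params = get_crop_params(240, 320, rng=random.Random(0))
```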

transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.CompositeFg(fg_dirs, alpha_dirs, interpolation='nearest')[source]

Bases: mmcv.transforms.BaseTransform

Composite foreground with a random foreground.

This class composites the current training sample with additional data randomly (could be from the same dataset). With probability 0.5, the sample will be composited with a random sample from the specified directory. The composition is performed as:

\[ \begin{align}\begin{aligned}fg_{new} = \alpha_1 * fg_1 + (1 - \alpha_1) * fg_2\\\alpha_{new} = 1 - (1 - \alpha_1) * (1 - \alpha_2)\end{aligned}\end{align} \]

where \((fg_1, \alpha_1)\) is from the current sample and \((fg_2, \alpha_2)\) is the randomly loaded sample. With the above composition, \(\alpha_{new}\) is still in [0, 1].
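
The two equations above translate directly into per-pixel arithmetic. The sketch below uses scalar values for one pixel and assumes alpha values normalized to [0, 1]; `composite_fg` is an illustrative name only.

```python
def composite_fg(fg1, alpha1, fg2, alpha2):
    """Apply the composition equations to a single pixel value.

    Assumption: alpha1 and alpha2 are already scaled to [0, 1].
    """
    fg_new = alpha1 * fg1 + (1 - alpha1) * fg2
    alpha_new = 1 - (1 - alpha1) * (1 - alpha2)
    return fg_new, alpha_new


fg_new, alpha_new = composite_fg(200.0, 0.5, 100.0, 0.5)
```

Because (1 - alpha_1) and (1 - alpha_2) both lie in [0, 1], their product does too, which is why alpha_new stays in [0, 1].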

Required keys are “alpha” and “fg”. Modified keys are “alpha” and “fg”.

Parameters
  • fg_dirs (str | list[str]) – Path of directories to load foreground images from.

  • alpha_dirs (str | list[str]) – Path of directories to load alpha mattes from.

  • interpolation (str) – Interpolation method of mmcv.imresize to resize the randomly loaded images. Default: ‘nearest’.

transform(results: dict) dict

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

_get_file_list(fg_dirs, alpha_dirs)
__repr__()

Return repr(self).

class mmedit.datasets.transforms.MergeFgAndBg[source]

Bases: mmcv.transforms.BaseTransform

Composite foreground image and background image with alpha.

Required keys are “alpha”, “fg” and “bg”, added key is “merged”.
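
A minimal per-pixel sketch of compositing with alpha, assuming the standard alpha-blending equation and an alpha value already normalized to [0, 1] (the stored "alpha" may use a different range); `merge_fg_and_bg` is a hypothetical helper name.

```python
def merge_fg_and_bg(fg, bg, alpha):
    """Blend one foreground and background pixel value.

    Assumption: merged = alpha * fg + (1 - alpha) * bg with alpha in [0, 1].
    """
    return alpha * fg + (1.0 - alpha) * bg


merged = merge_fg_and_bg(fg=200.0, bg=100.0, alpha=0.25)
```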

transform(results: dict) dict

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__() str

Return repr(self).

class mmedit.datasets.transforms.PerturbBg(gamma_ratio=0.6)[source]

Bases: mmcv.transforms.BaseTransform

Randomly add gaussian noise or gamma change to background image.

Required key is “bg”, added key is “noisy_bg”.

Parameters

gamma_ratio (float, optional) – The probability to use gamma correction instead of gaussian noise. Defaults to 0.6.

transform(results: dict) dict

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomJitter(hue_range=40)[source]

Bases: mmcv.transforms.BaseTransform

Randomly jitter the foreground in hsv space.

The jitter range of hue is adjustable while the jitter ranges of saturation and value are adaptive to the images. Side effect: the “fg” image will be converted to np.float32. Required keys are “fg” and “alpha”, modified key is “fg”.

Parameters

hue_range (float | tuple[float]) – Range of hue jittering. If it is a float instead of a tuple like (min, max), the range of hue jittering will be (-hue_range, +hue_range). Default: 40.
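
The handling of hue_range described above can be sketched as follows. The helper names are hypothetical, and sampling the shift uniformly is an assumption; the library may draw the value differently.

```python
import random


def normalize_hue_range(hue_range):
    """Expand a single number into a symmetric (min, max) range."""
    if isinstance(hue_range, (int, float)):
        if hue_range < 0:
            raise ValueError('hue_range must be non-negative')
        return (-hue_range, hue_range)
    low, high = hue_range
    return (low, high)


def sample_hue_shift(hue_range, rng=random):
    """Draw a hue shift from the range (uniform sampling is assumed)."""
    low, high = normalize_hue_range(hue_range)
    return rng.uniform(low, high)


shift = sample_hue_shift(40, rng=random.Random(0))
```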

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomLoadResizeBg(bg_dir, flag='color', channel_order='bgr')[source]

Bases: mmcv.transforms.BaseTransform

Randomly load a background image and resize it.

Required key is “fg”, added key is “bg”.

Parameters
  • bg_dir (str) – Path of directory to load background images from.

  • flag (str) – Loading flag for images. Default: ‘color’.

  • channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.

  • kwargs (dict) – Args for file client.

transform(results: dict) dict

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.PackEditInputs(keys: Tuple[List[str], str, None] = None, pack_all: bool = False)[source]

Bases: mmcv.transforms.base.BaseTransform

Pack the inputs data for SR, VFI, matting and inpainting.

Keys for images include img, gt, ref, mask, gt_heatmap, trimap, gt_alpha, gt_fg, gt_bg. All of them will be packed into the data field of EditDataSample. Others will be packed into the metainfo field of EditDataSample.

Parameters

pack_all (bool) – Whether to pack all variables in results into the inputs dict. This is useful when the keys of the input dict are not fixed. Use this option with care. Defaults to False.

transform(results: dict) dict

Method to pack the input data.

Parameters

results (dict) – Result dict from the data pipeline.

Returns

  • ‘inputs’ (torch.Tensor): The forward data of models.

  • ‘data_samples’ (EditDataSample): The annotation info of the sample.

Return type

dict

__repr__() str

Return repr(self).

class mmedit.datasets.transforms.ToTensor(keys, to_float32=True)[source]

Bases: mmcv.transforms.base.BaseTransform

Convert some values in results dict to torch.Tensor type in data loader pipeline.

Parameters
  • keys (Sequence[str]) – Required keys to be converted.

  • to_float32 (bool) – Whether convert tensors of images to float32. Default: True.

_data_to_tensor(value)

Convert the value to tensor.

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.GenerateCoordinateAndCell(sample_quantity=None, scale=None, target_size=None, reshape_gt=True)[source]

Bases: mmcv.transforms.base.BaseTransform

Generate coordinate and cell. Generate coordinate from the desired size of SR image.

Train or val:

  1. Generate coordinate from GT.

  2. Reshape GT image to (HgWg, 3) and transpose to (3, HgWg), where Hg and Wg represent the height and width of GT.

Test:

  1. Generate coordinate from LQ and scale or target_size.

  2. Then generate cell from coordinate.

Parameters
  • sample_quantity (int | None) – The quantity of samples in coordinates. To ensure that the GT tensors in a batch have the same dimensions. Default: None.

  • scale (float) – Scale of upsampling. Default: None.

  • target_size (tuple[int]) – Size of target image. Default: None.

  • reshape_gt (bool) – Whether to reshape gt to (-1, 3). Default: True. If sample_quantity is not None, reshape_gt = True.

The priority of getting ‘size of target image’ is:

  1. results[‘gt’].shape[-2:]

  2. results[‘lq’].shape[-2:] * scale

  3. target_size
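
The priority order above can be sketched as a small lookup. This is an illustration only: `resolve_target_size` is a hypothetical helper, the `SimpleNamespace` objects merely stand in for arrays with a `.shape` attribute, and rounding the scaled size is an assumption.

```python
from types import SimpleNamespace


def resolve_target_size(results, scale=None, target_size=None):
    """Return (height, width) of the target image using the stated priority."""
    if 'gt' in results:
        # 1. Size of the ground-truth image.
        return tuple(results['gt'].shape[-2:])
    if 'lq' in results and scale is not None:
        # 2. Size of the low-quality image multiplied by the scale.
        h, w = results['lq'].shape[-2:]
        return (round(h * scale), round(w * scale))  # rounding is assumed
    if target_size is not None:
        # 3. Explicit target size.
        return tuple(target_size)
    raise ValueError('cannot determine the target size')


gt = SimpleNamespace(shape=(3, 48, 64))
lq = SimpleNamespace(shape=(3, 12, 16))
size_from_gt = resolve_target_size({'gt': gt, 'lq': lq}, scale=4)
size_from_lq = resolve_target_size({'lq': lq}, scale=4)
size_from_arg = resolve_target_size({}, target_size=(96, 128))
```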

transform(results)

Call function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. Requires one of the following in results:

  1. ‘lq’

  2. ‘gt’

  3. None, on the premise that self.target_size is set (the premise involves self.target_size and len(self.target_size)).

Returns

A dict containing the processed data and information. Reshape ‘gt’ to (-1, 3) and transpose to (3, -1) if ‘gt’ in results. Add ‘coord’ and ‘cell’.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.GenerateFacialHeatmap(image_key, ori_size, target_size, sigma=1.0, use_cache=True)[source]

Bases: mmcv.transforms.base.BaseTransform

Generate heatmap from keypoint.

Parameters
  • image_key (str) – Key of facial image in dict.

  • ori_size (int | Tuple[int]) – Original image size of keypoint.

  • target_size (int | Tuple[int]) – Target size of heatmap.

  • sigma (float) – Sigma parameter of heatmap. Default: 1.0

  • use_cache (bool) – If True, load all heatmap at once. Default: True.

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. Require keypoint.

Returns

A dict containing the processed data and information. Adds ‘heatmap’.

Return type

dict

generate_heatmap_from_img(image)

Generate heatmap from img.

Parameters

image (np.ndarray) – Face image.

Returns

Heatmap of the face image.

Return type

np.ndarray

_face_alignment_detector(image)

Generate face landmark by face_alignment.

Parameters

image (np.ndarray) – Face image.

Returns

Location of landmark.

Return type

landmark (Tuple[float])

_generate_one_heatmap(keypoint)

Generate One Heatmap.

Parameters

keypoint (Tuple[float]) – Location of a landmark.

Returns

A heatmap of the landmark.

Return type

np.ndarray

__repr__()

Return repr(self).

class mmedit.datasets.transforms.GenerateFrameIndices(interval_list, frames_per_clip=99)[source]

Bases: mmcv.transforms.BaseTransform

Generate frame index for REDS datasets. It also performs temporal augmentation with random interval.

Required Keys:

  • img_path

  • gt_path

  • key

  • num_input_frames

Modified Keys:

  • img_path

  • gt_path

Added Keys:

  • interval

  • reverse

Parameters
  • interval_list (list[int]) – Interval list for temporal augmentation. It will randomly pick an interval from interval_list and sample frame index with the interval.

  • frames_per_clip (int) – Number of frames per clip. Default: 99 for REDS dataset.

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.GenerateFrameIndiceswithPadding(padding, filename_tmpl='{:08d}')[source]

Bases: mmcv.transforms.BaseTransform

Generate frame index with padding for REDS dataset and Vid4 dataset during testing.

Required Keys:

  • img_path

  • gt_path

  • key

  • num_input_frames

  • sequence_length

Modified Keys:

  • img_path

  • gt_path

Parameters

padding (str) – Padding mode, one of ‘replicate’ | ‘reflection’ | ‘reflection_circle’ | ‘circle’.

Examples: with current_idx = 0 and num_input_frames = 5, the generated frame indices under each padding mode are:

replicate: [0, 0, 0, 1, 2]
reflection: [2, 1, 0, 1, 2]
reflection_circle: [4, 3, 0, 1, 2]
circle: [3, 4, 0, 1, 2]
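
The four padding modes can be reproduced with the sketch below. It is one possible implementation that matches the example indices above, not necessarily the library's own code; `padded_frame_indices` and its argument names are hypothetical.

```python
def padded_frame_indices(current_idx, sequence_length, num_input_frames,
                         padding='reflection'):
    """Return frame indices centred on current_idx, padded at the borders."""
    modes = ('replicate', 'reflection', 'reflection_circle', 'circle')
    if padding not in modes:
        raise ValueError(f'unknown padding mode: {padding}')
    max_idx = sequence_length - 1
    num_pad = num_input_frames // 2
    indices = []
    for i in range(current_idx - num_pad, current_idx + num_pad + 1):
        if i < 0:  # ran off the start of the sequence
            if padding == 'replicate':
                idx = 0
            elif padding == 'reflection':
                idx = -i
            elif padding == 'reflection_circle':
                idx = current_idx + num_pad - i
            else:  # circle
                idx = num_input_frames + i
        elif i > max_idx:  # ran off the end of the sequence
            if padding == 'replicate':
                idx = max_idx
            elif padding == 'reflection':
                idx = max_idx * 2 - i
            elif padding == 'reflection_circle':
                idx = (current_idx - num_pad) - (i - max_idx)
            else:  # circle
                idx = i - num_input_frames
        else:
            idx = i
        indices.append(idx)
    return indices


example = padded_frame_indices(0, 100, 5, padding='reflection_circle')
```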

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.GenerateSegmentIndices(interval_list, start_idx=0, filename_tmpl='{:08d}.png')[source]

Bases: mmcv.transforms.BaseTransform

Generate frame indices for a segment. It also performs temporal augmentation with random interval.

Required Keys:

  • img_path

  • gt_path

  • key

  • num_input_frames

  • sequence_length

Modified Keys:

  • img_path

  • gt_path

Added Keys:

  • interval

  • reverse

Parameters
  • interval_list (list[int]) – Interval list for temporal augmentation. It will randomly pick an interval from interval_list and sample frame index with the interval.

  • start_idx (int) – The index corresponds to the first frame in the sequence. Default: 0.

  • filename_tmpl (str) – Template for file name. Default: ‘{:08d}.png’.
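
A minimal sketch of segment sampling under the parameters above: pick an interval, pick a random start so the segment fits in the sequence, then format file names with filename_tmpl. The function `sample_segment` and the exact way the start is drawn are assumptions for illustration.

```python
import random


def sample_segment(sequence_length, num_input_frames, interval_list,
                   start_idx=0, filename_tmpl='{:08d}.png', rng=random):
    """Return (interval, frame indices, file names) for one random segment."""
    interval = rng.choice(interval_list)
    # Number of frames spanned from the first to the last sampled frame.
    span = (num_input_frames - 1) * interval + 1
    if span > sequence_length:
        raise ValueError('sequence is too short for this interval')
    first = rng.randint(0, sequence_length - span)
    # start_idx shifts indices for sequences whose first file is not 0.
    indices = [first + start_idx + k * interval
               for k in range(num_input_frames)]
    names = [filename_tmpl.format(i) for i in indices]
    return interval, indices, names


interval, indices, names = sample_segment(
    sequence_length=100, num_input_frames=5, interval_list=[1, 2, 3],
    rng=random.Random(0))
```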

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.GetMaskedImage(img_key='gt', mask_key='mask', out_key='img', zero_value=127.5)[source]

Bases: mmcv.transforms.base.BaseTransform

Get masked image.

Parameters
  • img_key (str) – Key for clean image. Default: ‘gt’.

  • mask_key (str) – Key for mask image. The mask shape should be (h, w, 1), where 1 indicates holes and 0 indicates valid regions. Default: ‘mask’.

  • out_key (str) – Key for output image. Default: ‘img’.

  • zero_value (float) – Pixel value of masked area. Default: 127.5.
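
The masking step can be sketched per pixel as below, assuming holes (mask value 1) are filled with zero_value and valid pixels (mask value 0) are kept unchanged. The images are plain nested lists here, and both helper names are hypothetical.

```python
def mask_pixel(value, mask, zero_value=127.5):
    """Keep the pixel where mask is 0, write zero_value where mask is 1."""
    return value * (1.0 - mask) + mask * zero_value


def get_masked_image(img, mask, zero_value=127.5):
    """Apply mask_pixel to every pixel of a 2-D image (list of rows)."""
    return [[mask_pixel(v, m, zero_value) for v, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(img, mask)]


masked = get_masked_image([[10.0, 20.0]], [[0, 1]])
```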

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.GetSpatialDiscountMask(gamma=0.99, beta=1.5)[source]

Bases: mmcv.transforms.BaseTransform

Get spatial discounting mask constant.

Spatial discounting mask is first introduced in: Generative Image Inpainting with Contextual Attention.

Parameters
  • gamma (float, optional) – Gamma for computing spatial discounting. Defaults to 0.99.

  • beta (float, optional) – Beta for computing spatial discounting. Defaults to 1.5.

spatial_discount_mask(mask_width, mask_height)

Generate spatial discounting mask constant.

Parameters
  • mask_width (int) – The width of bbox hole.

  • mask_height (int) – The height of bbox hole.

Returns

Spatial discounting mask.

Return type

np.ndarray

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.LoadImageFromFile(key: str, color_type: str = 'color', channel_order: str = 'bgr', imdecode_backend: Optional[str] = None, use_cache: bool = False, to_float32: bool = False, to_y_channel: bool = False, save_original_img: bool = False, file_client_args: Optional[dict] = None)[source]

Bases: mmcv.transforms.BaseTransform

Load a single image or image frames from corresponding paths.

Required Keys:

  • [KEY]_path

New Keys:

  • [KEY]

  • ori_[KEY]_shape

  • ori_[KEY]

Parameters
  • key (str) – Keys in results to find corresponding path.

  • color_type (str) – The flag argument for mmcv.imfrombytes. Defaults to ‘color’.

  • channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.

  • imdecode_backend (str) – The image decoding backend type, passed as the backend argument of mmcv.imfrombytes; see mmcv.imfrombytes for details. Candidates are ‘cv2’, ‘turbojpeg’, ‘pillow’, and ‘tifffile’. Defaults to None.

  • use_cache (bool) – If True, load all images at once. Default: False.

  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.

  • to_y_channel (bool) – Whether to convert the loaded image to y channel. Only ‘bgr2ycbcr’ and ‘rgb2ycbcr’ are supported. Defaults to False.

  • file_client_args (dict) – Arguments to instantiate a FileClient. If not specified, will infer from file uri. See mmengine.fileio.FileClient for details. Defaults to None.

transform(results: dict) dict

Functions to load image or frames.

Parameters

results (dict) – Result dict from mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

_load_image(filename)

Load an image from file.

Parameters

filename (str) – Path of image file.

Returns

Image.

Return type

np.ndarray

_convert(img: numpy.ndarray)

Convert an image to the required format.

Parameters

img (np.ndarray) – The original image.

Returns

The converted image.

Return type

np.ndarray

__repr__()

Return repr(self).

class mmedit.datasets.transforms.LoadMask(mask_mode='bbox', mask_config=None)[source]

Bases: mmcv.transforms.BaseTransform

Load Mask for multiple types.

For different types of mask, users need to provide the corresponding config dict.

Example config for bbox:

config = dict(img_shape=(256, 256), max_bbox_shape=128)

Example config for irregular:

config = dict(
    img_shape=(256, 256),
    num_vertices=(4, 12),
    max_angle=4.,
    length_range=(10, 100),
    brush_width=(10, 40),
    area_ratio_range=(0.15, 0.5))

Example config for ff:

config = dict(
    img_shape=(256, 256),
    num_vertices=(4, 12),
    mean_angle=1.2,
    angle_range=0.4,
    brush_width=(12, 40))

Example config for set:

config = dict(
    mask_list_file='xxx/xxx/ooxx.txt',
    prefix='/xxx/xxx/ooxx/',
    io_backend='disk',
    color_type='unchanged',
    file_client_kwargs=dict()
)

The mask_list_file contains the list of mask file name like this:
    test1.jpeg
    test2.jpeg
    ...
    ...

The prefix gives the data path.
Parameters
  • mask_mode (str) – Mask mode in [‘bbox’, ‘irregular’, ‘ff’, ‘set’, ‘file’]. Default: ‘bbox’.

      • bbox: square bounding box masks.

      • irregular: irregular holes.

      • ff: free-form holes from DeepFillv2.

      • set: randomly get a mask from a mask set.

      • file: get mask from ‘mask_path’ in results.

  • mask_config (dict) – Params for creating masks. Each type of mask needs different configs. Default: None.
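
For orientation, a pipeline entry using the bbox mode might look like the sketch below. It reuses the example bbox config shown earlier; the `type='LoadMask'` key assumes the usual registry convention of referring to a transform by its class name, and `load_mask_step` is just an illustrative variable name.

```python
# Hypothetical pipeline step; only plain dicts are used, so nothing from
# mmedit needs to be imported to build the config itself.
load_mask_step = dict(
    type='LoadMask',  # assumed registry name (the class name)
    mask_mode='bbox',
    mask_config=dict(img_shape=(256, 256), max_bbox_shape=128),
)

# A pipeline is a list of such steps, applied in order.
pipeline = [load_mask_step]
```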

_init_info()
_get_random_mask_from_set()
_get_mask_from_file(path)
transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.LoadPairedImageFromFile(key: str, domain_a: str = 'A', domain_b: str = 'B', color_type: str = 'color', channel_order: str = 'bgr', imdecode_backend: Optional[str] = None, use_cache: bool = False, to_float32: bool = False, to_y_channel: bool = False, save_original_img: bool = False, file_client_args: Optional[dict] = None)[source]

Bases: LoadImageFromFile

Load a pair of images from file.

Each sample contains a pair of images, which are concatenated along the width dimension (a|b). This is a special loading class for paired generation datasets. It loads the concatenated image as the common loader does and then crops it into two images of the same shape, one for each domain.

Required key is “pair_path”. Added or modified keys are “pair”, “pair_ori_shape”, “ori_pair”, “img_{domain_a}”, “img_{domain_b}”, “img_{domain_a}_path”, “img_{domain_b}_path”, “img_{domain_a}_ori_shape”, “img_{domain_b}_ori_shape”, “ori_img_{domain_a}” and “ori_img_{domain_b}”.
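
The split of a concatenated (a|b) image can be sketched on a plain list of rows as follows. The even-width check and the helper name `split_pair` are assumptions for illustration.

```python
def split_pair(rows):
    """Split an image stored as a list of rows into left and right halves."""
    width = len(rows[0])
    if width % 2 != 0:
        raise ValueError('the paired image width must be even')
    half = width // 2
    img_a = [row[:half] for row in rows]   # left half: domain a
    img_b = [row[half:] for row in rows]   # right half: domain b
    return img_a, img_b


pair = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
img_a, img_b = split_pair(pair)
```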

Parameters
  • key (str) – Keys in results to find corresponding path.

  • domain_a (str, Optional) – One of the paired image domain. Defaults to ‘A’.

  • domain_b (str, Optional) – The other of the paired image domain. Defaults to ‘B’.

  • color_type (str) – The flag argument for mmcv.imfrombytes. Defaults to ‘color’.

  • channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.

  • imdecode_backend (str) – The image decoding backend type, passed as the backend argument of mmcv.imfrombytes; see mmcv.imfrombytes for details. Candidates are ‘cv2’, ‘turbojpeg’, ‘pillow’, and ‘tifffile’. Defaults to None.

  • use_cache (bool) – If True, load all images at once. Default: False.

  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.

  • to_y_channel (bool) – Whether to convert the loaded image to y channel. Only ‘bgr2ycbcr’ and ‘rgb2ycbcr’ are supported. Defaults to False.

  • file_client_args (dict) – Arguments to instantiate a FileClient. If not specified, will infer from file uri. See mmengine.fileio.FileClient for details. Defaults to None.

  • io_backend (str, optional) – IO backend where images are stored. Defaults to None.

transform(results: dict) dict

Functions to load paired images.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

class mmedit.datasets.transforms.MATLABLikeResize(keys, scale=None, output_shape=None, kernel='bicubic', kernel_width=4.0)[source]

Bases: mmcv.transforms.BaseTransform

Resize the input image using MATLAB-like downsampling.

Currently support bicubic interpolation only. Note that the output of this function is slightly different from the official MATLAB function.

Required keys are the keys in attribute “keys”. Added or modified keys are “scale” and “output_shape”, and the keys in attribute “keys”.

Parameters
  • keys (list[str]) – A list of keys whose values are modified.

  • scale (float | None, optional) – The scale factor of the resize operation. If None, it will be determined by output_shape. Default: None.

  • output_shape (tuple(int) | None, optional) – The size of the output image. If None, it will be determined by scale. Note that if scale is provided, output_shape will not be used. Default: None.

  • kernel (str, optional) – The kernel for the resize operation. Currently support ‘bicubic’ only. Default: ‘bicubic’.

  • kernel_width (float) – The kernel width. Currently support 4.0 only. Default: 4.0.

_resize(img)

Resize an image to the required size.

Parameters

img (np.ndarray) – The original image.

Returns

The resized image.

Return type

output (np.ndarray)

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.Normalize(keys, mean, std, to_rgb=False, save_original=False)[source]

Bases: mmcv.transforms.BaseTransform

Normalize images with the given mean and std value.

Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys” and these keys with postfix ‘_norm_cfg’. It also supports normalizing a list of images.

Parameters
  • keys (Sequence[str]) – The images to be normalized.

  • mean (np.ndarray) – Mean values of different channels.

  • std (np.ndarray) – Std values of different channels.

  • to_rgb (bool) – Whether to convert channels from BGR to RGB. Default: False.

  • save_original (bool) – Whether to save original images. Default: False.
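
Per pixel, normalization amounts to the arithmetic sketched below. The assumption that the BGR to RGB swap happens before subtracting the mean (so mean and std are given in the output channel order) is mine, and `normalize_pixel` is a hypothetical helper.

```python
def normalize_pixel(pixel, mean, std, to_rgb=False):
    """Normalize one pixel given as a sequence of channel values."""
    # Optionally reverse the channel order (BGR -> RGB) first.
    channels = list(pixel)[::-1] if to_rgb else list(pixel)
    return [(c - m) / s for c, m, s in zip(channels, mean, std)]


mean = (127.5, 127.5, 127.5)
std = (127.5, 127.5, 127.5)
normalized = normalize_pixel((0.0, 127.5, 255.0), mean, std, to_rgb=True)
```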

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.RescaleToZeroOne(keys)[source]

Bases: mmcv.transforms.BaseTransform

Transform the images into a range between 0 and 1.

Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys”. It also supports rescaling a list of images.

Parameters

keys (Sequence[str]) – The images to be transformed.

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.DegradationsWithShuffle(degradations, keys, shuffle_idx=None)[source]

Apply random degradations to input, with degradations being shuffled.

Degradation groups are supported. The order of degradations within the same group is preserved. For example, if we have degradations = [a, b, [c, d]] and shuffle_idx = None, then the possible orders are

[a, b, [c, d]]
[a, [c, d], b]
[b, a, [c, d]]
[b, [c, d], a]
[[c, d], a, b]
[[c, d], b, a]
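
The six orders listed above are the permutations of the three top-level entries, with the group [c, d] treated as one unit. The sketch below enumerates them and also shows one way shuffle_idx could restrict shuffling to chosen positions; `shuffle_degradations` is a hypothetical helper, not the library code.

```python
import random
from itertools import permutations

degradations = ['a', 'b', ['c', 'd']]

# Groups are single entries, so their internal order is never changed.
orders = [list(p) for p in permutations(degradations)]


def shuffle_degradations(degradations, shuffle_idx=None, rng=random):
    """Shuffle only the entries at shuffle_idx; keep the others in place."""
    if shuffle_idx is None:
        shuffle_idx = list(range(len(degradations)))
    picked = [degradations[i] for i in shuffle_idx]
    rng.shuffle(picked)
    result = list(degradations)
    for i, item in zip(shuffle_idx, picked):
        result[i] = item
    return result


partial = shuffle_degradations(degradations, shuffle_idx=[0, 1])
```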

Modified keys are the attributes specified in “keys”.

Parameters
  • degradations (list[dict]) – The list of degradations.

  • keys (list[str]) – A list specifying the keys whose values are modified.

  • shuffle_idx (list | None, optional) – The degradations corresponding to these indices are shuffled. If None, all degradations are shuffled. Default: None.

_build_degradations(degradations)
__call__(results)
__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomBlur(params, keys)[source]

Apply random blur to the input.

Modified keys are the attributes specified in “keys”.

Parameters
  • params (dict) – A dictionary specifying the degradation settings.

  • keys (list[str]) – A list specifying the keys whose values are modified.

get_kernel(num_kernels: int)

This is the function to create kernel.

Parameters

num_kernels (int) – the number of kernels

_apply_random_blur(imgs)

This is the function to apply blur operation on images.

Parameters

imgs (Tensor) – images

Returns

Blurred images.

Return type

Tensor

__call__(results)
__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomJPEGCompression(params, keys)[source]

Apply random JPEG compression to the input.

Modified keys are the attributes specified in “keys”.

Parameters
  • params (dict) – A dictionary specifying the degradation settings.

  • keys (list[str]) – A list specifying the keys whose values are modified.

_apply_random_compression(imgs)
__call__(results)
__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomNoise(params, keys)[source]

Apply random noise to the input.

Currently support Gaussian noise and Poisson noise.

Modified keys are the attributes specified in “keys”.

Parameters
  • params (dict) – A dictionary specifying the degradation settings.

  • keys (list[str]) – A list specifying the keys whose values are modified.

_apply_gaussian_noise(imgs)

This is the function used to apply gaussian noise on images.

Parameters

imgs (Tensor) – images

Returns

Images with Gaussian noise applied.

Return type

Tensor

_apply_poisson_noise(imgs)
_apply_random_noise(imgs)

This is the function used to apply random noise on images.

Parameters

imgs (Tensor) – training images

__call__(results)
__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomResize(params, keys)[source]

Randomly resize the input.

Modified keys are the attributes specified in “keys”.

Parameters
  • params (dict) – A dictionary specifying the degradation settings.

  • keys (list[str]) – A list specifying the keys whose values are modified.

_random_resize(imgs)

This is the function used to randomly resize images for training augmentation.

Parameters

imgs (Tensor) – training images.

Returns

Images after random resizing.

Return type

Tensor

__call__(results)
__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomVideoCompression(params, keys)[source]

Apply random video compression to the input.

Modified keys are the attributes specified in “keys”.

Parameters
  • params (dict) – A dictionary specifying the degradation settings.

  • keys (list[str]) – A list specifying the keys whose values are modified.

_apply_random_compression(imgs)

This is the function to apply random compression on images.

Parameters

imgs (Tensor) – training images

Returns

Images after random compression.

Return type

Tensor

__call__(results)
__repr__()

Return repr(self).

class mmedit.datasets.transforms.RandomDownSampling(scale_min=1.0, scale_max=4.0, patch_size=None, interpolation='bicubic', backend='pillow')[source]

Bases: mmcv.transforms.BaseTransform

Generate LQ image from GT (and crop), which will randomly pick a scale.

Parameters
  • scale_min (float) – The minimum of upsampling scale, inclusive. Default: 1.0.

  • scale_max (float) – The maximum of upsampling scale, exclusive. Default: 4.0.

  • patch_size (int) – The cropped lr patch size. Default: None, means no crop.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear”, “bicubic”, “box”, “lanczos”, “hamming” for ‘pillow’ backend. Default: “bicubic”.

  • backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: “pillow”.

Scale will be picked in the range of [scale_min, scale_max).
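
Picking the scale and deriving the LQ size could look like the sketch below. Drawing the scale uniformly and flooring the resulting size are assumptions, and `sample_lq_size` is a hypothetical helper rather than the transform's own method.

```python
import math
import random


def sample_lq_size(gt_hw, scale_min=1.0, scale_max=4.0, rng=random):
    """Pick a scale between scale_min and scale_max and return the LQ size."""
    scale = rng.uniform(scale_min, scale_max)  # uniform sampling is assumed
    h, w = gt_hw
    # Small epsilon guards against floating point error before flooring.
    lq_hw = (math.floor(h / scale + 1e-9), math.floor(w / scale + 1e-9))
    return scale, lq_hw


scale, lq_hw = sample_lq_size((128, 128), rng=random.Random(0))
```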

transform(results)

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. ‘gt’ is required.

Returns

A dict containing the processed data and information. Modifies ‘gt’ and adds ‘lq’ and ‘scale’.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.FormatTrimap(to_onehot=False)[source]

Bases: mmcv.transforms.BaseTransform

Convert trimap (tensor) to one-hot representation.

It transforms the trimap label from (0, 128, 255) to (0, 1, 2). If to_onehot is set to True, the trimap will be converted to a one-hot tensor of shape (3, H, W). Required key is “trimap”, added or modified keys are “trimap” and “format_trimap_to_onehot”.

Parameters

to_onehot (bool) – Whether to convert the trimap to a one-hot tensor. Default: False.
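
The label mapping and the one-hot layout can be sketched with nested lists, as below. The real transform operates on tensors; `format_trimap` here is a hypothetical stand-in that only illustrates the (0, 128, 255) to (0, 1, 2) mapping and the (3, H, W) shape.

```python
LABELS = {0: 0, 128: 1, 255: 2}


def format_trimap(trimap, to_onehot=False):
    """Map trimap values to class labels, optionally as one-hot planes."""
    labels = [[LABELS[v] for v in row] for row in trimap]
    if not to_onehot:
        return labels
    # One plane per class: plane c is 1 where the label equals c.
    return [[[1 if v == c else 0 for v in row] for row in labels]
            for c in range(3)]


trimap = [[0, 128],
          [255, 128]]
labels = format_trimap(trimap)
onehot = format_trimap(trimap, to_onehot=True)
```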

transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.GenerateTrimap(kernel_size, iterations=1, random=True)[source]

Bases: mmcv.transforms.BaseTransform

Using random erode/dilate to generate trimap from alpha matte.

Required key is “alpha”, added key is “trimap”.

Parameters
  • kernel_size (int | tuple[int]) – The range of random kernel_size of erode/dilate; int indicates a fixed kernel_size. If random is set to False and kernel_size is a tuple of length 2, then it will be interpreted as (erode kernel_size, dilate kernel_size). It should be noted that the kernel of the erosion and dilation has the same height and width.

  • iterations (int | tuple[int], optional) – The range of random iterations of erode/dilate; int indicates a fixed iterations. If random is set to False and iterations is a tuple of length 2, then it will be interpreted as (erode iterations, dilate iterations). Default to 1.

  • random (bool, optional) – Whether use random kernel_size and iterations when generating trimap. See kernel_size and iterations for more information. Default to True.

transform(results: dict) dict

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.GenerateTrimapWithDistTransform(dist_thr=20, random=True)[source]

Bases: mmcv.transforms.BaseTransform

Generate trimap with distance transform function.

Parameters
  • dist_thr (int, optional) – Distance threshold. Areas with an alpha value strictly between 0 and 255 are considered the initial unknown area. Areas whose distance to the unknown area is smaller than the distance threshold are then also considered unknown. Defaults to 20.

  • random (bool, optional) – If True, use a random distance threshold from [1, dist_thr). If False, use dist_thr as the distance threshold directly. Defaults to True.
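
The thresholding rule above can be sketched on a tiny array. This is a minimal illustration, not the library's code: the helper name trimap_from_alpha is hypothetical, Euclidean distance is an assumption, and the brute-force distance computation stands in for a real distance transform.

```python
import numpy as np


def trimap_from_alpha(alpha: np.ndarray, dist_thr: int = 20) -> np.ndarray:
    """Hypothetical helper: build a trimap from a 2-D uint8 alpha matte."""
    unknown = (alpha > 0) & (alpha < 255)  # initial unknown area
    trimap = np.where(alpha == 255, 255, 0).astype(np.uint8)
    ys, xs = np.nonzero(unknown)
    if ys.size == 0:
        return trimap
    # Brute-force Euclidean distance from every pixel to the nearest
    # unknown pixel. Only sensible for tiny arrays.
    grid_y, grid_x = np.indices(alpha.shape)
    dist = np.sqrt(
        (grid_y[..., None] - ys) ** 2 + (grid_x[..., None] - xs) ** 2
    ).min(axis=-1)
    trimap[dist < dist_thr] = 128  # grow the unknown area
    return trimap


alpha = np.zeros((1, 7), dtype=np.uint8)
alpha[0, 3] = 100   # one semi-transparent pixel
alpha[0, 4:] = 255  # definite foreground to its right
trimap = trimap_from_alpha(alpha, dist_thr=2)
# trimap is [[0, 0, 128, 128, 128, 255, 255]]
```

With random=True, the threshold would first be drawn from [1, dist_thr), for example with np.random.randint(1, dist_thr).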

transform(results: dict) → dict

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.TransformTrimap[source]

Bases: mmcv.transforms.BaseTransform

Transform trimap into two-channel and six-channel representations.

This class will generate a two-channel trimap composed of definite foreground and background masks and encode it into a six-channel trimap using Gaussian blurs of the generated two-channel trimap at three different scales. The transformed trimap has 6 channels.

Required key is “trimap”; added keys are “transformed_trimap” and “two_channel_trimap”.

Adopted from the following repository: https://github.com/MarcoForte/FBA_Matting/blob/master/networks/transforms.py.
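
The first step, building the two-channel trimap from definite background and foreground masks, might look like the sketch below. The helper name two_channel_trimap and the channel order (background first, foreground second) are assumptions; the six-channel encoding is not shown.

```python
import numpy as np


def two_channel_trimap(trimap: np.ndarray) -> np.ndarray:
    """Hypothetical helper: stack definite background/foreground masks."""
    out = np.zeros(trimap.shape + (2,), dtype=np.uint8)
    out[trimap == 0, 0] = 255    # channel 0: definite background (assumed order)
    out[trimap == 255, 1] = 255  # channel 1: definite foreground (assumed order)
    return out


trimap = np.array([[0, 128, 255]], dtype=np.uint8)
masks = two_channel_trimap(trimap)  # shape (1, 3, 2)
```

Unknown pixels (value 128) are zero in both channels.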

transform(results: dict) → dict

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict containing the processed data and information.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.CopyValues(src_keys, dst_keys)[source]

Bases: mmcv.transforms.BaseTransform

Copy the value of source keys to destination keys.

# TODO Change to dict(dst=src)

It does the following: results[dst_key] = results[src_key] for (src_key, dst_key) in zip(src_keys, dst_keys).

Added keys are the keys in the attribute “dst_keys”.

Required Keys:

  • [SRC_KEYS]

Added Keys:

  • [DST_KEYS]

Parameters
  • src_keys (list[str]) – The source keys.

  • dst_keys (list[str]) – The destination keys.
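
The pairing of src_keys with dst_keys can be sketched with a plain function. The name copy_values is a hypothetical stand-in, and the use of deepcopy is a choice made for this sketch so that the copy is independent of the source.

```python
from copy import deepcopy


def copy_values(results: dict, src_keys: list, dst_keys: list) -> dict:
    """Hypothetical stand-in: results[dst_key] = results[src_key] per pair."""
    for src_key, dst_key in zip(src_keys, dst_keys):
        # deepcopy (an assumption of this sketch) keeps later in-place
        # edits to one key from leaking into the other.
        results[dst_key] = deepcopy(results[src_key])
    return results


results = copy_values({'gt': [1, 2, 3]}, src_keys=['gt'], dst_keys=['gt_orig'])
results['gt'].append(4)  # 'gt_orig' is unaffected
```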

transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict with a key added/modified.

Return type

dict

__repr__()

Return repr(self).

class mmedit.datasets.transforms.SetValues(dictionary)[source]

Bases: mmcv.transforms.BaseTransform

Set values for destination keys.

It does the following: results[key] = value

Added keys are the keys in the dictionary.

Required Keys:

  • None

Added or Modified Keys:

  • keys in the dictionary

Parameters

dictionary (dict) – The dictionary to update.
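
The effect on results can be sketched as a plain dictionary update. The function set_values and the key names used here are hypothetical and only illustrate that existing keys are overwritten and missing keys are added.

```python
def set_values(results: dict, dictionary: dict) -> dict:
    """Hypothetical stand-in: results[key] = value for each pair."""
    for key, value in dictionary.items():
        results[key] = value
    return results


# 'scale' already exists and is overwritten; 'color_type' is added.
results = set_values({'scale': 2}, {'scale': 4, 'color_type': 'color'})
```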

transform(results: Dict)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict with a key added/modified.

Return type

dict

__repr__()

Return repr(self).
