`mmedit.models.base_archs`¶

Package Contents¶

Classes¶

`AllGatherLayer`	All gather layer with backward propagation path.
`ASPP`	ASPP module from DeepLabV3.
`SpatialTemporalEnsemble`	Apply spatial and temporal ensemble and compute outputs.
`SimpleGatedConvModule`	Simple Gated Convolutional Module.
`ImgNormalize`	Normalize images with the given mean and std value.
`LinearModule`	A linear block that contains linear/norm/activation layers.
`MultiLayerDiscriminator`	Multilayer Discriminator.
`PatchDiscriminator`	A PatchGAN discriminator.
`ResNet`	General ResNet.
`DepthwiseSeparableConvModule`	Depthwise separable convolution module.
`SimpleEncoderDecoder`	Simple encoder-decoder model from matting.
`SoftMaskPatchDiscriminator`	A Soft Mask-Guided PatchGAN discriminator.
`ResidualBlockNoBN`	Residual block without BN.
`PixelShufflePack`	Pixel Shuffle upsample layer.
`VGG16`	Customized VGG16 Encoder.

Functions¶

`conv2d`(input, weight[, bias, stride, padding, ...])
`conv_transpose2d`(input, weight[, bias, stride, ...])
`pixel_unshuffle`(→ torch.Tensor)	Down-sample by pixel unshuffle.

class mmedit.models.base_archs.AllGatherLayer(*args, **kwargs)[source]¶

Bases: torch.autograd.Function

All gather layer with backward propagation path.

This layer makes dist.all_gather() part of the autograd graph, so that gradients can flow back through the gather. This kind of operation is widely used in MoCo and other contrastive learning algorithms.

static forward(ctx, x)[source]¶: Forward function.

static backward(ctx, *grad_outputs)[source]¶: Backward function.

class mmedit.models.base_archs.ASPP(in_channels: int, out_channels: int = 256, mid_channels: int = 256, dilations: Sequence[int] = (12, 24, 36), conv_cfg: Optional[dict] = None, norm_cfg: Optional[dict] = dict(type='BN'), act_cfg: Optional[dict] = dict(type='ReLU'), separable_conv: bool = False)[source]¶

Bases: torch.nn.Module

ASPP module from DeepLabV3.

The code is adapted from https://github.com/pytorch/vision/blob/master/torchvision/models/segmentation/deeplabv3.py

For more information about the module: “Rethinking Atrous Convolution for Semantic Image Segmentation”.

Parameters

in_channels (int) – Input channels of the module.
out_channels (int) – Output channels of the module. Default: 256.
mid_channels (int) – Output channels of the intermediate ASPP conv modules. Default: 256.
dilations (Sequence[int]) – Dilation rates of the three ASPP conv modules. Default: (12, 24, 36).
conv_cfg (dict) – Config dict for convolution layer. If “None”, nn.Conv2d will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
separable_conv (bool) – Whether to replace normal convs with depthwise separable convs, which are faster. Default: False.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function for ASPP module.

Parameters: x (Tensor) – Input tensor with shape (N, C, H, W).
Returns: Output tensor.
Return type: Tensor
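The ASPP structure described above can be sketched in plain PyTorch. This is a minimal illustration, not the mmedit implementation: the class name `TinyASPP` is hypothetical, it ignores `conv_cfg`, `norm_cfg`, `act_cfg` and `separable_conv`, and it assumes one 1x1 branch, one dilated 3x3 branch per dilation rate, and an image-pooling branch, concatenated and projected by a 1x1 conv.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyASPP(nn.Module):
    """Illustrative ASPP: 1x1 branch, dilated 3x3 branches, image pooling."""

    def __init__(self, in_channels, out_channels=256, mid_channels=256,
                 dilations=(12, 24, 36)):
        super().__init__()

        def block(kernel_size, dilation=1):
            # For a 3x3 kernel, padding == dilation keeps H and W unchanged.
            padding = 0 if kernel_size == 1 else dilation
            return nn.Sequential(
                nn.Conv2d(in_channels, mid_channels, kernel_size,
                          padding=padding, dilation=dilation, bias=False),
                nn.BatchNorm2d(mid_channels),
                nn.ReLU(inplace=True))

        self.branches = nn.ModuleList(
            [block(1)] + [block(3, d) for d in dilations])
        # Image-level branch; BN is omitted in this sketch so that it also
        # runs with batch size 1 in training mode.
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, mid_channels, 1),
            nn.ReLU(inplace=True))
        num_branches = len(dilations) + 2
        self.project = nn.Sequential(
            nn.Conv2d(num_branches * mid_channels, out_channels, 1,
                      bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True))

    def forward(self, x):
        size = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        # Broadcast the pooled feature back to the input resolution.
        pooled = F.interpolate(self.pool(x), size=size, mode='bilinear',
                               align_corners=False)
        feats.append(pooled)
        return self.project(torch.cat(feats, dim=1))


aspp = TinyASPP(in_channels=8, out_channels=16, mid_channels=4)
aspp_out = aspp(torch.randn(2, 8, 20, 20))
print(aspp_out.shape)  # torch.Size([2, 16, 20, 20])
```

The spatial size is preserved, and only the channel count changes from `in_channels` to `out_channels`.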

mmedit.models.base_archs.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)[source]¶

mmedit.models.base_archs.conv_transpose2d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1)[source]¶

mmedit.models.base_archs.pixel_unshuffle(x: torch.Tensor, scale: int) → torch.Tensor[source]¶

Down-sample by pixel unshuffle.

Parameters

x (Tensor) – Input tensor.
scale (int) – Scale factor.

Returns

Output tensor.

Return type

Tensor
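As a rough sketch of what pixel unshuffle does, the function below moves each `scale` x `scale` spatial block into the channel dimension using only reshape and permute. The name `pixel_unshuffle_sketch` is hypothetical, and the channel ordering is an assumption chosen so that `torch.nn.functional.pixel_shuffle` inverts it.

```python
import torch
import torch.nn.functional as F


def pixel_unshuffle_sketch(x: torch.Tensor, scale: int) -> torch.Tensor:
    """Rearrange (B, C, H, W) into (B, C * scale**2, H / scale, W / scale)."""
    b, c, h, w = x.shape
    if h % scale != 0 or w % scale != 0:
        raise ValueError(
            f'H={h} and W={w} must both be divisible by scale={scale}')
    out_h, out_w = h // scale, w // scale
    # Split each spatial axis into (coarse position, offset inside block).
    x = x.view(b, c, out_h, scale, out_w, scale)
    # Move the two in-block offsets next to the channel axis.
    x = x.permute(0, 1, 3, 5, 2, 4)
    return x.reshape(b, c * scale * scale, out_h, out_w)


unshuffle_in = torch.arange(2 * 3 * 4 * 6, dtype=torch.float32).view(2, 3, 4, 6)
unshuffle_out = pixel_unshuffle_sketch(unshuffle_in, scale=2)
print(unshuffle_out.shape)  # torch.Size([2, 12, 2, 3])
```

No values are discarded: the tensor keeps the same number of elements, so applying pixel shuffle with the same scale recovers the input.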

class mmedit.models.base_archs.SpatialTemporalEnsemble(is_temporal_ensemble: Optional[bool] = False)[source]¶

Bases: torch.nn.Module

Apply spatial and temporal ensemble and compute outputs.

Parameters: is_temporal_ensemble (bool, optional) – Whether to apply ensemble temporally. If True, the sequence will also be flipped temporally. If the input is an image, this argument must be set to False. Default: False.

_transform(imgs: torch.Tensor, mode: str) → torch.Tensor[source]¶

Apply spatial transform (flip, rotate) to the images.

Parameters

imgs (torch.Tensor) – The images to be transformed.
mode (str) – The mode of transform. Supported values are ‘vertical’, ‘horizontal’, and ‘transpose’, corresponding to vertical flip, horizontal flip, and rotation, respectively.

Returns

The transformed images.

Return type

torch.Tensor

spatial_ensemble(imgs: torch.Tensor, model: torch.nn.Module) → torch.Tensor[source]¶

Apply spatial ensemble.

Parameters

imgs (torch.Tensor) – The images to be processed by the model. Its size should be either (n, t, c, h, w) or (n, c, h, w).
model (nn.Module) – The model to process the images.

Returns

Output of the model with spatial ensemble applied.

Return type

torch.Tensor

forward(imgs: torch.Tensor, model: torch.nn.Module) → torch.Tensor[source]¶

Apply spatial and temporal ensemble.

Parameters

imgs (torch.Tensor) – The images to be processed by the model. Its size should be either (n, t, c, h, w) or (n, c, h, w).
model (nn.Module) – The model to process the images.

Returns

Output of the model with spatial ensemble applied.

Return type

torch.Tensor
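The idea behind the spatial ensemble can be sketched as follows: apply each combination of flips and transpose to the input, run the model, undo the transform on the output, and average the results. This is a minimal sketch under that assumption. The function name `spatial_ensemble_sketch` is hypothetical, and the temporal flip and device handling of the real class are not reproduced.

```python
import itertools

import torch


def spatial_ensemble_sketch(imgs: torch.Tensor, model) -> torch.Tensor:
    """Average model outputs over 8 flip/transpose variants of the input.

    Works on the last two dimensions, so both (n, c, h, w) and
    (n, t, c, h, w) inputs are accepted.
    """
    outputs = []
    for transpose, hflip, vflip in itertools.product([False, True], repeat=3):
        x = imgs
        if vflip:
            x = x.flip(-2)  # flip along height
        if hflip:
            x = x.flip(-1)  # flip along width
        if transpose:
            x = x.transpose(-2, -1)
        y = model(x)
        # Undo the transforms in reverse order so outputs are aligned.
        if transpose:
            y = y.transpose(-2, -1)
        if hflip:
            y = y.flip(-1)
        if vflip:
            y = y.flip(-2)
        outputs.append(y)
    return torch.stack(outputs, dim=0).mean(dim=0)


ensemble_in = torch.rand(2, 3, 4, 6)
# An identity "model" makes the alignment easy to check.
ensemble_out = spatial_ensemble_sketch(ensemble_in, lambda t: t)
```

With an identity model, the ensemble output matches the input up to floating-point error, which confirms that every transform is inverted correctly before averaging.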

class mmedit.models.base_archs.SimpleGatedConvModule(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], feat_act_cfg: Optional[dict] = dict(type='ELU'), gate_act_cfg: Optional[dict] = dict(type='Sigmoid'), **kwargs)[source]¶

Bases: torch.nn.Module

Simple Gated Convolutional Module.

This module is a simple gated convolutional module. The detailed formula is:

\[y = \phi(conv1(x)) * \sigma(conv2(x)),\]

where phi is the feature activation function and sigma is the gate activation function. By default, the gate activation function is sigmoid.

Parameters

in_channels (int) – Same as nn.Conv2d.
out_channels (int) – The number of channels of the output feature. Note that out_channels in the conv module is doubled since this module contains two convolutions for feature and gate separately.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
feat_act_cfg (dict) – Config dict for feature activation layer. Default: dict(type='ELU').
gate_act_cfg (dict) – Config dict for gate activation layer. Default: dict(type='Sigmoid').
kwargs (keyword arguments) – Same as ConvModule.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward Function.

Parameters: x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
Returns: Output tensor with shape of (n, c, h’, w’).
Return type: torch.Tensor
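The gating formula above can be illustrated with a small PyTorch module. This sketch follows the note on `out_channels`: a single convolution produces twice the requested channels, which are then split into a feature half and a gate half. The class name `GatedConvSketch` is hypothetical, a plain `nn.Conv2d` stands in for ConvModule, and the activations are fixed to the documented defaults (ELU and Sigmoid).

```python
import torch
import torch.nn as nn


class GatedConvSketch(nn.Module):
    """y = ELU(feature half) * Sigmoid(gate half) of one doubled conv."""

    def __init__(self, in_channels, out_channels, kernel_size, **conv_kwargs):
        super().__init__()
        # One conv emits both the feature and the gate channels.
        self.conv = nn.Conv2d(in_channels, out_channels * 2, kernel_size,
                              **conv_kwargs)
        self.feat_act = nn.ELU()
        self.gate_act = nn.Sigmoid()

    def forward(self, x):
        feat, gate = torch.chunk(self.conv(x), 2, dim=1)
        # The sigmoid gate lies in (0, 1) and scales each feature value.
        return self.feat_act(feat) * self.gate_act(gate)


gated = GatedConvSketch(4, 8, kernel_size=3, padding=1)
gated_out = gated(torch.randn(1, 4, 16, 16))
print(gated_out.shape)  # torch.Size([1, 8, 16, 16])
```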

class mmedit.models.base_archs.ImgNormalize(pixel_range: float, img_mean: Tuple[float, float, float], img_std: Tuple[float, float, float], sign: int = - 1)[source]¶

Bases: torch.nn.Conv2d

Normalize images with the given mean and std value.

Implemented as a Conv2d layer, so it can run on the GPU.

Parameters

pixel_range (float) – Pixel range of feature.
img_mean (Tuple[float]) – Image mean of each channel.
img_std (Tuple[float]) – Image std of each channel.
sign (int) – Sign of bias. Default -1.
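A normalization layer built on Conv2d can be sketched as a frozen 1x1 convolution whose weight is a diagonal matrix of `1 / std` and whose bias carries the mean. The sketch below assumes the output is `(x + sign * pixel_range * mean) / std`. That formula and the class name `ImgNormalizeSketch` are assumptions made for illustration, not taken from the source.

```python
import torch
import torch.nn as nn


class ImgNormalizeSketch(nn.Conv2d):
    """Frozen 1x1 conv computing (x + sign * pixel_range * mean) / std."""

    def __init__(self, pixel_range, img_mean, img_std, sign=-1):
        assert len(img_mean) == len(img_std)
        num_channels = len(img_mean)
        super().__init__(num_channels, num_channels, kernel_size=1)
        mean = torch.tensor(img_mean, dtype=torch.float32)
        std = torch.tensor(img_std, dtype=torch.float32)
        # Diagonal weight: each output channel reads only its own input.
        weight = torch.eye(num_channels).view(num_channels, num_channels, 1, 1)
        weight = weight / std.view(num_channels, 1, 1, 1)
        with torch.no_grad():
            self.weight.copy_(weight)
            self.bias.copy_(sign * pixel_range * mean / std)
        # The layer is a fixed transform, not a trainable one.
        self.weight.requires_grad = False
        self.bias.requires_grad = False


norm_mean, norm_std = (0.4, 0.5, 0.6), (0.2, 0.25, 0.5)
normalize = ImgNormalizeSketch(1.0, norm_mean, norm_std)
norm_in = torch.rand(1, 3, 4, 4)
norm_out = normalize(norm_in)
```

Under this assumption, `sign=-1` subtracts the mean (normalization), while `sign=1` adds it back, which is useful after the network when the weight is adjusted accordingly.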

class mmedit.models.base_archs.LinearModule(in_features: int, out_features: int, bias: bool = True, act_cfg: Optional[dict] = dict(type='ReLU'), inplace: bool = True, with_spectral_norm: bool = False, order: Tuple[str, str] = ('linear', 'act'))[source]¶

Bases: torch.nn.Module

A linear block that contains linear/norm/activation layers.

For low level vision, we add spectral norm and padding layer.

Parameters

in_features (int) – Same as nn.Linear.
out_features (int) – Same as nn.Linear.
bias (bool) – Same as nn.Linear. Default: True.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
inplace (bool) – Whether to use inplace mode for activation. Default: True.
with_spectral_norm (bool) – Whether to use spectral norm in the linear module. Default: False.
order (tuple[str]) – The order of linear/activation layers. It is a sequence of “linear”, “norm” and “act”. Examples are (“linear”, “act”) and (“act”, “linear”).

init_weights() → None[source]¶: Init weights for the model.

forward(x: torch.Tensor, activate: Optional[bool] = True) → torch.Tensor[source]¶

Forward Function.

Parameters

x (torch.Tensor) – Input tensor with shape of \((n, *, c)\). Same as torch.nn.Linear.
activate (bool, optional) – Whether to use activation layer. Defaults to True.

Returns

Same as torch.nn.Linear.

Return type

torch.Tensor

class mmedit.models.base_archs.MultiLayerDiscriminator(in_channels: int, max_channels: int, num_convs: int = 5, fc_in_channels: Optional[int] = None, fc_out_channels: int = 1024, kernel_size: int = 5, conv_cfg: Optional[dict] = None, norm_cfg: Optional[dict] = None, act_cfg: Optional[dict] = dict(type='ReLU'), out_act_cfg: Optional[dict] = dict(type='ReLU'), with_input_norm: bool = True, with_out_convs: bool = False, with_spectral_norm: bool = False, **kwargs)[source]¶

Bases: torch.nn.Module

Multilayer Discriminator.

This is a commonly used structure with multiple stacked convolution layers.

Parameters

in_channels (int) – Input channel of the first input convolution.
max_channels (int) – The maximum channel number in this structure.
num_convs (int) – Number of stacked intermediate convs (including input conv but excluding output conv). Default to 5.
fc_in_channels (int | None) – Input dimension of the fully connected layer. If fc_in_channels is None, the fully connected layer will be removed. Default to None.
fc_out_channels (int) – Output dimension of the fully connected layer. Default to 1024.
kernel_size (int) – Kernel size of the conv modules. Default to 5.
conv_cfg (dict) – Config dict to build conv layer.
norm_cfg (dict) – Config dict to build norm layer.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
out_act_cfg (dict) – Config dict for output activation, “relu” by default.
with_input_norm (bool) – Whether to add normalization after the input conv. Default to True.
with_out_convs (bool) – Whether to add output convs to the discriminator. The output convs contain two convs. The first out conv has the same setting as the intermediate convs but a stride of 1 instead of 2. The second out conv is a conv similar to the first out conv but reduces the number of channels to 1 and has no activation layer. Default to False.
with_spectral_norm (bool) – Whether to use spectral norm after the conv layers. Default to False.
kwargs (keyword arguments) –

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward Function.

Parameters: x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
Returns: Output tensor with shape of (n, c, h’, w’) or (n, c).
Return type: torch.Tensor
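When a fully connected layer is used, `fc_in_channels` has to match the flattened size of the last feature map. The arithmetic can be sketched as below, assuming each intermediate conv has stride 2 (as implied by the `with_out_convs` description) and padding `kernel_size // 2`. The padding value, the helper names, and the final channel count of 256 are assumptions made for illustration.

```python
def conv_out_size(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Standard output-size formula for a convolution along one axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


def feature_size_after_convs(size: int, num_convs: int = 5,
                             kernel_size: int = 5, stride: int = 2) -> int:
    """Spatial size after `num_convs` stacked strided convs."""
    padding = kernel_size // 2  # assumption: 'same'-style padding
    for _ in range(num_convs):
        size = conv_out_size(size, kernel_size, stride, padding)
    return size


side = feature_size_after_convs(256)
print(side)  # 8

# Hypothetical final channel count; the flattened size is C * H * W.
final_channels = 256
flattened = final_channels * side * side
print(flattened)  # 16384
```

Under these assumptions a 256 x 256 input is halved five times to 8 x 8, so the fully connected layer would need an input dimension of 16384.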

init_weights(pretrained: Optional[str] = None) → None[source]¶

Init weights for models.

Parameters: pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

class mmedit.models.base_archs.PatchDiscriminator(in_channels: int, base_channels: int = 64, num_conv: int = 3, norm_cfg: dict = dict(type='BN'), init_cfg: Optional[dict] = dict(type='normal', gain=0.02))[source]¶

Bases: torch.nn.Module

A PatchGAN discriminator.

Parameters

in_channels (int) – Number of channels in input images.
base_channels (int) – Number of channels at the first conv layer. Default: 64.
num_conv (int) – Number of stacked intermediate convs (excluding input and output conv). Default: 3.
norm_cfg (dict) – Config dict to build norm layer. Default: dict(type='BN').
init_cfg (dict) – Config dict for initialization. type: The name of our initialization method. Default: ‘normal’. gain: Scaling factor for normal, xavier and orthogonal. Default: 0.02.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function.

Parameters: x (Tensor) – Input tensor with shape (n, c, h, w).
Returns: Forward results.
Return type: Tensor
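A PatchGAN discriminator outputs a map of per-patch scores rather than a single value. The sketch below follows the common pix2pix layout, which is an assumption here: 4x4 kernels, LeakyReLU(0.2), stride 2 for all but the last intermediate conv, and a channel cap of `8 * base_channels`. The function name `make_patch_discriminator` is hypothetical.

```python
import torch
import torch.nn as nn


def make_patch_discriminator(in_channels, base_channels=64, num_conv=3):
    """Build a pix2pix-style PatchGAN as an nn.Sequential (illustrative)."""
    layers = [
        # Input conv: no normalization.
        nn.Conv2d(in_channels, base_channels, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    ]
    channels = base_channels
    for i in range(1, num_conv + 1):
        out_channels = min(base_channels * 2 ** i, base_channels * 8)
        # The last intermediate conv keeps the resolution (stride 1).
        stride = 2 if i < num_conv else 1
        layers += [
            nn.Conv2d(channels, out_channels, 4, stride=stride, padding=1,
                      bias=False),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(0.2, inplace=True),
        ]
        channels = out_channels
    # Output conv: one score per receptive-field patch.
    layers.append(nn.Conv2d(channels, 1, 4, stride=1, padding=1))
    return nn.Sequential(*layers)


patch_disc = make_patch_discriminator(in_channels=3)
patch_scores = patch_disc(torch.randn(2, 3, 64, 64))
print(patch_scores.shape)  # torch.Size([2, 1, 6, 6])
```

Each of the 6 x 6 output values judges one overlapping patch of the 64 x 64 input, which is why the loss is usually averaged over the whole score map.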

init_weights(pretrained: Optional[str] = None) → None[source]¶

Initialize weights for the model.

Parameters: pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Default: None.

class mmedit.models.base_archs.ResNet(depth: int, in_channels: int = 3, stem_channels: int = 64, base_channels: int = 64, num_stages: int = 4, strides: Sequence[int] = (1, 2, 2, 2), dilations: Sequence[int] = (1, 1, 2, 4), deep_stem: bool = False, avg_down: bool = False, frozen_stages: int = - 1, act_cfg: dict = dict(type='ReLU'), conv_cfg: Optional[dict] = None, norm_cfg: dict = dict(type='BN'), with_cp: bool = False, multi_grid: Optional[Sequence[int]] = None, contract_dilation: bool = False, zero_init_residual: bool = True)[source]¶

Bases: torch.nn.Module

General ResNet.

This class is adopted from https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/backbones/resnet.py.

Parameters

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Number of stem channels. Default: 64.
base_channels (int) – Number of base channels of res layer. Default: 64.
num_stages (int) – Resnet stages, normally 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 2, 4).
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
act_cfg (dict) – Dictionary to construct and config activation layer. Default: dict(type='ReLU').
conv_cfg (dict) – Dictionary to construct and config convolution layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type='BN').
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
multi_grid (Sequence[int]|None) – Multi grid dilation rates of last stage. Default: None.
contract_dilation (bool) – Whether to contract the first dilation of each layer. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

property norm1: torch.nn.Module¶

normalization layer after the second convolution layer

Type: nn.Module

arch_settings¶

_make_stem_layer(in_channels: int, stem_channels: int) → None[source]¶: Make stem layer for ResNet.

_make_layer(block: BasicBlock, planes: int, blocks: int, stride: int = 1, dilation: int = 1) → torch.nn.Module[source]¶

_nostride_dilate(m: torch.nn.Module, dilate: int) → None[source]¶

init_weights(pretrained: Optional[str] = None) → None[source]¶

Init weights for the model.

Parameters: pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

_freeze_stages() → None[source]¶: Freeze stages param and norm stats.

forward(x: torch.Tensor) → List[torch.Tensor][source]¶

Forward function.

Parameters: x (Tensor) – Input tensor with shape (N, C, H, W).
Returns: Output feature maps.
Return type: List[Tensor]

class mmedit.models.base_archs.DepthwiseSeparableConvModule(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, norm_cfg: Optional[dict] = None, act_cfg: Optional[dict] = dict(type='ReLU'), dw_norm_cfg: Union[dict, str] = 'default', dw_act_cfg: Union[dict, str] = 'default', pw_norm_cfg: Union[dict, str] = 'default', pw_act_cfg: Union[dict, str] = 'default', **kwargs)[source]¶

Bases: torch.nn.Module

Depthwise separable convolution module.

See https://arxiv.org/pdf/1704.04861.pdf for details.

This module can replace a ConvModule, with the conv block replaced by two conv blocks: a depthwise conv block and a pointwise conv block. The depthwise conv block contains depthwise-conv/norm/activation layers. The pointwise conv block contains pointwise-conv/norm/activation layers. Note that the depthwise conv block will contain norm/activation layers if norm_cfg and act_cfg are specified.

Parameters

in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
stride (int or tuple[int]) – Same as nn.Conv2d. Default: 1.
padding (int or tuple[int]) – Same as nn.Conv2d. Default: 0.
dilation (int or tuple[int]) – Same as nn.Conv2d. Default: 1.
norm_cfg (dict) – Default norm config for both depthwise ConvModule and pointwise ConvModule. Default: None.
act_cfg (dict) – Default activation config for both depthwise ConvModule and pointwise ConvModule. Default: dict(type='ReLU').
dw_norm_cfg (dict) – Norm config of depthwise ConvModule. If it is 'default', it will be the same as norm_cfg. Default: 'default'.
dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is 'default', it will be the same as act_cfg. Default: 'default'.
pw_norm_cfg (dict) – Norm config of pointwise ConvModule. If it is 'default', it will be the same as norm_cfg. Default: 'default'.
pw_act_cfg (dict) – Activation config of pointwise ConvModule. If it is 'default', it will be the same as act_cfg. Default: 'default'.
kwargs (optional) – Other shared arguments for depthwise and pointwise ConvModule. See ConvModule for ref.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function.

Parameters: x (Tensor) – Input tensor with shape (N, C, H, W).
Returns: Output tensor.
Return type: Tensor
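The depthwise/pointwise split can be sketched in plain PyTorch to show where the parameter savings come from. This is an illustration only: the class name `DepthwiseSeparableSketch` and the helper `count_params` are hypothetical, `nn.Conv2d` stands in for ConvModule, normalization is omitted, and ReLU is used after both convs.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableSketch(nn.Module):
    """Depthwise conv (one filter per channel) followed by a 1x1 conv."""

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, dilation=1):
        super().__init__()
        self.depthwise = nn.Sequential(
            # groups=in_channels makes every channel use its own filter.
            nn.Conv2d(in_channels, in_channels, kernel_size, stride=stride,
                      padding=padding, dilation=dilation,
                      groups=in_channels, bias=False),
            nn.ReLU(inplace=True))
        self.pointwise = nn.Sequential(
            # The 1x1 conv mixes information across channels.
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.ReLU(inplace=True))

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


separable = DepthwiseSeparableSketch(16, 32, kernel_size=3, padding=1)
standard = nn.Conv2d(16, 32, 3, padding=1, bias=False)
separable_out = separable(torch.randn(1, 16, 8, 8))
print(count_params(separable), count_params(standard))  # 656 4608
```

For 16 input and 32 output channels with a 3x3 kernel, the separable version needs 16 * 9 + 16 * 32 = 656 weights instead of 16 * 32 * 9 = 4608.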

class mmedit.models.base_archs.SimpleEncoderDecoder(encoder: dict, decoder: dict, init_cfg: Optional[dict] = None)[source]¶

Bases: mmengine.model.BaseModule

Simple encoder-decoder model from matting.

Parameters

encoder (dict) – Config of the encoder.
decoder (dict) – Config of the decoder.
init_cfg (dict, optional) – Initialization config dict.

forward(*args, **kwargs) → torch.Tensor[source]¶

Forward function.

Returns: The output tensor of the decoder.
Return type: Tensor

class mmedit.models.base_archs.SoftMaskPatchDiscriminator(in_channels: int, base_channels: Optional[int] = 64, num_conv: Optional[int] = 3, norm_cfg: Optional[dict] = None, init_cfg: Optional[dict] = dict(type='normal', gain=0.02), with_spectral_norm: Optional[bool] = False)[source]¶

Bases: mmengine.model.BaseModule

A Soft Mask-Guided PatchGAN discriminator.

Parameters

in_channels (int) – Number of channels in input images.
base_channels (int, optional) – Number of channels at the first conv layer. Default: 64.
num_conv (int, optional) – Number of stacked intermediate convs (excluding input and output conv). Default: 3.
norm_cfg (dict, optional) – Config dict to build norm layer. Default: None.
init_cfg (dict, optional) – Config dict for initialization. type: The name of our initialization method. Default: ‘normal’. gain: Scaling factor for normal, xavier and orthogonal. Default: 0.02.
with_spectral_norm (bool, optional) – Whether to use spectral norm after the conv layers. Default: False.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function.

Parameters: x (Tensor) – Input tensor with shape (n, c, h, w).
Returns: Forward results.
Return type: Tensor

init_weights() → None[source]¶: Initialize weights for the model.

class mmedit.models.base_archs.ResidualBlockNoBN(mid_channels: int = 64, res_scale: float = 1.0)[source]¶

Bases: torch.nn.Module

Residual block without BN.

It has a style of:

---Conv-ReLU-Conv-+-
 |________________|

Parameters

mid_channels (int) – Channel number of intermediate features. Default: 64.
res_scale (float) – Used to scale the residual before addition. Default: 1.0.

init_weights() → None[source]¶

Initialize weights for ResidualBlockNoBN.

Initialization methods like kaiming_init are for VGG-style modules. For modules with residual paths, using a smaller std is better for stability and performance. We empirically use 0.1. See more details in “ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks”.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function.

Parameters: x (Tensor) – Input tensor with shape (n, c, h, w).
Returns: Forward results.
Return type: Tensor
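The Conv-ReLU-Conv structure with a scaled skip connection can be sketched as below. The class name `ResidualBlockNoBNSketch` is hypothetical, the 3x3 kernel size is an assumption, and the initialization interprets the "0.1" mentioned above as scaling Kaiming-initialized weights by 0.1, which is also an assumption.

```python
import torch
import torch.nn as nn


class ResidualBlockNoBNSketch(nn.Module):
    """out = x + res_scale * conv2(relu(conv1(x))), without batch norm."""

    def __init__(self, mid_channels=64, res_scale=1.0):
        super().__init__()
        self.res_scale = res_scale
        self.conv1 = nn.Conv2d(mid_channels, mid_channels, 3, 1, 1)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, 3, 1, 1)
        self.relu = nn.ReLU(inplace=True)
        self.init_weights()

    def init_weights(self):
        for conv in (self.conv1, self.conv2):
            nn.init.kaiming_normal_(conv.weight)
            with torch.no_grad():
                # Shrink the weights so the residual branch starts small.
                conv.weight.mul_(0.1)
            nn.init.zeros_(conv.bias)

    def forward(self, x):
        identity = x
        out = self.conv2(self.relu(self.conv1(x)))
        return identity + out * self.res_scale


res_in = torch.randn(1, 8, 12, 12)
res_out = ResidualBlockNoBNSketch(mid_channels=8)(res_in)
# With res_scale=0 the block reduces to the identity mapping.
res_identity = ResidualBlockNoBNSketch(mid_channels=8, res_scale=0.0)(res_in)
```

Because the input and output have the same shape, such blocks can be stacked freely, and a small `res_scale` keeps deep stacks stable early in training.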

class mmedit.models.base_archs.PixelShufflePack(in_channels: int, out_channels: int, scale_factor: int, upsample_kernel: int)[source]¶

Bases: torch.nn.Module

Pixel Shuffle upsample layer.

Parameters

in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
scale_factor (int) – Upsample ratio.
upsample_kernel (int) – Kernel size of Conv layer to expand channels.

Returns

Upsampled feature map.

init_weights() → None[source]¶: Initialize weights for PixelShufflePack.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function for PixelShufflePack.

Parameters: x (Tensor) – Input tensor with shape (n, c, h, w).
Returns: Forward results.
Return type: Tensor
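The upsampling path can be sketched as a convolution that expands the channels by `scale_factor ** 2`, followed by `torch.nn.functional.pixel_shuffle`, which rearranges those extra channels into space. The class name `PixelShufflePackSketch` is hypothetical, and the padding of `(upsample_kernel - 1) // 2` is an assumption that keeps the spatial size unchanged before shuffling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PixelShufflePackSketch(nn.Module):
    """Conv that expands channels, then pixel shuffle to upsample."""

    def __init__(self, in_channels, out_channels, scale_factor,
                 upsample_kernel):
        super().__init__()
        self.scale_factor = scale_factor
        self.upsample_conv = nn.Conv2d(
            in_channels,
            out_channels * scale_factor * scale_factor,
            upsample_kernel,
            padding=(upsample_kernel - 1) // 2)

    def forward(self, x):
        # (n, out * r^2, h, w) -> (n, out, h * r, w * r)
        return F.pixel_shuffle(self.upsample_conv(x), self.scale_factor)


upsampler = PixelShufflePackSketch(8, 4, scale_factor=2, upsample_kernel=3)
upsampled = upsampler(torch.randn(1, 8, 10, 10))
print(upsampled.shape)  # torch.Size([1, 4, 20, 20])
```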

class mmedit.models.base_archs.VGG16(in_channels: int, batch_norm: Optional[bool] = False, aspp: Optional[bool] = False, dilations: Optional[List[int]] = None, init_cfg: Optional[dict] = None)[source]¶

Bases: mmengine.model.BaseModule

Customized VGG16 Encoder.

A 1x1 conv is added after the original VGG16 conv layers. The indices of max pooling layers are returned for unpooling layers in decoders.

Parameters

in_channels (int) – Number of input channels.
batch_norm (bool, optional) – Whether to use nn.BatchNorm2d. Default to False.
aspp (bool, optional) – Whether to use the ASPP module after the last conv layer. Default to False.
dilations (list[int], optional) – Atrous rates of ASPP module. Default to None.
init_cfg (dict, optional) – Initialization config dict.

_make_layer(inplanes: int, planes: int, convs_layers: int) → torch.nn.Module[source]¶

init_weights() → None[source]¶: Init weights for the model.

forward(x: torch.Tensor) → Dict[str, torch.Tensor][source]¶

Forward function for the VGG16 encoder.

Parameters: x (Tensor) – Input tensor with shape (N, C, H, W).
Returns: Dict containing output tensor and maxpooling indices.
Return type: dict
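Returning max-pooling indices lets a decoder place values back at the positions where the maxima were found. The mechanism can be sketched with `nn.MaxPool2d(return_indices=True)` and `nn.MaxUnpool2d`. This is a standalone illustration of the pooling/unpooling pair, not of the encoder itself, and the variable names are hypothetical.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

# Positive inputs make the unpooled zeros easy to tell apart from maxima.
feat = torch.rand(1, 4, 8, 8)

# Encoder side: keep both the pooled feature and the argmax positions.
pooled, indices = pool(feat)

# Decoder side: put each maximum back where it came from; every other
# position in the 2x2 window is filled with zero.
restored = unpool(pooled, indices)

print(pooled.shape)    # torch.Size([1, 4, 4, 4])
print(restored.shape)  # torch.Size([1, 4, 8, 8])
```

The restored map is sparse: it holds one non-zero value per pooling window, at the exact location recorded by the indices.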
