mmedit.models.base_archs
Package Contents
Classes
AllGatherLayer | All gather layer with backward propagation path.
ASPP | ASPP module from DeepLabV3.
SpatialTemporalEnsemble | Apply spatial and temporal ensemble and compute outputs.
SimpleGatedConvModule | Simple Gated Convolutional Module.
ImgNormalize | Normalize images with the given mean and std value.
LinearModule | A linear block that contains linear/norm/activation layers.
MultiLayerDiscriminator | Multilayer Discriminator.
PatchDiscriminator | A PatchGAN discriminator.
ResNet | General ResNet.
DepthwiseSeparableConvModule | Depthwise separable convolution module.
SimpleEncoderDecoder | Simple encoder-decoder model from matting.
SoftMaskPatchDiscriminator | A Soft Mask-Guided PatchGAN discriminator.
ResidualBlockNoBN | Residual block without BN.
PixelShufflePack | Pixel Shuffle upsample layer.
VGG16 | Customized VGG16 Encoder.
Functions
conv2d |
conv_transpose2d |
pixel_unshuffle | Down-sample by pixel unshuffle.
- class mmedit.models.base_archs.AllGatherLayer(*args, **kwargs)
Bases:
torch.autograd.Function
All gather layer with backward propagation path.
This module makes dist.all_gather() part of the backward graph, so gradients can flow through the gathered tensors. This kind of operation is widely used in MoCo and other contrastive learning algorithms.
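A minimal sketch of how such a function might be written is shown below. The class name `GatherWithGrad` and the single-process fallback (used when no process group has been initialized) are illustrative assumptions and not part of the library; only `torch.autograd.Function` and `torch.distributed` calls are used.

```python
import torch
import torch.distributed as dist


class GatherWithGrad(torch.autograd.Function):
    """Hypothetical all-gather that keeps a gradient path to the local tensor."""

    @staticmethod
    def forward(ctx, x):
        # Assumption: behave like a single rank when no process group exists.
        ctx.distributed = dist.is_available() and dist.is_initialized()
        if not ctx.distributed:
            return (x.clone(),)
        outputs = [torch.zeros_like(x) for _ in range(dist.get_world_size())]
        dist.all_gather(outputs, x)
        return tuple(outputs)

    @staticmethod
    def backward(ctx, *grad_outputs):
        # Only the slice contributed by this rank carries gradient
        # for the local input tensor.
        rank = dist.get_rank() if ctx.distributed else 0
        return grad_outputs[rank]


x = torch.ones(2, 3, requires_grad=True)
gathered = GatherWithGrad.apply(x)
torch.cat(gathered).sum().backward()
```

In a real multi-process run, each rank would receive a tuple with one tensor per rank, and the backward pass would hand back only the gradient of its own slice.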
- class mmedit.models.base_archs.ASPP(in_channels: int, out_channels: int = 256, mid_channels: int = 256, dilations: Sequence[int] = (12, 24, 36), conv_cfg: Optional[dict] = None, norm_cfg: Optional[dict] = dict(type='BN'), act_cfg: Optional[dict] = dict(type='ReLU'), separable_conv: bool = False)
Bases:
torch.nn.Module
ASPP module from DeepLabV3.
The code is adapted from https://github.com/pytorch/vision/blob/master/torchvision/models/segmentation/deeplabv3.py
For more information about the module: “Rethinking Atrous Convolution for Semantic Image Segmentation”.
- Parameters
in_channels (int) – Input channels of the module.
out_channels (int) – Output channels of the module. Default: 256.
mid_channels (int) – Output channels of the intermediate ASPP conv modules. Default: 256.
dilations (Sequence[int]) – Dilation rates of the three ASPP conv modules. Default: (12, 24, 36).
conv_cfg (dict) – Config dict for convolution layer. If “None”, nn.Conv2d will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
separable_conv (bool) – Whether to replace the normal conv with a depthwise separable conv, which is faster. Default: False.
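The structure these parameters describe can be sketched in plain PyTorch as follows. This is a simplified illustration, not the library's implementation: the name `TinyASPP` is hypothetical, and the config-driven norm layers and the separable-conv option are left out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyASPP(nn.Module):
    """Illustrative ASPP: a 1x1 conv, dilated 3x3 convs, and image pooling."""

    def __init__(self, in_channels, out_channels=256, mid_channels=256,
                 dilations=(12, 24, 36)):
        super().__init__()
        branches = [nn.Conv2d(in_channels, mid_channels, 1)]
        for d in dilations:
            # padding == dilation keeps the spatial size for a 3x3 kernel.
            branches.append(
                nn.Conv2d(in_channels, mid_channels, 3, padding=d, dilation=d))
        self.branches = nn.ModuleList(branches)
        self.pool_conv = nn.Conv2d(in_channels, mid_channels, 1)
        # Fuse all branches (1x1 + dilated + pooled) back to out_channels.
        self.project = nn.Conv2d(
            (len(dilations) + 2) * mid_channels, out_channels, 1)

    def forward(self, x):
        feats = [F.relu(branch(x)) for branch in self.branches]
        # Image-level branch: global average pool, then upsample back.
        pooled = F.relu(self.pool_conv(F.adaptive_avg_pool2d(x, 1)))
        feats.append(F.interpolate(pooled, size=x.shape[-2:],
                                   mode='bilinear', align_corners=False))
        return F.relu(self.project(torch.cat(feats, dim=1)))


aspp = TinyASPP(in_channels=8, out_channels=16, mid_channels=4)
out = aspp(torch.randn(2, 8, 40, 40))
```

Every branch preserves the input resolution, so the output has the same height and width as the input and `out_channels` channels.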
- mmedit.models.base_archs.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
- mmedit.models.base_archs.conv_transpose2d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1)
- mmedit.models.base_archs.pixel_unshuffle(x: torch.Tensor, scale: int) -> torch.Tensor
Down-sample by pixel unshuffle.
- Parameters
x (Tensor) – Input tensor.
scale (int) – Scale factor.
- Returns
Output tensor.
- Return type
Tensor
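Pixel unshuffle moves each scale x scale spatial block into the channel dimension, so an input of shape (b, c, h, w) becomes (b, c * scale^2, h / scale, w / scale). The sketch below shows one common way to do this with reshaping; the function name `pixel_unshuffle_sketch` is hypothetical and the library's own code may differ in detail.

```python
import torch
import torch.nn.functional as F


def pixel_unshuffle_sketch(x: torch.Tensor, scale: int) -> torch.Tensor:
    """Move each scale x scale spatial block into the channel dimension."""
    b, c, h, w = x.shape
    if h % scale != 0 or w % scale != 0:
        raise ValueError('height and width must be divisible by scale')
    h_out, w_out = h // scale, w // scale
    # Split both spatial axes into (coarse position, offset inside block).
    x = x.view(b, c, h_out, scale, w_out, scale)
    # Bring the two offset axes next to the channel axis.
    x = x.permute(0, 1, 3, 5, 2, 4)
    return x.reshape(b, c * scale * scale, h_out, w_out)


x = torch.arange(2 * 3 * 4 * 6, dtype=torch.float32).view(2, 3, 4, 6)
y = pixel_unshuffle_sketch(x, scale=2)
```

With this channel ordering, `torch.nn.functional.pixel_shuffle(y, 2)` recovers the original tensor.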
- class mmedit.models.base_archs.SpatialTemporalEnsemble(is_temporal_ensemble: Optional[bool] = False)
Bases:
torch.nn.Module
Apply spatial and temporal ensemble and compute outputs.
- Parameters
is_temporal_ensemble (bool, optional) – Whether to apply ensemble temporally. If True, the sequence will also be flipped temporally. If the input is an image, this argument must be set to False. Default: False.
- _transform(imgs: torch.Tensor, mode: str) -> torch.Tensor
Apply spatial transform (flip, rotate) to the images.
- Parameters
imgs (torch.Tensor) – The images to be transformed.
mode (str) – The mode of transform. Supported values are ‘vertical’, ‘horizontal’, and ‘transpose’, corresponding to vertical flip, horizontal flip, and rotation, respectively.
- Returns
The transformed images.
- Return type
torch.Tensor
- spatial_ensemble(imgs: torch.Tensor, model: torch.nn.Module) -> torch.Tensor
Apply spatial ensemble.
- Parameters
imgs (torch.Tensor) – The images to be processed by the model. Its size should be either (n, t, c, h, w) or (n, c, h, w).
model (nn.Module) – The model to process the images.
- Returns
Output of the model with spatial ensemble applied.
- Return type
torch.Tensor
- forward(imgs: torch.Tensor, model: torch.nn.Module) -> torch.Tensor
Apply spatial and temporal ensemble.
- Parameters
imgs (torch.Tensor) – The images to be processed by the model. Its size should be either (n, t, c, h, w) or (n, c, h, w).
model (nn.Module) – The model to process the images.
- Returns
Output of the model with spatial (and, if enabled, temporal) ensemble applied.
- Return type
torch.Tensor
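The idea behind the ensemble can be sketched as follows: run the model on every flipped/transposed variant of the input, undo the transform on each output, and average the results. This is a simplified illustration under stated assumptions, not the library's code; the function names are hypothetical, and the model is assumed to accept inputs whose height and width have been swapped.

```python
import torch


def spatial_ensemble_sketch(imgs, model):
    """Average model outputs over the 8 flip/transpose variants."""
    outputs = []
    for transpose in (False, True):
        for hflip in (False, True):
            for vflip in (False, True):
                x = imgs
                if vflip:
                    x = x.flip(-2)
                if hflip:
                    x = x.flip(-1)
                if transpose:
                    x = x.transpose(-2, -1)
                y = model(x)
                # Undo the transforms in reverse order.
                if transpose:
                    y = y.transpose(-2, -1)
                if hflip:
                    y = y.flip(-1)
                if vflip:
                    y = y.flip(-2)
                outputs.append(y)
    return torch.stack(outputs).mean(dim=0)


def ensemble_sketch(imgs, model, temporal=False):
    out = spatial_ensemble_sketch(imgs, model)
    if temporal:
        # A time axis is required: (n, t, c, h, w).
        if imgs.ndim != 5:
            raise ValueError('temporal ensemble needs a 5-D input')
        flipped = spatial_ensemble_sketch(imgs.flip(1), model).flip(1)
        out = 0.5 * (out + flipped)
    return out


image = torch.randn(1, 3, 4, 6)
video = torch.randn(1, 2, 3, 4, 4)
image_out = ensemble_sketch(image, lambda t: t)
video_out = ensemble_sketch(video, lambda t: t, temporal=True)
```

With an identity model every variant maps back to the input, which makes a convenient sanity check that each transform is inverted correctly.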
- class mmedit.models.base_archs.SimpleGatedConvModule(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], feat_act_cfg: Optional[dict] = dict(type='ELU'), gate_act_cfg: Optional[dict] = dict(type='Sigmoid'), **kwargs)
Bases:
torch.nn.Module
Simple Gated Convolutional Module.
This module is a simple gated convolutional module. The detailed formula is:
\[y = \phi(conv1(x)) * \sigma(conv2(x)),\]
where \(\phi\) is the feature activation function and \(\sigma\) is the gate activation function. By default, the gate activation function is sigmoid.
- Parameters
in_channels (int) – Same as nn.Conv2d.
out_channels (int) – The number of channels of the output feature. Note that out_channels in the conv module is doubled since this module contains two convolutions for feature and gate separately.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
feat_act_cfg (dict) – Config dict for feature activation layer. Default: dict(type=’ELU’).
gate_act_cfg (dict) – Config dict for gate activation layer. Default: dict(type=’Sigmoid’).
kwargs (keyword arguments) – Same as ConvModule.
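The formula above can be illustrated with a small PyTorch module. This is a sketch only: the class name `GatedConvSketch` is hypothetical, and a plain `nn.Conv2d` stands in for the ConvModule used by the library. A single convolution with doubled output channels produces both the feature and the gate branch.

```python
import torch
import torch.nn as nn


class GatedConvSketch(nn.Module):
    """y = ELU(feature) * Sigmoid(gate), both taken from one doubled conv."""

    def __init__(self, in_channels, out_channels, kernel_size, **conv_kwargs):
        super().__init__()
        # Doubled output channels: first half features, second half gates.
        self.conv = nn.Conv2d(in_channels, out_channels * 2, kernel_size,
                              **conv_kwargs)
        self.feat_act = nn.ELU()
        self.gate_act = nn.Sigmoid()

    def forward(self, x):
        feat, gate = torch.chunk(self.conv(x), 2, dim=1)
        return self.feat_act(feat) * self.gate_act(gate)


layer = GatedConvSketch(3, 8, kernel_size=3, padding=1)
y = layer(torch.randn(2, 3, 16, 16))
```

The sigmoid gate lies in (0, 1), so it acts as a soft per-pixel, per-channel mask on the activated features.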
- class mmedit.models.base_archs.ImgNormalize(pixel_range: float, img_mean: Tuple[float, float, float], img_std: Tuple[float, float, float], sign: int = -1)
Bases:
torch.nn.Conv2d
Normalize images with the given mean and std value.
Based on Conv2d layer, can work in GPU.
- Parameters
pixel_range (float) – Pixel range of feature.
img_mean (Tuple[float]) – Image mean of each channel.
img_std (Tuple[float]) – Image std of each channel.
sign (int) – Sign of bias. Default -1.
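One way to express per-channel normalization as a frozen 1x1 convolution is sketched below. The class name `ImgNormalizeSketch` is hypothetical and the exact weight setup is an assumption: the kernel is a diagonal matrix of 1/std, and the bias is sign * pixel_range * mean / std.

```python
import torch
import torch.nn as nn


class ImgNormalizeSketch(nn.Conv2d):
    """Per-channel normalization written as a fixed 1x1 convolution."""

    def __init__(self, pixel_range, img_mean, img_std, sign=-1):
        num_channels = len(img_mean)
        super().__init__(num_channels, num_channels, kernel_size=1)
        mean = torch.tensor(img_mean, dtype=torch.float32)
        std = torch.tensor(img_std, dtype=torch.float32)
        with torch.no_grad():
            # Diagonal kernel: channel i is only divided by std[i].
            weight = torch.eye(num_channels) / std.view(-1, 1)
            self.weight.copy_(weight.view(num_channels, num_channels, 1, 1))
            self.bias.copy_(sign * pixel_range * mean / std)
        # The layer is a fixed transform, not a trainable one.
        self.weight.requires_grad_(False)
        self.bias.requires_grad_(False)


mean, std = (0.4, 0.5, 0.6), (0.2, 0.3, 0.4)
normalize = ImgNormalizeSketch(255.0, mean, std)
x = torch.rand(1, 3, 4, 4) * 255.0
y = normalize(x)
expected = (x - 255.0 * torch.tensor(mean).view(1, 3, 1, 1)) \
    / torch.tensor(std).view(1, 3, 1, 1)
```

With sign = -1 the layer computes (x - pixel_range * mean) / std per channel; with sign = 1 the mean term is added instead of subtracted.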
- class mmedit.models.base_archs.LinearModule(in_features: int, out_features: int, bias: bool = True, act_cfg: Optional[dict] = dict(type='ReLU'), inplace: bool = True, with_spectral_norm: bool = False, order: Tuple[str, str] = ('linear', 'act'))
Bases:
torch.nn.Module
A linear block that contains linear/norm/activation layers.
For low level vision, we add spectral norm and padding layer.
- Parameters
in_features (int) – Same as nn.Linear.
out_features (int) – Same as nn.Linear.
bias (bool) – Same as nn.Linear. Default: True.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
inplace (bool) – Whether to use inplace mode for activation. Default: True.
with_spectral_norm (bool) – Whether to use spectral norm in the linear module. Default: False.
order (tuple[str]) – The order of linear/activation layers. It is a sequence of “linear”, “norm” and “act”. Examples are (“linear”, “act”) and (“act”, “linear”).
- forward(x: torch.Tensor, activate: Optional[bool] = True) -> torch.Tensor
Forward Function.
- Parameters
x (torch.Tensor) – Input tensor with shape of \((n, *, c)\). Same as torch.nn.Linear.
activate (bool, optional) – Whether to use activation layer. Defaults to True.
- Returns
Same as torch.nn.Linear.
- Return type
torch.Tensor
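The interplay of `order`, `with_spectral_norm` and `activate` can be sketched as below. This is an illustration under stated assumptions rather than the library's code: the class name `LinearBlockSketch` is hypothetical, ReLU is hard-coded instead of being built from `act_cfg`, and `torch.nn.utils.spectral_norm` is used for the spectral normalization.

```python
import torch
import torch.nn as nn


class LinearBlockSketch(nn.Module):
    """Linear layer plus activation, applied in a configurable order."""

    def __init__(self, in_features, out_features, bias=True,
                 with_spectral_norm=False, order=('linear', 'act')):
        super().__init__()
        if set(order) != {'linear', 'act'}:
            raise ValueError('order must contain "linear" and "act"')
        self.order = order
        linear = nn.Linear(in_features, out_features, bias=bias)
        if with_spectral_norm:
            linear = nn.utils.spectral_norm(linear)
        self.linear = linear
        self.act = nn.ReLU()

    def forward(self, x, activate=True):
        for name in self.order:
            if name == 'linear':
                x = self.linear(x)
            elif activate:
                x = self.act(x)
        return x


block = LinearBlockSketch(4, 2, with_spectral_norm=True)
x = torch.randn(5, 3, 4)
activated = block(x)
raw = block(x, activate=False)
```

Passing `activate=False` skips the activation, which is useful when the caller wants the raw linear output.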
- class mmedit.models.base_archs.MultiLayerDiscriminator(in_channels: int, max_channels: int, num_convs: int = 5, fc_in_channels: Optional[int] = None, fc_out_channels: int = 1024, kernel_size: int = 5, conv_cfg: Optional[dict] = None, norm_cfg: Optional[dict] = None, act_cfg: Optional[dict] = dict(type='ReLU'), out_act_cfg: Optional[dict] = dict(type='ReLU'), with_input_norm: bool = True, with_out_convs: bool = False, with_spectral_norm: bool = False, **kwargs)
Bases:
torch.nn.Module
Multilayer Discriminator.
This is a commonly used structure with multiple stacked convolution layers.
- Parameters
in_channels (int) – Input channel of the first input convolution.
max_channels (int) – The maximum channel number in this structure.
num_convs (int) – Number of stacked intermediate convs (including input conv but excluding output conv). Default to 5.
fc_in_channels (int | None) – Input dimension of the fully connected layer. If fc_in_channels is None, the fully connected layer will be removed. Default to None.
fc_out_channels (int) – Output dimension of the fully connected layer. Default to 1024.
kernel_size (int) – Kernel size of the conv modules. Default to 5.
conv_cfg (dict) – Config dict to build conv layer.
norm_cfg (dict) – Config dict to build norm layer.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
out_act_cfg (dict) – Config dict for output activation, “relu” by default.
with_input_norm (bool) – Whether to add normalization after the input conv. Default to True.
with_out_convs (bool) – Whether to add output convs to the discriminator. The output convs contain two convs. The first out conv has the same setting as the intermediate convs but a stride of 1 instead of 2. The second out conv is similar to the first out conv but reduces the number of channels to 1 and has no activation layer. Default to False.
with_spectral_norm (bool) – Whether to use spectral norm after the conv layers. Default to False.
kwargs (keyword arguments) –
- class mmedit.models.base_archs.PatchDiscriminator(in_channels: int, base_channels: int = 64, num_conv: int = 3, norm_cfg: dict = dict(type='BN'), init_cfg: Optional[dict] = dict(type='normal', gain=0.02))
Bases:
torch.nn.Module
A PatchGAN discriminator.
- Parameters
in_channels (int) – Number of channels in input images.
base_channels (int) – Number of channels at the first conv layer. Default: 64.
num_conv (int) – Number of stacked intermediate convs (excluding input and output conv). Default: 3.
norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’BN’).
init_cfg (dict) – Config dict for initialization. type: The name of our initialization method. Default: ‘normal’. gain: Scaling factor for normal, xavier and orthogonal. Default: 0.02.
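For orientation, a PatchGAN-style discriminator with an input conv, `num_conv` intermediate convs and a 1-channel output conv might look like the sketch below. The function name is hypothetical, and the 4x4 kernels, LeakyReLU(0.2), channel doubling and stride pattern follow the common pix2pix layout as an assumption; they are not taken from this module's source.

```python
import torch
import torch.nn as nn


def patch_discriminator_sketch(in_channels, base_channels=64, num_conv=3):
    """Build a small PatchGAN-style stack of strided convolutions."""

    def block(cin, cout, stride, norm=True):
        layers = [nn.Conv2d(cin, cout, 4, stride=stride, padding=1,
                            bias=not norm)]
        if norm:
            layers.append(nn.BatchNorm2d(cout))
        layers.append(nn.LeakyReLU(0.2))
        return layers

    # Input conv: no normalization.
    layers = block(in_channels, base_channels, stride=2, norm=False)
    channels = base_channels
    for i in range(num_conv):
        out_channels = min(channels * 2, base_channels * 8)
        # Assumption: the last intermediate conv uses stride 1.
        stride = 2 if i < num_conv - 1 else 1
        layers += block(channels, out_channels, stride)
        channels = out_channels
    # Output conv: one score per receptive-field patch.
    layers.append(nn.Conv2d(channels, 1, 4, stride=1, padding=1))
    return nn.Sequential(*layers)


disc = patch_discriminator_sketch(in_channels=3, base_channels=8)
scores = disc(torch.randn(2, 3, 64, 64))
```

The output is a spatial map of scores rather than a single value, so each element judges one patch of the input image.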
- class mmedit.models.base_archs.ResNet(depth: int, in_channels: int = 3, stem_channels: int = 64, base_channels: int = 64, num_stages: int = 4, strides: Sequence[int] = (1, 2, 2, 2), dilations: Sequence[int] = (1, 1, 2, 4), deep_stem: bool = False, avg_down: bool = False, frozen_stages: int = -1, act_cfg: dict = dict(type='ReLU'), conv_cfg: Optional[dict] = None, norm_cfg: dict = dict(type='BN'), with_cp: bool = False, multi_grid: Optional[Sequence[int]] = None, contract_dilation: bool = False, zero_init_residual: bool = True)
Bases:
torch.nn.Module
General ResNet.
This class is adopted from https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/backbones/resnet.py.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Number of stem channels. Default: 64.
base_channels (int) – Number of base channels of res layer. Default: 64.
num_stages (int) – Resnet stages, normally 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 2, 4).
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
act_cfg (dict) – Dictionary to construct and config activation layer. Default: dict(type=’ReLU’).
conv_cfg (dict) – Dictionary to construct and config convolution layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
multi_grid (Sequence[int]|None) – Multi grid dilation rates of last stage. Default: None.
contract_dilation (bool) – Whether to contract the first dilation of each layer. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
- property norm1: torch.nn.Module
normalization layer after the second convolution layer
- Type
nn.Module
- arch_settings
- _make_layer(block: BasicBlock, planes: int, blocks: int, stride: int = 1, dilation: int = 1) -> torch.nn.Module
- class mmedit.models.base_archs.DepthwiseSeparableConvModule(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, norm_cfg: Optional[dict] = None, act_cfg: Optional[dict] = dict(type='ReLU'), dw_norm_cfg: Union[dict, str] = 'default', dw_act_cfg: Union[dict, str] = 'default', pw_norm_cfg: Union[dict, str] = 'default', pw_act_cfg: Union[dict, str] = 'default', **kwargs)
Bases:
torch.nn.Module
Depthwise separable convolution module.
See https://arxiv.org/pdf/1704.04861.pdf for details.
This module can replace a ConvModule, with the conv block replaced by two conv blocks: a depthwise conv block and a pointwise conv block. The depthwise conv block contains depthwise-conv/norm/activation layers. The pointwise conv block contains pointwise-conv/norm/activation layers. Note that the depthwise conv block will contain norm/activation layers if norm_cfg and act_cfg are specified.
- Parameters
in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
stride (int or tuple[int]) – Same as nn.Conv2d. Default: 1.
padding (int or tuple[int]) – Same as nn.Conv2d. Default: 0.
dilation (int or tuple[int]) – Same as nn.Conv2d. Default: 1.
norm_cfg (dict) – Default norm config for both depthwise ConvModule and pointwise ConvModule. Default: None.
act_cfg (dict) – Default activation config for both depthwise ConvModule and pointwise ConvModule. Default: dict(type=’ReLU’).
dw_norm_cfg (dict) – Norm config of depthwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.
dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.
pw_norm_cfg (dict) – Norm config of pointwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.
pw_act_cfg (dict) – Activation config of pointwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.
kwargs (optional) – Other shared arguments for depthwise and pointwise ConvModule. See ConvModule for reference.
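The depthwise/pointwise split can be sketched with two plain `nn.Conv2d` layers. The class name `DepthwiseSeparableSketch` is hypothetical, norm layers are omitted, and ReLU is hard-coded, so this is an illustration of the technique rather than the library's module.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableSketch(nn.Module):
    """Depthwise conv (one filter per channel) followed by a 1x1 conv."""

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, dilation=1):
        super().__init__()
        # groups == in_channels makes the convolution depthwise.
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size, stride=stride,
            padding=padding, dilation=dilation, groups=in_channels,
            bias=False)
        # The pointwise 1x1 conv mixes information across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.pointwise(self.act(self.depthwise(x))))


def count_params(module):
    return sum(p.numel() for p in module.parameters())


separable = DepthwiseSeparableSketch(8, 16, 3, padding=1)
regular = nn.Conv2d(8, 16, 3, padding=1, bias=False)
y = separable(torch.randn(1, 8, 10, 10))
```

For 8 input channels, 16 output channels and a 3x3 kernel, the separable version has 8 * 9 + 8 * 16 = 200 weights, compared with 8 * 16 * 9 = 1152 for the regular convolution.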
- class mmedit.models.base_archs.SimpleEncoderDecoder(encoder: dict, decoder: dict, init_cfg: Optional[dict] = None)
Bases:
mmengine.model.BaseModule
Simple encoder-decoder model from matting.
- Parameters
encoder (dict) – Config of the encoder.
decoder (dict) – Config of the decoder.
init_cfg (dict, optional) – Initialization config dict.
- class mmedit.models.base_archs.SoftMaskPatchDiscriminator(in_channels: int, base_channels: Optional[int] = 64, num_conv: Optional[int] = 3, norm_cfg: Optional[dict] = None, init_cfg: Optional[dict] = dict(type='normal', gain=0.02), with_spectral_norm: Optional[bool] = False)
Bases:
mmengine.model.BaseModule
A Soft Mask-Guided PatchGAN discriminator.
- Parameters
in_channels (int) – Number of channels in input images.
base_channels (int, optional) – Number of channels at the first conv layer. Default: 64.
num_conv (int, optional) – Number of stacked intermediate convs (excluding input and output conv). Default: 3.
norm_cfg (dict, optional) – Config dict to build norm layer. Default: None.
init_cfg (dict, optional) – Config dict for initialization. type: The name of our initialization method. Default: ‘normal’. gain: Scaling factor for normal, xavier and orthogonal. Default: 0.02.
with_spectral_norm (bool, optional) – Whether to use spectral norm after the conv layers. Default: False.
- class mmedit.models.base_archs.ResidualBlockNoBN(mid_channels: int = 64, res_scale: float = 1.0)
Bases:
torch.nn.Module
Residual block without BN.
It has a style of:
---Conv-ReLU-Conv-+-
 |________________|
- Parameters
mid_channels (int) – Channel number of intermediate features. Default: 64.
res_scale (float) – Used to scale the residual before addition. Default: 1.0.
- init_weights() -> None
Initialize weights for ResidualBlockNoBN.
Initialization methods like kaiming_init are for VGG-style modules. For modules with residual paths, using smaller std is better for stability and performance. We empirically use 0.1. See more details in “ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks”
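A minimal sketch of such a block, including the scaled initialization, is given below. The class name `ResidualBlockSketch` is hypothetical; the use of `kaiming_normal_` followed by multiplying the weights by 0.1 is one way to obtain the smaller std described above and is an assumption about the details.

```python
import torch
import torch.nn as nn


class ResidualBlockSketch(nn.Module):
    """x + res_scale * Conv(ReLU(Conv(x))), without batch normalization."""

    def __init__(self, mid_channels=64, res_scale=1.0):
        super().__init__()
        self.res_scale = res_scale
        self.conv1 = nn.Conv2d(mid_channels, mid_channels, 3, padding=1)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, 3, padding=1)
        self.relu = nn.ReLU()
        self.init_weights()

    def init_weights(self, scale=0.1):
        for conv in (self.conv1, self.conv2):
            nn.init.kaiming_normal_(conv.weight)
            with torch.no_grad():
                # Shrink the weights so the residual branch starts small.
                conv.weight.mul_(scale)
                conv.bias.zero_()

    def forward(self, x):
        residual = self.conv2(self.relu(self.conv1(x)))
        return x + residual * self.res_scale


x = torch.randn(1, 4, 8, 8)
out = ResidualBlockSketch(mid_channels=4)(x)
passthrough = ResidualBlockSketch(mid_channels=4, res_scale=0.0)(x)
```

Setting `res_scale` to 0 switches the residual branch off entirely, so the block reduces to the identity mapping.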
- class mmedit.models.base_archs.PixelShufflePack(in_channels: int, out_channels: int, scale_factor: int, upsample_kernel: int)
Bases:
torch.nn.Module
Pixel Shuffle upsample layer.
- Parameters
in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
scale_factor (int) – Upsample ratio.
upsample_kernel (int) – Kernel size of Conv layer to expand channels.
- Returns
Upsampled feature map.
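The usual pixel-shuffle upsampling pattern is a convolution that expands the channels by scale_factor^2, followed by `torch.nn.functional.pixel_shuffle`. The sketch below illustrates this; the class name `PixelShuffleUpsampleSketch` is hypothetical, and the "same" padding of (upsample_kernel - 1) // 2 is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PixelShuffleUpsampleSketch(nn.Module):
    """Expand channels with a conv, then rearrange them into space."""

    def __init__(self, in_channels, out_channels, scale_factor,
                 upsample_kernel):
        super().__init__()
        self.scale_factor = scale_factor
        self.conv = nn.Conv2d(
            in_channels,
            out_channels * scale_factor * scale_factor,
            upsample_kernel,
            padding=(upsample_kernel - 1) // 2)

    def forward(self, x):
        # (b, c * r^2, h, w) -> (b, c, h * r, w * r)
        return F.pixel_shuffle(self.conv(x), self.scale_factor)


upsample = PixelShuffleUpsampleSketch(4, 3, scale_factor=2, upsample_kernel=3)
y = upsample(torch.randn(1, 4, 5, 7))
```

Here a (1, 4, 5, 7) input becomes a (1, 3, 10, 14) output: the spatial size doubles while the channel count is set by `out_channels`.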
- class mmedit.models.base_archs.VGG16(in_channels: int, batch_norm: Optional[bool] = False, aspp: Optional[bool] = False, dilations: Optional[List[int]] = None, init_cfg: Optional[dict] = None)
Bases:
mmengine.model.BaseModule
Customized VGG16 Encoder.
A 1x1 conv is added after the original VGG16 conv layers. The indices of max pooling layers are returned for unpooling layers in decoders.
- Parameters
in_channels (int) – Number of input channels.
batch_norm (bool, optional) – Whether to use nn.BatchNorm2d. Default to False.
aspp (bool, optional) – Whether to use ASPP module after the last conv layer. Default to False.
dilations (list[int], optional) – Atrous rates of ASPP module. Default to None.
init_cfg (dict, optional) – Initialization config dict.
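How pooling indices are used by an unpooling layer can be shown with PyTorch's built-in layers. This is a generic illustration of the mechanism with `nn.MaxPool2d(return_indices=True)` and `nn.MaxUnpool2d`; it does not reproduce the encoder or any particular decoder.

```python
import torch
import torch.nn as nn

# The encoder side keeps the argmax positions of each pooling window.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
# The decoder side places values back at those positions.
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 4, 8, 8)
pooled, indices = pool(x)
restored = unpool(pooled, indices)

# Reading the restored map at the saved positions gives the pooled values.
picked = restored.flatten(2).gather(2, indices.flatten(2))
```

The restored map has the input resolution, with the pooled maxima at their original locations and zeros everywhere else.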