mmedit.models.editors.mspie

Package Contents

Classes

MSPIEStyleGAN2

MS-PIE StyleGAN2.

MSStyleGAN2Discriminator

StyleGAN2 Discriminator.

MSStyleGANv2Generator

StyleGAN2 Generator.

PESinGAN

Positional Encoding in SinGAN.

SinGANMSGeneratorPE

Multi-Scale Generator used in SinGAN with positional encoding.

CatersianGrid

Cartesian grid for 2D tensors.

SinusoidalPositionalEmbedding

Sinusoidal Positional Embedding 1D or 2D (SPE/SPE2d).

class mmedit.models.editors.mspie.MSPIEStyleGAN2(*args, train_settings=dict(), **kwargs)[source]

Bases: mmedit.models.editors.stylegan2.StyleGAN2

MS-PIE StyleGAN2.

In this GAN, we adopt the MS-PIE training schedule so that multi-scale images can be generated with a single generator. Details can be found in: Positional Encoding as Spatial Inductive Bias in GANs, CVPR2021.

Parameters

train_settings (dict) – Config for training settings. Defaults to dict().

train_step(data: dict, optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor]

Train the GAN model. In GAN training, the generator and discriminator are updated alternately. In MMGeneration’s design, self.train_step is called with data input. Therefore, we always update the discriminator, whose update relies on real data, and then decide whether the generator needs to be updated based on the current number of iterations. More details about whether to update the generator can be found in should_gen_update().

Parameters
  • data (dict) – Data sampled from dataloader.

  • optim_wrapper (OptimWrapperDict) – An OptimWrapperDict instance containing the OptimWrapper of the generator and the discriminator.

Returns

A dict of tensors for logging.

Return type

Dict[str, torch.Tensor]
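The alternating schedule described for train_step can be illustrated with a minimal sketch. The rule inside should_gen_update below is an assumption made for illustration (update the generator once every discriminator_steps iterations); the actual rule lives in the library's should_gen_update() and may differ.

```python
def should_gen_update(curr_iter: int, discriminator_steps: int = 1) -> bool:
    """Hypothetical rule: the generator is updated once every
    `discriminator_steps` iterations (assumption, not the library's code)."""
    return (curr_iter + 1) % discriminator_steps == 0


def run_schedule(num_iters: int, discriminator_steps: int = 1) -> list:
    """Return, for each iteration, the names of the networks that get updated."""
    log = []
    for curr_iter in range(num_iters):
        updated = ["disc"]  # the discriminator is updated on every call
        if should_gen_update(curr_iter, discriminator_steps):
            updated.append("gen")  # the generator is updated only when due
        log.append(updated)
    return log


schedule = run_schedule(num_iters=4, discriminator_steps=2)
print(schedule)
```

With discriminator_steps=2, the generator is updated on every second iteration while the discriminator is updated on all of them.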

train_generator(inputs: dict, data_samples: List[mmedit.structures.EditDataSample], optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor]

Train generator.

Parameters
  • inputs (TrainInput) – Inputs from dataloader.

  • data_samples (List[EditDataSample]) – Data samples from dataloader. Not used in generator training.

  • optimizer_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.

Returns

A dict of tensors for logging.

Return type

Dict[str, Tensor]

train_discriminator(inputs: dict, data_samples: List[mmedit.structures.EditDataSample], optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor]

Train discriminator.

Parameters
  • inputs (TrainInput) – Inputs from dataloader.

  • data_samples (List[EditDataSample]) – Data samples from dataloader.

  • optimizer_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.

Returns

A dict of tensors for logging.

Return type

Dict[str, Tensor]

class mmedit.models.editors.mspie.MSStyleGAN2Discriminator(in_size, channel_multiplier=2, blur_kernel=[1, 3, 3, 1], mbstd_cfg=dict(group_size=4, channel_groups=1), with_adaptive_pool=False, pool_size=(2, 2))[source]

Bases: torch.nn.Module

StyleGAN2 Discriminator.

The architecture of this discriminator is proposed in StyleGAN2. More details can be found in: Analyzing and Improving the Image Quality of StyleGAN CVPR2020.

Parameters
  • in_size (int) – The input size of images.

  • channel_multiplier (int, optional) – The multiplier factor for the channel number. Defaults to 2.

  • blur_kernel (list, optional) – The blur kernel. Defaults to [1, 3, 3, 1].

  • mbstd_cfg (dict, optional) – Configs for minibatch-stddev layer. Defaults to dict(group_size=4, channel_groups=1).

forward(x)

Forward function.

Parameters

x (torch.Tensor) – Input image tensor.

Returns

Predicted score for the input image.

Return type

torch.Tensor

class mmedit.models.editors.mspie.MSStyleGANv2Generator(out_size, style_channels, num_mlps=8, channel_multiplier=2, blur_kernel=[1, 3, 3, 1], lr_mlp=0.01, default_style_mode='mix', eval_style_mode='single', mix_prob=0.9, no_pad=False, deconv2conv=False, interp_pad=None, up_config=dict(scale_factor=2, mode='nearest'), up_after_conv=False, head_pos_encoding=None, head_pos_size=(4, 4), interp_head=False)[source]

Bases: torch.nn.Module

StyleGAN2 Generator.

In StyleGAN2, we use a static architecture composed of a style mapping module and a number of convolutional style blocks. More details can be found in: Analyzing and Improving the Image Quality of StyleGAN, CVPR2020.

Parameters
  • out_size (int) – The output size of the StyleGAN2 generator.

  • style_channels (int) – The number of channels for style code.

  • num_mlps (int, optional) – The number of MLP layers. Defaults to 8.

  • channel_multiplier (int, optional) – The multiplier factor for the channel number. Defaults to 2.

  • blur_kernel (list, optional) – The blur kernel. Defaults to [1, 3, 3, 1].

  • lr_mlp (float, optional) – The learning rate for the style mapping layer. Defaults to 0.01.

  • default_style_mode (str, optional) – The default mode of style mixing. In training, the ‘mix’ style mode is adopted by default, while in evaluation the ‘single’ style mode is used. [‘mix’, ‘single’] are currently supported. Defaults to ‘mix’.

  • eval_style_mode (str, optional) – The evaluation mode of style mixing. Defaults to ‘single’.

  • mix_prob (float, optional) – Mixing probability. The value should be in range of [0, 1]. Defaults to 0.9.

train(mode=True)

Set train/eval mode.

Parameters

mode (bool, optional) – Whether to set train mode. Defaults to True.

make_injected_noise(chosen_scale=0)

Make the noise tensors that will be injected into feature maps.

Parameters

chosen_scale (int, optional) – Chosen scale. Defaults to 0.

Returns

List of layer-wise noise tensors.

Return type

list[Tensor]
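To make "layer-wise noise" concrete, the sketch below lists the noise shapes used by the common StyleGAN2 layout: one 4x4 map for the first style block, then two maps per resolution up to the output size. This is an assumption about the layout, not a copy of this generator's code, and it does not model how chosen_scale changes the spatial sizes in MS-PIE. The helper name injected_noise_shapes is hypothetical.

```python
import math


def injected_noise_shapes(out_size: int) -> list:
    """Noise shapes for a standard StyleGAN2 layout (assumed, not library code).

    One 4x4 noise map, then two maps for each resolution 8, 16, ..., out_size.
    """
    log_size = int(math.log2(out_size))
    if 2 ** log_size != out_size or out_size < 4:
        raise ValueError("out_size must be a power of two and at least 4")
    shapes = [(1, 1, 4, 4)]
    for i in range(3, log_size + 1):
        res = 2 ** i
        shapes.extend([(1, 1, res, res)] * 2)  # two style convs per resolution
    return shapes


shapes = injected_noise_shapes(16)
print(shapes)
```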

get_mean_latent(num_samples=4096, **kwargs)

Get mean latent of W space in this generator.

Parameters

num_samples (int, optional) – Number of samples. Defaults to 4096.

Returns

Mean latent of this generator.

Return type

Tensor

style_mixing(n_source, n_target, inject_index=1, truncation_latent=None, truncation=0.7, chosen_scale=0)

Generate style-mixing images.

Parameters
  • n_source (int) – Number of source images.

  • n_target (int) – Number of target images.

  • inject_index (int, optional) – Index from which the latent codes are replaced with the source latent. Defaults to 1.

  • truncation_latent (torch.Tensor, optional) – Mean truncation latent. Defaults to None.

  • truncation (float, optional) – Truncation factor. If a value less than 1 is given, the truncation trick is applied. Defaults to 0.7.

  • curr_scale (int) – Current image scale. Defaults to -1.

  • transition_weight (float, optional) – The weight used in resolution transition. Defaults to 1.0.

  • chosen_scale (int, optional) – Chosen scale. Defaults to 0.

Returns

Table of style-mixing images.

Return type

torch.Tensor
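The role of inject_index in style mixing can be sketched as follows. The reading assumed here, based on the parameter description, is that layers before inject_index keep the target latent and layers from inject_index onward use the source latent. The helper mix_latents and the string placeholders are hypothetical; real latents are tensors.

```python
def mix_latents(source_latent, target_latent, num_layers: int, inject_index: int = 1) -> list:
    """Build a per-layer latent list for style mixing (illustrative sketch).

    Layers [0, inject_index) use the target latent; layers
    [inject_index, num_layers) use the source latent (assumed reading).
    """
    if not 0 <= inject_index <= num_layers:
        raise ValueError("inject_index must lie in [0, num_layers]")
    return [target_latent] * inject_index + [source_latent] * (num_layers - inject_index)


per_layer = mix_latents("w_src", "w_tgt", num_layers=4, inject_index=1)
print(per_layer)
```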

forward(styles, num_batches=-1, return_noise=False, return_latents=False, inject_index=None, truncation=1, truncation_latent=None, input_is_latent=False, injected_noise=None, randomize_noise=True, chosen_scale=0)

Forward function.

This function has been integrated with the truncation trick. Please refer to the usage of truncation and truncation_latent.

Parameters
  • styles (torch.Tensor | list[torch.Tensor] | callable | None) – In StyleGAN2, you can provide a noise tensor or a latent tensor. Given a list containing more than one noise or latent tensor, the style mixing trick will be used in training. You can also directly give a batch of noise through a torch.Tensor or offer a callable function to sample a batch of noise data. Otherwise, None indicates that the default noise sampler is used.

  • num_batches (int, optional) – The batch size. Defaults to -1.

  • return_noise (bool, optional) – If True, noise_batch will be returned in a dict with fake_img. Defaults to False.

  • return_latents (bool, optional) – If True, latent will be returned in a dict with fake_img. Defaults to False.

  • inject_index (int | None, optional) – The index number for mixing style codes. Defaults to None.

  • truncation (float, optional) – Truncation factor. If a value less than 1 is given, the truncation trick is applied. Defaults to 1.

  • truncation_latent (torch.Tensor, optional) – Mean truncation latent. Defaults to None.

  • input_is_latent (bool, optional) – If True, the input tensor is the latent tensor. Defaults to False.

  • injected_noise (torch.Tensor | None, optional) – Given a tensor, the random noise will be fixed as this input injected noise. Defaults to None.

  • randomize_noise (bool, optional) – If False, images are sampled with the buffered noise tensor injected to the style conv block. Defaults to True.

Returns

Generated image tensor or dictionary containing more data.

Return type

torch.Tensor | dict
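The truncation trick referred to by truncation and truncation_latent is commonly written as w' = w_mean + truncation * (w - w_mean). A minimal sketch of that formula follows, with plain Python lists standing in for latent tensors; the helper name truncate is hypothetical and this is not the generator's own code.

```python
def truncate(latent: list, mean_latent: list, truncation: float = 1.0) -> list:
    """Pull a latent code toward the mean latent.

    truncation=1 leaves the latent unchanged; smaller values move it
    closer to the mean (trading diversity for fidelity).
    """
    return [m + truncation * (w - m) for w, m in zip(latent, mean_latent)]


w = [2.0, -1.0, 0.5]
w_mean = [0.0, 0.0, 0.5]
w_trunc = truncate(w, w_mean, truncation=0.5)
print(w_trunc)
```

The mean latent passed as truncation_latent would typically come from get_mean_latent().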

class mmedit.models.editors.mspie.PESinGAN(generator: ModelType, discriminator: Optional[ModelType], data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, num_scales: Optional[int] = None, fixed_noise_with_pad: bool = False, first_fixed_noises_ch: int = 1, iters_per_scale: int = 200, noise_weight_init: int = 0.1, lr_scheduler_args: Optional[dict] = None, test_pkl_data: Optional[str] = None, ema_confg: Optional[dict] = None)[source]

Bases: mmedit.models.editors.singan.SinGAN

Positional Encoding in SinGAN.

This modified SinGAN is used to reimplement the experiments in: Positional Encoding as Spatial Inductive Bias in GANs, CVPR2021.

construct_fixed_noises()

Construct the fixed noises list used in SinGAN.

class mmedit.models.editors.mspie.SinGANMSGeneratorPE(in_channels, out_channels, num_scales, kernel_size=3, padding=0, num_layers=5, base_channels=32, min_feat_channels=32, out_act_cfg=dict(type='Tanh'), padding_mode='zero', pad_at_head=True, interp_pad=False, noise_with_pad=False, positional_encoding=None, first_stage_in_channels=None, **kwargs)[source]

Bases: mmedit.models.editors.singan.singan_generator.SinGANMultiScaleGenerator

Multi-Scale Generator used in SinGAN with positional encoding.

More details can be found in: Positional Encoding as Spatial Inductive Bias in GANs, CVPR’2021.

Notes:

  • In this version, we adopt the interpolation function from the official PyTorch APIs, which differs from the original implementation by the authors. However, in our experiments, the effect of this difference is negligible.

Parameters
  • in_channels (int) – Input channels.

  • out_channels (int) – Output channels.

  • num_scales (int) – The number of scales/stages in generator. Note that this number is counted from zero, which is the same as the original paper.

  • kernel_size (int, optional) – Kernel size, same as nn.Conv2d. Defaults to 3.

  • padding (int, optional) – Padding for the convolutional layer, same as nn.Conv2d. Defaults to 0.

  • num_layers (int, optional) – The number of convolutional layers in each generator block. Defaults to 5.

  • base_channels (int, optional) – The basic channels for convolutional layers in the generator block. Defaults to 32.

  • min_feat_channels (int, optional) – Minimum channels for the feature maps in the generator block. Defaults to 32.

  • out_act_cfg (dict | None, optional) – Configs for output activation layer. Defaults to dict(type=’Tanh’).

  • padding_mode (str, optional) – The mode of convolutional padding, same as nn.Conv2d. Defaults to ‘zero’.

  • pad_at_head (bool, optional) – Whether to add padding at head. Defaults to True.

  • interp_pad (bool, optional) – The padding value of interpolating feature maps. Defaults to False.

  • noise_with_pad (bool, optional) – Whether the input fixed noises are with explicit padding. Defaults to False.

  • positional_encoding (dict | None, optional) – Configs for the positional encoding. Defaults to None.

  • first_stage_in_channels (int | None, optional) – The input channel of the first generator block. If None, the first stage will adopt the same input channels as other stages. Defaults to None.

forward(input_sample, fixed_noises, noise_weights, rand_mode, curr_scale, num_batches=1, get_prev_res=False, return_noise=False)

Forward function.

Parameters
  • input_sample (Tensor | None) – The input for generator. In the original implementation, a tensor filled with zeros is adopted. If None is given, we will construct it from the first fixed noises.

  • fixed_noises (list[Tensor]) – List of the fixed noises in SinGAN.

  • noise_weights (list[float]) – List of the weights for random noises.

  • rand_mode (str) – Choices from [‘rand’, ‘recon’]. In rand mode, it will sample from random noises. Otherwise, the reconstruction for the single image will be returned.

  • curr_scale (int) – The scale for the current inference or training.

  • num_batches (int, optional) – The number of batches. Defaults to 1.

  • get_prev_res (bool, optional) – Whether to return results from previous stages. Defaults to False.

  • return_noise (bool, optional) – Whether to return noises tensor. Defaults to False.

Returns

Generated image tensor or dictionary containing more data.

Return type

Tensor | dict

class mmedit.models.editors.mspie.CatersianGrid[source]

Bases: torch.nn.Module

Cartesian grid for 2D tensors.

The Cartesian grid is a commonly used positional encoding in deep learning. In this implementation, we follow the convention of grid_sample in PyTorch. In other words, [-1, -1] denotes the top-left corner while [1, 1] denotes the bottom-right corner.

forward(x, **kwargs)
make_grid2d(height, width, num_batches=1, requires_grad=False)
make_grid2d_like(x, requires_grad=False)

Input: tensor with shape (b, …, h, w). Returns: tensor with shape (b, 2 x emb_dim, h, w).

Note that the positional embedding depends heavily on the function make_grid2d.
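The coordinate convention described for this class ([-1, -1] at the top-left corner, [1, 1] at the bottom-right corner) can be sketched in plain Python. The helper cartesian_grid2d is hypothetical; it assumes the first and last pixels map exactly to -1 and 1, and uses nested lists in place of tensors.

```python
def cartesian_grid2d(height: int, width: int):
    """Return (grid_x, grid_y), each a height x width nested list in [-1, 1].

    Assumption: endpoints map exactly to -1 and 1; a single-pixel axis maps to -1.
    """
    def axis(n: int) -> list:
        denom = max(n - 1, 1)  # avoid division by zero for a single pixel
        return [2.0 * i / denom - 1.0 for i in range(n)]

    xs, ys = axis(width), axis(height)
    grid_x = [list(xs) for _ in ys]          # x varies along the width
    grid_y = [[y for _ in xs] for y in ys]   # y varies along the height
    return grid_x, grid_y


grid_x, grid_y = cartesian_grid2d(height=2, width=3)
print(grid_x)
print(grid_y)
```

Stacking the two maps along a channel axis would give a 2-channel positional encoding per pixel.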

class mmedit.models.editors.mspie.SinusoidalPositionalEmbedding(embedding_dim, padding_idx, init_size=1024, div_half_dim=False, center_shift=None)[source]

Bases: torch.nn.Module

Sinusoidal Positional Embedding 1D or 2D (SPE/SPE2d).

This module is modified from: https://github.com/pytorch/fairseq/blob/master/fairseq/modules/sinusoidal_positional_embedding.py

Based on the original one-dimensional SPE, we implement a 2D sinusoidal positional encoding (SPE2d), as introduced in Positional Encoding as Spatial Inductive Bias in GANs, CVPR’2021.

Parameters
  • embedding_dim (int) – The number of dimensions for the positional encoding.

  • padding_idx (int | list[int]) – The index for the padding contents. The padding positions will obtain an encoding vector filled with zeros.

  • init_size (int, optional) – The initial size of the positional buffer. Defaults to 1024.

  • div_half_dim (bool, optional) – If true, the embedding will be divided by \(d/2\). Otherwise, it will be divided by \((d/2 -1)\). Defaults to False.

  • center_shift (int | None, optional) – Shift the center point to some index. Defaults to None.

static get_embedding(num_embeddings, embedding_dim, padding_idx=None, div_half_dim=False)

Build sinusoidal embeddings.

This matches the implementation in tensor2tensor, but differs slightly from the description in Section 3.5 of “Attention Is All You Need”.
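A minimal sketch of a tensor2tensor-style sinusoidal table is given below, under the assumption that the sine half and cosine half are concatenated rather than interleaved (the usual difference from the paper's description) and that div_half_dim switches the divisor between d/2 and d/2 - 1 as documented above. The helper sinusoidal_table is hypothetical and uses nested lists instead of tensors.

```python
import math


def sinusoidal_table(num_embeddings: int, embedding_dim: int,
                     padding_idx=None, div_half_dim: bool = False) -> list:
    """Build a (num_embeddings x embedding_dim) sinusoidal table (sketch)."""
    if embedding_dim % 2 != 0:
        raise ValueError("embedding_dim must be even in this sketch")
    half_dim = embedding_dim // 2
    denom = half_dim if div_half_dim else half_dim - 1
    if denom <= 0:
        raise ValueError("embedding_dim is too small for this divisor")
    scale = math.log(10000.0) / denom
    freqs = [math.exp(-scale * k) for k in range(half_dim)]
    table = []
    for pos in range(num_embeddings):
        angles = [pos * f for f in freqs]
        # sine half followed by cosine half (concatenated, not interleaved)
        table.append([math.sin(a) for a in angles] + [math.cos(a) for a in angles])
    if padding_idx is not None:
        table[padding_idx] = [0.0] * embedding_dim  # padding row is all zeros
    return table


table = sinusoidal_table(num_embeddings=4, embedding_dim=4, padding_idx=0)
```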

forward(input, **kwargs)

Input is expected to be of size [bsz x seqlen].

Returned tensor is expected to be of size [bsz x seq_len x emb_dim]

make_positions(input, padding_idx)

Make position tensors.

Parameters
  • input (tensor) – Input tensor.

  • padding_idx (int | list[int]) – The index for the padding contents. The padding positions will obtain an encoding vector filled with zeros.

Returns

Position tensors.

Return type

tensor
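The position numbering can be sketched following the fairseq convention that this module states it is modified from: non-padding entries are numbered starting at padding_idx + 1, and padding entries keep padding_idx. This is an illustrative plain-Python version with a hypothetical helper name, assuming a single integer padding_idx; it is not the module's tensor code.

```python
def positions_from_tokens(tokens: list, padding_idx: int) -> list:
    """Map each non-padding token to its 1-based running index plus padding_idx.

    Padding tokens keep `padding_idx`, so they later look up the zero row.
    """
    positions = []
    for row in tokens:
        count = 0
        out = []
        for tok in row:
            if tok == padding_idx:
                out.append(padding_idx)          # padding is not counted
            else:
                count += 1
                out.append(padding_idx + count)  # running index of real tokens
        positions.append(out)
    return positions


positions = positions_from_tokens([[5, 7, 0, 9]], padding_idx=0)
print(positions)
```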

make_grid2d(height, width, num_batches=1, center_shift=None)

Make 2-d grid mask.

Parameters
  • height (int) – Height of the grid.

  • width (int) – Width of the grid.

  • num_batches (int, optional) – The batch size. Defaults to 1.

  • center_shift (int | None, optional) – Shift the center point to some index. Defaults to None.

Returns

2D grid mask.

Return type

Tensor

make_grid2d_like(x, center_shift=None)

Input: tensor with shape (b, …, h, w). Returns: tensor with shape (b, 2 x emb_dim, h, w).

Note that the positional embedding depends heavily on the function make_positions.
