
mmedit.models.editors.stable_diffusion.vae

Module Contents

Classes

Downsample2D

A downsampling layer with an optional convolution.

Upsample2D

An upsampling layer with an optional convolution.

ResnetBlock2D

ResNet block that supports downsampling and upsampling.

AttentionBlock

An attention block that allows spatial positions to attend to each other.

UNetMidBlock2D

Middle block in the UNet.

DownEncoderBlock2D

Down encoder block in the VAE.

Encoder

Constructs the encoder in the VAE.

UpDecoderBlock2D

Constructs an up decoder block.

Decoder

Constructs the decoder in the VAE.

DiagonalGaussianDistribution

Calculates a diagonal Gaussian distribution.

AutoencoderKL

Variational Autoencoder (VAE) model with KL loss

class mmedit.models.editors.stable_diffusion.vae.Downsample2D(channels, use_conv=False, out_channels=None, padding=1, name='conv')[source]

Bases: torch.nn.Module

A downsampling layer with an optional convolution.

Parameters
  • channels (int) – Number of channels in the inputs and outputs.

  • use_conv (bool) – Whether a convolution is applied.

  • out_channels (int) – Number of output channels.

  • padding (int) – Amount of padding.

forward(hidden_states)[source]

Forward pass on the hidden states.
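As a rough illustration of what a downsampling layer without the optional convolution might do, the sketch below halves each spatial dimension with 2x2 average pooling in plain Python. The pooling choice, the helper names avg_pool_2x2 and conv_output_size, and the kernel size of 3 and stride of 2 in the size formula are assumptions made for illustration; this page does not state how Downsample2D is implemented.

```python
def avg_pool_2x2(feature_map):
    """Average non-overlapping 2x2 windows of a 2-D list (H and W must be even)."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for row in range(0, height, 2):
        pooled_row = []
        for col in range(0, width, 2):
            window_sum = (
                feature_map[row][col] + feature_map[row][col + 1]
                + feature_map[row + 1][col] + feature_map[row + 1][col + 1]
            )
            pooled_row.append(window_sum / 4.0)
        pooled.append(pooled_row)
    return pooled


def conv_output_size(size, kernel_size=3, stride=2, padding=1):
    """Standard output-length formula for a strided convolution.

    The kernel size and stride defaults are assumed values, not taken from the docs.
    """
    return (size + 2 * padding - kernel_size) // stride + 1


feature_map = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 2.0, 4.0, 4.0],
    [2.0, 2.0, 4.0, 4.0],
]
pooled = avg_pool_2x2(feature_map)  # 4x4 input becomes 2x2
```

Under these assumed settings, the size formula shows why the padding argument matters when a convolution is used: with padding=1 a side of length 64 maps to 32, while padding=0 yields 31.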

class mmedit.models.editors.stable_diffusion.vae.Upsample2D(channels, use_conv=False, use_conv_transpose=False, out_channels=None, name='conv')[source]

Bases: torch.nn.Module

An upsampling layer with an optional convolution.

Parameters
  • channels (int) – Number of channels in the inputs and outputs.

  • use_conv (bool) – Whether a convolution is applied.

  • use_conv_transpose (bool) – Whether to use a transposed convolution.

  • out_channels (int) – Number of output channels.

forward(hidden_states, output_size=None)[source]

Forward pass on the hidden states.
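The following plain-Python sketch shows one common way an upsampling layer can enlarge a feature map: nearest-neighbour interpolation, either by a fixed scale factor or to an explicit output size. The interpolation mode, the default factor of 2, and the helper name nearest_upsample are assumptions for illustration; the page only documents that forward accepts an optional output_size.

```python
def nearest_upsample(feature_map, scale=2, output_size=None):
    """Nearest-neighbour upsampling of a 2-D list.

    If output_size (height, width) is given it takes precedence over scale.
    """
    in_h, in_w = len(feature_map), len(feature_map[0])
    if output_size is None:
        out_h, out_w = in_h * scale, in_w * scale
    else:
        out_h, out_w = output_size
    # Each output position copies the input position it maps back to.
    return [
        [feature_map[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


small = [[1, 2], [3, 4]]
doubled = nearest_upsample(small)                      # 2x2 -> 4x4
resized = nearest_upsample(small, output_size=(3, 3))  # 2x2 -> 3x3
```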

class mmedit.models.editors.stable_diffusion.vae.ResnetBlock2D(in_channels, out_channels=None, conv_shortcut=False, dropout=0.0, temb_channels=512, groups=32, groups_out=None, pre_norm=True, eps=1e-06, non_linearity='silu', time_embedding_norm='default', kernel=None, output_scale_factor=1.0, use_in_shortcut=None, up=False, down=False)[source]

Bases: torch.nn.Module

ResNet block that supports downsampling and upsampling.

Parameters
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • conv_shortcut (bool) – Whether to use a conv shortcut.

  • dropout (float) – Dropout rate.

  • temb_channels (int) – Number of time embedding channels.

  • groups (int) – Conv groups.

  • groups_out (int) – Conv out groups.

  • pre_norm (bool) – Whether to normalize before the conv. Todo: remove.

  • eps (float) – Epsilon for group norm.

  • non_linearity (str) – Type of non-linearity.

  • time_embedding_norm (str) – Type of time embedding norm.

  • output_scale_factor (float) – Factor used to scale the input and output.

  • use_in_shortcut (bool) – Whether to use a conv in the shortcut.

  • up (bool) – Whether to upsample.

  • down (bool) – Whether to downsample.

forward(input_tensor, temb)[source]

Forward pass with hidden states and time embeddings.

class mmedit.models.editors.stable_diffusion.vae.AttentionBlock(channels: int, num_head_channels: Optional[int] = None, norm_num_groups: int = 32, rescale_output_factor: float = 1.0, eps: float = 1e-05)[source]

Bases: torch.nn.Module

An attention block that allows spatial positions to attend to each other. Originally ported from https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L66, but adapted to the N-d case. Uses three linear layers (q, k, v) to compute attention.

Parameters
  • channels (int) – The number of channels in the input and output.

  • num_head_channels (int, optional) – The number of channels in each head. If None, then num_heads = 1.

  • norm_num_groups (int, optional, defaults to 32) – The number of groups to use for group norm.

  • rescale_output_factor (float, optional, defaults to 1.0) – The factor to rescale the output by.

  • eps (float, optional, defaults to 1e-5) – The epsilon value to use for group norm.

transpose_for_scores(projection: torch.Tensor) → torch.Tensor[source]

Transposes the projection.

forward(hidden_states)[source]

Forward pass on the hidden states.
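To make "spatial positions attend to each other" concrete, here is a minimal plain-Python sketch of single-head scaled dot-product self-attention over a flattened feature map. It uses identity projections in place of the learned q, k, v linear layers, and the relation num_heads = channels // num_head_channels is an assumption; only the None case is documented on this page. The helper names are hypothetical.

```python
import math


def num_heads_for(channels, num_head_channels=None):
    """Head count: a single head when num_head_channels is None (as documented)."""
    if num_head_channels is None:
        return 1
    return channels // num_head_channels  # assumed relation, not stated in the docs


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head attention with identity q/k/v projections.

    tokens is a list of per-position feature vectors (H*W entries of length C).
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Similarity of this position to every position, scaled by sqrt(C).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of all positions' values.
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ])
    return outputs


# A 2x2 feature map with 2 channels, flattened to 4 spatial positions.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
attended = self_attention(tokens)
```

The all-zero position scores every key equally, so its output is simply the mean of all positions, which is a quick sanity check on the weighting.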

class mmedit.models.editors.stable_diffusion.vae.UNetMidBlock2D(in_channels: int, temb_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'silu', resnet_groups: int = 32, resnet_pre_norm: bool = True, attn_num_head_channels=1, attention_type='default', output_scale_factor=1.0)[source]

Bases: torch.nn.Module

Middle block in the UNet.

Parameters
  • in_channels (int) – Number of input channels.

  • temb_channels (int) – Number of time embedding channels.

  • dropout (float) – Dropout rate, defaults to 0.0.

  • num_layers (int) – Number of layers.

  • resnet_eps (float) – ResNet epsilon, defaults to 1e-6.

  • resnet_time_scale_shift (str) – Time scale shift, defaults to ‘default’.

  • resnet_act_fn (str) – Activation function in the ResNet, defaults to ‘silu’.

  • resnet_groups (int) – Conv groups in the ResNet, defaults to 32.

  • resnet_pre_norm (bool) – Pre norm in the ResNet, defaults to True.

  • attn_num_head_channels (int) – Attention head channels, defaults to 1.

  • attention_type (str) – Attention type, defaults to ‘default’.

  • output_scale_factor (float) – Output scale factor, defaults to 1.0.

forward(hidden_states, temb=None, encoder_states=None)[source]

Forward pass with hidden states, time embedding and encoder states.

class mmedit.models.editors.stable_diffusion.vae.DownEncoderBlock2D(in_channels: int, out_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'silu', resnet_groups: int = 32, resnet_pre_norm: bool = True, output_scale_factor=1.0, add_downsample=True, downsample_padding=1)[source]

Bases: torch.nn.Module

Down encoder block in the VAE.

Parameters
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • dropout (float) – Dropout rate, defaults to 0.0.

  • num_layers (int) – Number of layers, defaults to 1.

  • resnet_eps (float) – ResNet epsilon, defaults to 1e-6.

  • resnet_time_scale_shift (str) – Time scale shift in the ResNet, defaults to ‘default’.

  • resnet_act_fn (str) – Activation function in the ResNet, defaults to ‘silu’.

  • resnet_groups (int) – Number of groups in the ResNet, defaults to 32.

  • resnet_pre_norm (bool) – Whether to pre-normalize in the ResNet, defaults to True.

  • output_scale_factor (float) – Output scale factor, defaults to 1.0.

  • add_downsample (bool) – Whether to add a downsampling layer, defaults to True.

  • downsample_padding (int) – Amount of downsample padding, defaults to 1.

forward(hidden_states)[source]

Forward pass on the hidden states.

class mmedit.models.editors.stable_diffusion.vae.Encoder(in_channels=3, out_channels=3, down_block_types=('DownEncoderBlock2D',), block_out_channels=(64,), layers_per_block=2, norm_num_groups=32, act_fn='silu', double_z=True)[source]

Bases: torch.nn.Module

Constructs the encoder in the VAE.

forward(x)[source]

Encoder forward pass.

class mmedit.models.editors.stable_diffusion.vae.UpDecoderBlock2D(in_channels: int, out_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'swish', resnet_groups: int = 32, resnet_pre_norm: bool = True, output_scale_factor=1.0, add_upsample=True)[source]

Bases: torch.nn.Module

Constructs an up decoder block.

forward(hidden_states)[source]

Forward pass on the hidden states.

class mmedit.models.editors.stable_diffusion.vae.Decoder(in_channels=3, out_channels=3, up_block_types=('UpDecoderBlock2D',), block_out_channels=(64,), layers_per_block=2, norm_num_groups=32, act_fn='silu')[source]

Bases: torch.nn.Module

Constructs the decoder in the VAE.

forward(z)[source]

Decoder forward pass.

class mmedit.models.editors.stable_diffusion.vae.DiagonalGaussianDistribution(parameters, deterministic=False)[source]

Bases: object

Calculates a diagonal Gaussian distribution.

sample(generator: Optional[torch.Generator] = None) → torch.FloatTensor[source]

Draws a sample from the distribution.

kl(other=None)[source]

Calculates the KL divergence.

nll(sample, dims=[1, 2, 3])[source]

Calculates the negative log-likelihood.

mode()[source]

Returns the mean of the distribution (self.mean).
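The four methods above correspond to standard operations on a Gaussian with a diagonal covariance. The sketch below works through them in plain Python on a flat list of values. The class name DiagonalGaussianSketch, the convention that parameters holds the means followed by the log-variances, and the zero-variance handling of deterministic=True are assumptions for illustration, not the documented behaviour of the mmedit class; kl here only covers the comparison against a standard normal.

```python
import math
import random


class DiagonalGaussianSketch:
    """Toy diagonal Gaussian over a flat list of values (hypothetical helper)."""

    def __init__(self, parameters, deterministic=False):
        half = len(parameters) // 2
        # Assumption: the first half holds means, the second half log-variances.
        self.mean = list(parameters[:half])
        self.logvar = list(parameters[half:])
        self.deterministic = deterministic
        if deterministic:
            self.var = [0.0] * half  # collapse to a point mass at the mean
        else:
            self.var = [math.exp(lv) for lv in self.logvar]
        self.std = [math.sqrt(v) for v in self.var]

    def sample(self, rng=None):
        """Reparameterized sample: mean + std * standard normal noise."""
        rng = rng or random.Random()
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]

    def kl(self):
        """KL divergence to a standard normal, summed over dimensions."""
        if self.deterministic:
            return 0.0
        return 0.5 * sum(
            m * m + v - 1.0 - lv
            for m, v, lv in zip(self.mean, self.var, self.logvar)
        )

    def nll(self, sample):
        """Negative log-likelihood of sample, summed over dimensions."""
        if self.deterministic:
            return 0.0
        return 0.5 * sum(
            math.log(2.0 * math.pi) + lv + (x - m) ** 2 / v
            for x, m, v, lv in zip(sample, self.mean, self.var, self.logvar)
        )

    def mode(self):
        """The mode of a Gaussian is its mean."""
        return self.mean


standard = DiagonalGaussianSketch([0.0, 0.0, 0.0, 0.0])  # two dims, mean 0, var 1
shifted = DiagonalGaussianSketch([1.0, 0.0])             # one dim, mean 1, var 1
point = DiagonalGaussianSketch([3.0, 0.0], deterministic=True)
```

A standard normal has zero KL divergence to itself, and shifting the mean by 1 at unit variance adds 0.5, which gives two easy checks on the formula.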

class mmedit.models.editors.stable_diffusion.vae.AutoencoderKL(in_channels: int = 3, out_channels: int = 3, down_block_types: Tuple[str] = ('DownEncoderBlock2D',), up_block_types: Tuple[str] = ('UpDecoderBlock2D',), block_out_channels: Tuple[int] = (64,), layers_per_block: int = 1, act_fn: str = 'silu', latent_channels: int = 4, norm_num_groups: int = 32, sample_size: int = 32)[source]

Bases: torch.nn.Module

Variational Autoencoder (VAE) model with KL loss from the paper Auto-Encoding Variational Bayes by Diederik P. Kingma and Max Welling.

Parameters
  • in_channels (int, optional, defaults to 3) – Number of channels in the input image.

  • out_channels (int, optional, defaults to 3) – Number of channels in the output.

  • down_block_types (Tuple[str], optional, defaults to (“DownEncoderBlock2D”,)) – Tuple of downsample block types.

  • up_block_types (Tuple[str], optional, defaults to (“UpDecoderBlock2D”,)) – Tuple of upsample block types.

  • block_out_channels (Tuple[int], optional, defaults to (64,)) – Tuple of block output channels.

  • act_fn (str, optional, defaults to “silu”) – The activation function to use.

  • latent_channels (int, optional, defaults to 4) – Number of channels in the latent space.

  • sample_size (int, optional, defaults to 32) – Sample size; currently not supported.

encode(x: torch.FloatTensor, return_dict: bool = True) → addict.Dict[source]

Encodes the input.

decode(z: torch.FloatTensor, return_dict: bool = True) → Union[addict.Dict, torch.FloatTensor][source]

Decodes z.

forward(sample: torch.FloatTensor, sample_posterior: bool = False, return_dict: bool = True, generator: Optional[torch.Generator] = None) → Union[addict.Dict, torch.FloatTensor][source]

Parameters
  • sample (torch.FloatTensor) – Input sample.

  • sample_posterior (bool) – Whether to sample from the posterior. Defaults to False.

  • return_dict (bool, optional, defaults to True) – Whether or not to return a [Dict] instead of a plain tuple.

Returns

Decoded results.

Return type

Dict(sample=dec)
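The forward parameters suggest an encode, pick a latent, decode flow. The sketch below mimics that flow in plain Python with toy stand-ins. ToyPosterior, toy_encode, toy_decode and forward_sketch are hypothetical names, a plain dict stands in for addict.Dict, and using the posterior mode when sample_posterior is False is an assumption rather than something this page states.

```python
import random


class ToyPosterior:
    """Stand-in for the posterior produced by encoding (hypothetical)."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def sample(self, rng):
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]

    def mode(self):
        return list(self.mean)


def toy_encode(x):
    # Hypothetical encoder: halve the inputs to form the latent mean.
    return ToyPosterior(mean=[v / 2.0 for v in x], std=[0.1] * len(x))


def toy_decode(z):
    # Hypothetical decoder: undo the toy encoder's scaling.
    return [v * 2.0 for v in z]


def forward_sketch(sample, sample_posterior=False, return_dict=True, rng=None):
    posterior = toy_encode(sample)
    if sample_posterior:
        z = posterior.sample(rng or random.Random())  # stochastic latent
    else:
        z = posterior.mode()                          # assumed deterministic path
    dec = toy_decode(z)
    # Mirror the documented return shape: Dict(sample=dec) or a plain tuple.
    return {"sample": dec} if return_dict else (dec,)


as_dict = forward_sketch([0.2, 0.4])
as_tuple = forward_sketch([0.2, 0.4], return_dict=False)
noisy = forward_sketch([0.2, 0.4], sample_posterior=True, rng=random.Random(0))
```

In this toy setup the deterministic path reconstructs the input exactly, while the sampled path returns a perturbed reconstruction of the same length.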
