mmedit.models.editors.eg3d

Package Contents

Classes

GaussianCamera

Pre-defined camera class. Samples camera positions from a Gaussian distribution.

UniformCamera

Pre-defined camera class. Samples camera positions from a uniform distribution.

DualDiscriminator

Dual Discriminator for EG3D. DualDiscriminator shares the same network structure as StyleGAN2’s Discriminator.

EG3D

Implementation of Efficient Geometry-aware 3D Generative Adversarial Networks (EG3D).

TriplaneGenerator

The generator for EG3D.

class mmedit.models.editors.eg3d.GaussianCamera(horizontal_mean: Optional[float] = None, vertical_mean: Optional[float] = None, horizontal_std: Optional[float] = 0, vertical_std: Optional[float] = 0, look_at: List = [0, 0, 0], fov: Optional[float] = None, focal: Optional[float] = None, up: VectorType = [0, 1, 0], radius: Optional[float] = 1)[source]

Bases: BaseCamera

Pre-defined camera class. Samples camera positions from a Gaussian distribution.

Parameters
  • horizontal_mean (Optional[float]) – Mean of the horizontal range in radians. Defaults to None.

  • vertical_mean (Optional[float]) – Mean of the vertical range in radians. Defaults to None.

  • horizontal_std (Optional[float]) – Standard deviation of the horizontal range in radians. Defaults to 0.

  • vertical_std (Optional[float]) – Standard deviation of the vertical range in radians. Defaults to 0.

  • look_at (Union[List, torch.Tensor]) – The position the camera looks at. Defaults to [0, 0, 0].

  • up (Union[List, torch.Tensor]) – The up direction of the world coordinate. Defaults to [0, 1, 0].

  • radius (Optional[float]) – Radius of the sphere. Defaults to 1.
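As a rough illustration of what sampling a camera position on a sphere with Gaussian-distributed angles can look like, here is a minimal standard-library sketch. It does not use mmedit. The helper name `sample_gaussian_camera_position` and the spherical-coordinate convention (horizontal angle as azimuth around the y axis, vertical angle as polar angle measured from +y) are assumptions made for this example and may differ from the library's own convention.

```python
import math
import random


def sample_gaussian_camera_position(horizontal_mean, vertical_mean,
                                    horizontal_std=0.0, vertical_std=0.0,
                                    look_at=(0.0, 0.0, 0.0), radius=1.0,
                                    rng=None):
    """Hypothetical helper: draw one camera position on a sphere.

    Assumed convention (not taken from mmedit): the horizontal angle is the
    azimuth around the y axis and the vertical angle is the polar angle
    measured from the +y axis, both in radians.
    """
    rng = rng or random.Random()
    h = rng.gauss(horizontal_mean, horizontal_std)
    v = rng.gauss(vertical_mean, vertical_std)
    # Keep the polar angle away from the poles so an up vector stays usable.
    v = min(max(v, 1e-5), math.pi - 1e-5)
    # Spherical to Cartesian, then shift so the sphere is centred on look_at.
    x = radius * math.sin(v) * math.cos(h)
    y = radius * math.cos(v)
    z = radius * math.sin(v) * math.sin(h)
    return tuple(c + o for c, o in zip((x, y, z), look_at))


position = sample_gaussian_camera_position(
    horizontal_mean=math.pi / 2, vertical_mean=math.pi / 2,
    horizontal_std=0.3, vertical_std=0.15, radius=2.7,
    rng=random.Random(0))
```

With both standard deviations left at 0, the helper always returns the position given by the two mean angles, which matches the intuition that a zero-width distribution fixes the camera.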

class mmedit.models.editors.eg3d.UniformCamera(horizontal_mean: Optional[float] = None, vertical_mean: Optional[float] = None, horizontal_std: Optional[float] = 0, vertical_std: Optional[float] = 0, look_at: List = [0, 0, 0], fov: Optional[float] = None, focal: Optional[float] = None, up: VectorType = [0, 1, 0], radius: Optional[float] = 1)[source]

Bases: BaseCamera

Pre-defined camera class. Samples camera positions from a uniform distribution.

Parameters
  • horizontal_mean (Optional[float]) – Mean of the horizontal range in radians. Defaults to None.

  • vertical_mean (Optional[float]) – Mean of the vertical range in radians. Defaults to None.

  • horizontal_std (Optional[float]) – Standard deviation of the horizontal range in radians. Defaults to 0.

  • vertical_std (Optional[float]) – Standard deviation of the vertical range in radians. Defaults to 0.

  • look_at (Union[List, torch.Tensor]) – The position the camera looks at. Defaults to [0, 0, 0].

  • up (Union[List, torch.Tensor]) – The up direction of the world coordinate. Defaults to [0, 1, 0].

  • radius (Optional[float]) – Radius of the sphere. Defaults to 1.
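One plausible reading of the mean and standard-deviation parameters for a uniform sampler is that an angle is drawn uniformly from the interval [mean - std, mean + std]. The sketch below shows only that reading; the helper `sample_uniform_angle` is hypothetical and the interval interpretation is an assumption, not something this page states.

```python
import random


def sample_uniform_angle(mean, std, rng=None):
    """Hypothetical helper: draw an angle uniformly from [mean - std, mean + std].

    The mapping from (mean, std) to an interval is an assumption made for
    this example.
    """
    rng = rng or random.Random()
    # rng.random() is in [0, 1); rescale it to [-1, 1) before applying std.
    return mean + (rng.random() * 2.0 - 1.0) * std


rng = random.Random(0)
angles = [sample_uniform_angle(1.57, 0.5, rng) for _ in range(1000)]
```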

class mmedit.models.editors.eg3d.DualDiscriminator(img_channels: int = 3, use_dual_disc: bool = True, disc_c_noise: float = 0, *args, **kwargs)[source]

Bases: mmedit.models.editors.stylegan2.StyleGAN2Discriminator

Dual Discriminator for EG3D. DualDiscriminator shares the same network structure as StyleGAN2’s Discriminator. However, DualDiscriminator takes the volume-rendered low-resolution (LR) image and the super-resolved (SR) image at the same time. The LR image is upsampled and concatenated with the SR image, and the two are then fed to the discriminator together.

Parameters
  • img_channels (int) – The number of the image channels. Defaults to 3.

  • use_dual_disc (bool) – Whether to use the dual discriminator as in EG3D. If True, the number of input channels of the first conv block is set to 2 * img_channels. Defaults to True.

  • disc_c_noise (float) – Scale factor for the standard deviation of the noise added to the conditional input before it is passed to the mapping network. Defaults to 0.

  • *args – Arguments for StyleGAN2Discriminator.

  • **kwargs – Keyword arguments for StyleGAN2Discriminator.

forward(img: torch.Tensor, img_raw: Optional[torch.Tensor] = None, cond: Optional[torch.Tensor] = None)[source]

Forward function.

Parameters
  • img (torch.Tensor) – Input high-resolution image tensor.

  • img_raw (torch.Tensor) – Input raw (low resolution) image tensor. Defaults to None.

  • cond (torch.Tensor) – The conditional input (camera-to-world matrix and intrinsics matrix). Defaults to None.

Returns

Predicted score for the input image.

Return type

torch.Tensor
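The "upsample the LR image, then concatenate it with the SR image" step described for DualDiscriminator explains why the first conv block sees 2 * img_channels channels. The sketch below illustrates this with plain nested lists in [C][H][W] layout. It is not the library's code: the helpers are hypothetical, nearest-neighbour upsampling is used here only for simplicity, and the SR-then-LR channel order is an assumption.

```python
def upsample_nearest(image, factor):
    """Upsample a [C][H][W] nested-list image by an integer factor."""
    return [
        [[row[x // factor] for x in range(len(row) * factor)]
         for row in channel for _ in range(factor)]
        for channel in image
    ]


def build_dual_input(sr_image, lr_image):
    """Hypothetical helper: stack SR and upsampled LR images along channels."""
    factor = len(sr_image[0]) // len(lr_image[0])
    lr_up = upsample_nearest(lr_image, factor)
    # List concatenation on the outer (channel) axis: C_sr + C_lr channels.
    return sr_image + lr_up


lr_image = [[[1, 2], [3, 4]] for _ in range(3)]            # 3 x 2 x 2
sr_image = [[[0] * 4 for _ in range(4)] for _ in range(3)]  # 3 x 4 x 4
dual_input = build_dual_input(sr_image, lr_image)           # 6 x 4 x 4
```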

class mmedit.models.editors.eg3d.EG3D(generator: ModelType, discriminator: Optional[ModelType] = None, camera: Optional[ModelType] = None, data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, noise_size: Optional[int] = None, ema_config: Optional[Dict] = None, loss_config: Optional[Dict] = None)[source]

Bases: mmedit.models.base_models.BaseConditionalGAN

Implementation of Efficient Geometry-aware 3D Generative Adversarial Networks (EG3D) <https://openaccess.thecvf.com/content/CVPR2022/papers/Chan_Efficient_Geometry-Aware_3D_Generative_Adversarial_Networks_CVPR_2022_paper.pdf>.

Detailed architecture can be found in TriplaneGenerator and DualDiscriminator.

Parameters
  • generator (ModelType) – The config or model of the generator.

  • discriminator (Optional[ModelType]) – The config or model of the discriminator. Defaults to None.

  • camera (Optional[ModelType]) – The pre-defined camera to sample random camera position. If you want to generate images or videos via high-level API, you must set this argument. Defaults to None.

  • data_preprocessor (Optional[Union[dict, Config]]) – The pre-process config or GenDataPreprocessor. Defaults to None.

  • generator_steps (int) – Number of times the generator is completely updated before the discriminator is updated. Defaults to 1.

  • discriminator_steps (int) – Number of times the discriminator is completely updated before the generator is updated. Defaults to 1.

  • noise_size (Optional[int]) – Size of the input noise vector. Defaults to None.

  • num_classes (Optional[int]) – The number of classes you would like to generate. Defaults to None.

  • ema_config (Optional[Dict]) – The config for generator’s exponential moving average setting. Defaults to None.

  • loss_config (Optional[Dict]) – The config for training losses. Defaults to None.
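The parameters above are usually supplied as nested config dicts. The following is a hedged sketch of how such a config might be laid out. It assumes that the registry `type` names match the class names listed on this page, and every numeric value is a placeholder chosen for illustration, not taken from a released config. Arguments that DualDiscriminator forwards to StyleGAN2Discriminator are omitted.

```python
import math

# Illustrative values only; none of them come from a released config.
model = dict(
    type='EG3D',
    generator=dict(type='TriplaneGenerator', out_size=512),
    # Further StyleGAN2Discriminator arguments are left out of this sketch.
    discriminator=dict(type='DualDiscriminator', img_channels=3),
    # A camera is needed to sample random poses via the high-level API.
    camera=dict(
        type='UniformCamera',
        horizontal_mean=math.pi / 2,
        vertical_mean=math.pi / 2,
        horizontal_std=0.3,
        vertical_std=0.2,
        radius=2.7,
    ),
    generator_steps=1,
    discriminator_steps=1,
)
```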

label_fn(label: Optional[torch.Tensor] = None, num_batches: int = 1) torch.Tensor[source]

Label sampling function for EG3D model.

Parameters

label (Optional[Tensor]) – Conditional input for the EG3D model. If not passed, self.camera will be used to sample a random camera-to-world matrix and intrinsics matrix. Defaults to None.

Returns

Conditional input for EG3D model.

Return type

torch.Tensor

data_sample_to_label(data_sample: mmedit.utils.typing.SampleList) Optional[torch.Tensor][source]

Get labels from input data_sample and pack to torch.Tensor. If no label is found in the passed data_sample, None would be returned.

Parameters

data_sample (List[EditDataSample]) – Input data samples.

Returns

Packed label tensor.

Return type

Optional[torch.Tensor]

pack_to_data_sample(output: Dict[str, torch.Tensor], index: int, data_sample: Optional[mmedit.structures.EditDataSample] = None) mmedit.structures.EditDataSample[source]

Pack output to data sample. If data_sample is not passed, a new EditDataSample will be instantiated. Otherwise, outputs will be added to the passed data_sample.

Parameters
  • output (Dict[Tensor]) – Output of the model.

  • index (int) – The index to save.

  • data_sample (EditDataSample, optional) – Data sample to save outputs. Defaults to None.

Returns

Data sample with packed outputs.

Return type

EditDataSample

forward(inputs: mmedit.utils.typing.ForwardInputs, data_samples: Optional[list] = None, mode: Optional[str] = None) List[mmedit.structures.EditDataSample][source]

Sample images with the given inputs. If forward mode is ‘ema’ or ‘orig’, the image generated by corresponding generator will be returned. If forward mode is ‘ema/orig’, images generated by original generator and EMA generator will both be returned in a dict.

Parameters
  • inputs (ForwardInputs) – Dict containing the necessary information (e.g. noise, num_batches, mode) to generate image.

  • data_samples (Optional[list]) – Data samples collated by data_preprocessor. Defaults to None.

  • mode (Optional[str]) – mode is not used in BaseConditionalGAN. Defaults to None.

Returns

Generated images or image dict.

Return type

List[EditDataSample]

interpolation(num_images: int, num_batches: int = 4, mode: str = 'both', sample_model: str = 'orig', show_pbar: bool = True) List[dict][source]

Interpolate the inputs and return a list of output results. Three interpolation modes are supported:

  • ‘camera’: First generate the style code with random noise and the forward camera. Then synthesize images with interpolated camera positions and a fixed style code.

  • ‘conditioning’: First generate style codes with fixed noise and an interpolated camera. Then synthesize images with those style codes and the forward camera.

  • ‘both’: Generate images with interpolated camera positions.

Parameters
  • num_images (int) – The number of images to generate.

  • num_batches (int, optional) – The number of batches to generate at one time. Defaults to 4.

  • mode (str, optional) – The interpolation mode. Supported choices are ‘both’, ‘camera’, and ‘conditioning’. Defaults to ‘both’.

  • sample_model (str, optional) – The model used to generate images. Supported choices are ‘orig’ and ‘ema’. Defaults to ‘orig’.

  • show_pbar (bool, optional) – Whether to display a progress bar during interpolation. Defaults to True.

Returns

The list of output dict of each frame.

Return type

List[dict]
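To make the idea of "interpolated camera position" concrete, the sketch below linearly interpolates a pair of (horizontal, vertical) angles over num_images frames. This page does not state which interpolation scheme the method uses, so linear interpolation and the helper name `lerp_camera_angles` are assumptions for illustration only.

```python
def lerp_camera_angles(start, end, num_images):
    """Hypothetical helper: linearly interpolate camera angles per frame."""
    if num_images < 2:
        return [tuple(start)]
    frames = []
    for i in range(num_images):
        t = i / (num_images - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append(tuple(s + t * (e - s) for s, e in zip(start, end)))
    return frames


# Sweep the horizontal and vertical angles (radians) across five frames.
frames = lerp_camera_angles((1.2, 1.4), (1.9, 1.7), num_images=5)
```

In a ‘camera’-style sweep, each interpolated pose would be paired with the same style code, while a ‘conditioning’-style sweep would instead vary the pose used to compute the style code and keep the rendering pose fixed.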

class mmedit.models.editors.eg3d.TriplaneGenerator(out_size: int, noise_size: int = 512, style_channels: int = 512, cond_size: int = 25, cond_mapping_channels: Optional[int] = None, cond_scale: float = 1, zero_cond_input: bool = False, num_mlps: int = 8, triplane_size: int = 256, triplane_channels: int = 32, sr_in_size: int = 64, sr_in_channels: int = 32, sr_hidden_channels: int = 128, sr_out_channels: int = 64, sr_antialias: bool = True, sr_add_noise: bool = True, neural_rendering_resolution: int = 64, renderer_cfg: dict = dict(), rgb2bgr: bool = False, init_cfg: Optional[dict] = None)[source]

Bases: mmengine.model.BaseModule

The generator for EG3D.

EG3D generator contains three components:

  • A StyleGAN2 based backbone to generate a triplane feature

  • A neural renderer to sample and render low-resolution 2D feature and image from generated triplane feature

  • A super resolution module to upsample the low-resolution image to a high-resolution one

Parameters
  • out_size (int) – The resolution of the generated 2D image.

  • noise_size (int) – The size of the noise vector of the StyleGAN2 backbone. Defaults to 512.

  • style_channels (int) – The number of channels for style code. Defaults to 512.

  • cond_size (int) – The size of the conditional input. Defaults to 25 (first 16 elements are flattened camera-to-world matrix and the last 9 elements are flattened intrinsic matrix).

  • cond_mapping_channels (Optional[int]) – The channels of the conditional mapping layers. If not passed, will use the same value as style_channels. Defaults to None.

  • cond_scale (float) – The scale factor by which the conditional input is multiplied. Defaults to 1.

  • zero_cond_input (bool) – Whether to use a zero tensor as the conditional input. Defaults to False.

  • num_mlps (int) – The number of MLP layers (mapping network) used in backbone. Defaults to 8.

  • triplane_size (int) – The size of generated triplane feature. Defaults to 256.

  • triplane_channels (int) – The number of channels for each plane of the triplane feature. Defaults to 32.

  • sr_in_size (int) – The input resolution of the super resolution module. If the input feature does not match the passed sr_in_size, bilinear interpolation will be used to resize the feature to the target size. Defaults to 64.

  • sr_in_channels (int) – The number of the input channels of super resolution module. Defaults to 32.

  • sr_hidden_channels (int) – The number of the hidden channels of super resolution module. Defaults to 128.

  • sr_out_channels (int) – The number of the output channels of super resolution module. Defaults to 64.

  • sr_add_noise (bool) – Whether to use noise injection in the super resolution module. Defaults to True.

  • neural_rendering_resolution (int) – The resolution of the neural rendering output. Note that the neural rendering resolution will be changed during training. Defaults to 64.

  • renderer_cfg (dict) – The config to build EG3DRenderer. Defaults to ‘{}’.

  • rgb2bgr (bool) – Whether to convert the RGB output to BGR. This is useful when the pretrained model was trained on an RGB dataset. Defaults to False.

  • init_cfg (Optional[dict]) – Initialization config. Defaults to None.
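The default cond_size of 25 follows from flattening a 4 x 4 camera-to-world matrix (16 values) and a 3 x 3 intrinsics matrix (9 values). The sketch below packs such a conditional vector with plain lists. The helper `pack_camera_condition`, the row-major flattening order, and the example matrix values are assumptions for illustration.

```python
def pack_camera_condition(cam2world, intrinsics):
    """Hypothetical helper: flatten a 4x4 pose and 3x3 intrinsics to 25 values."""
    if len(cam2world) != 4 or any(len(row) != 4 for row in cam2world):
        raise ValueError('cam2world must be 4 x 4')
    if len(intrinsics) != 3 or any(len(row) != 3 for row in intrinsics):
        raise ValueError('intrinsics must be 3 x 3')
    # Row-major flattening is assumed here: 16 pose values, then 9 intrinsics.
    pose = [value for row in cam2world for value in row]
    intr = [value for row in intrinsics for value in row]
    return pose + intr


# Example matrices with placeholder values.
cam2world = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.7],
    [0.0, 0.0, 0.0, 1.0],
]
intrinsics = [
    [4.0, 0.0, 0.5],
    [0.0, 4.0, 0.5],
    [0.0, 0.0, 1.0],
]
cond = pack_camera_condition(cam2world, intrinsics)
```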

sample_ray(cond: torch.Tensor) Tuple[torch.Tensor][source]

Sample render points corresponding to the given conditional.

Parameters

cond (torch.Tensor) – Conditional inputs.

Returns

The origin and direction vectors of the sampled rays.

Return type

Tuple[Tensor]

forward(noise: torch.Tensor, label: Optional[torch.Tensor] = None, truncation: Optional[float] = 1, num_truncation_layer: Optional[int] = None, input_is_latent: bool = False, plane: Optional[torch.Tensor] = None, add_noise: bool = True, randomize_noise: bool = True, render_kwargs: Optional[dict] = None) dict[source]

The forward function for EG3D generator.

Parameters
  • noise (Tensor) – The input noise vector.

  • label (Optional[Tensor]) – The conditional input. Defaults to None.

  • truncation (float, optional) – Truncation factor. If a value less than 1 is given, the truncation trick is applied. Defaults to 1.

  • num_truncation_layer (int, optional) – Number of layers that use the truncated latent. Defaults to None.

  • input_is_latent (bool) – Whether the input is a latent code rather than a noise vector. Defaults to False.

  • plane (Optional[Tensor]) – The pre-generated triplane feature. If passed, will use the passed plane to generate 2D image. Defaults to None.

  • add_noise (bool) – Whether to apply noise injection to the triplane backbone. Defaults to True.

  • randomize_noise (bool, optional) – If False, images are sampled with the buffered noise tensor injected to the style conv block. Defaults to True.

  • render_kwargs (Optional[dict], optional) – The specific kwargs for rendering. Defaults to None.

Returns

A dict containing ‘fake_img’, ‘lr_img’, ‘depth’, ‘ray_directions’ and ‘ray_origins’.

Return type

dict
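The truncation parameter refers to the truncation trick commonly used with StyleGAN-style generators: a style code is pulled towards the mean style code by a factor. The sketch below shows that formula on plain lists. The helper `truncate_style` is hypothetical, and how the library obtains the mean style or applies num_truncation_layer is not covered here.

```python
def truncate_style(style, mean_style, truncation=1.0):
    """Hypothetical helper: w' = w_mean + truncation * (w - w_mean)."""
    return [m + truncation * (s - m) for s, m in zip(style, mean_style)]


style = [2.0, -1.0, 0.5]
mean_style = [0.0, 1.0, 0.5]

unchanged = truncate_style(style, mean_style, truncation=1.0)  # no truncation
halfway = truncate_style(style, mean_style, truncation=0.5)    # pulled halfway
collapsed = truncate_style(style, mean_style, truncation=0.0)  # mean style
```

A truncation of 1 leaves the style code as it is, while smaller values trade sample diversity for outputs closer to the average.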
