What are we talking about when we talk about VAE in Stable Diffusion?

March 22nd, 2023
If you dig a little deeper into Stable Diffusion, you will run into the concept of the VAE. If you do not understand the model structure behind latent diffusion models, then after exploring the community for a while you will almost certainly be confused about VAEs and the assortment of VAE-related .pt and .ckpt files. This article therefore explains what a VAE is and the specific role it plays in Stable Diffusion.

VAE model introduction

A VAE (Variational Auto-Encoder) has two parts, an encoder and a decoder, and is often used in AI image generation.

A VAE is one of the components of a latent diffusion model (Latent Diffusion Models).

Here is a flow chart of the Stable Diffusion inference process that I put together myself.

The encoder converts an image into a low-dimensional latent representation, which is then used as the input of the U-Net model.

Conversely, the decoder will convert the latent representation back into image form.
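To make "low-dimensional" concrete, here is a minimal sketch of the shape arithmetic. It assumes the values commonly cited for the Stable Diffusion v1 VAE (each side downsampled by a factor of 8, 4 latent channels); other models may use different values, and the function names are made up for illustration.

```python
# Assumed values for the Stable Diffusion v1 VAE; other models may differ.
DOWNSAMPLE_FACTOR = 8   # each spatial side shrinks by this factor
LATENT_CHANNELS = 4     # channels in the latent representation


def latent_shape(height, width):
    """Return the (channels, height, width) of the latent for an RGB image."""
    if height % DOWNSAMPLE_FACTOR or width % DOWNSAMPLE_FACTOR:
        raise ValueError("height and width should be multiples of 8")
    return (LATENT_CHANNELS, height // DOWNSAMPLE_FACTOR, width // DOWNSAMPLE_FACTOR)


def compression_ratio(height, width):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(height, width)
    return (3 * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the U-Net works on roughly 48 times fewer values than the pixel image, which is the main reason diffusion is run in latent space.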

During the training of the latent diffusion model, the encoder is used to obtain latents of the image training set, and these latent representations are used in the forward diffusion process (each step adds more noise to the latent representation).
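The forward diffusion step can be sketched roughly as follows. This is a generic DDPM-style illustration rather than Stable Diffusion's exact code: the linear noise schedule, its start and end values, and the flat list standing in for an encoded latent are all assumptions made for the example.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances for each step (assumed linear schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: share of the original signal variance left after step t."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(latent, t, alpha_bars, rng):
    """Sample a noised latent at step t from a clean latent (flat list of floats)."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in latent]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_betas(1000))
clean = [1.0] * 8  # stand-in for an encoded latent
slightly_noisy = add_noise(clean, 10, alpha_bars, rng)
mostly_noise = add_noise(clean, 999, alpha_bars, rng)
```

Because `alpha_bars` shrinks with every step, later steps keep less of the original latent and more of the random noise.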

During inference, the denoised latents produced by the reverse diffusion process are converted back into an image by the decoder part of the VAE.

Therefore, we only need to use the decoder part of the VAE in the inference generation process of the latent diffusion model.

VAEs in WebUI

The more popular pre-trained models generally come with a trained VAE built in, so normal inference works without loading an additional VAE file (images generated with an extra VAE loaded will look slightly different). In this case a VAE .pt file acts like an HDR filter: it is a custom model that adjusts things such as the image's color range a little.

However, some pre-trained model files do not have a built-in VAE (or their authors trained a separate VAE, in which case the model release notes usually say where to get it). For these we have to find a VAE and load it so that the denoised latents produced by reverse diffusion can be converted back into an image during inference. Otherwise, the final output in the WebUI is the latent representation itself, which looks like colored noise. In this case the VAE .pt file acts like decompression software, unpacking the latents into an image that the naked eye can make sense of.
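One rough way to tell whether a checkpoint carries its own VAE is to look at its weight names. The sketch below assumes the original Stable Diffusion v1 checkpoint layout, where VAE weights are stored under the `first_stage_model.` prefix; the function name and the shortened key lists are hypothetical.

```python
# Assumed key prefix: in the original Stable Diffusion v1 checkpoint layout,
# the VAE weights are stored under "first_stage_model.".
VAE_PREFIX = "first_stage_model."


def has_builtin_vae(state_dict_keys):
    """Return True if any weight name belongs to the VAE."""
    return any(key.startswith(VAE_PREFIX) for key in state_dict_keys)


# Hypothetical, shortened key lists for illustration only.
with_vae = [
    "model.diffusion_model.input_blocks.0.0.weight",
    "first_stage_model.decoder.conv_in.weight",
]
without_vae = ["model.diffusion_model.input_blocks.0.0.weight"]

print(has_builtin_vae(with_vae))     # True
print(has_builtin_vae(without_vae))  # False
```

In practice the key names would come from the loaded checkpoint's state dict. Checkpoints saved in other formats may name their weights differently, so a negative result is only a hint.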

I made a picture to illustrate it intuitively.

VAE model file acquisition

Popular VAE files used by the community:

Loading a VAE model file

There are two ways to load a VAE model file in the WebUI:

  • Rename it to <model prefix>.vae.pt and place it in the same folder as the model
  • Put the VAE file into the VAE folder and select it in the settings
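The renaming rule in the first option can be sketched as a small helper. The function name and the example checkpoint path are hypothetical; only the `<model prefix>.vae.pt` pattern comes from the text above.

```python
from pathlib import PurePosixPath


def sidecar_vae_path(checkpoint_path):
    """Return the '<model prefix>.vae.pt' path next to a checkpoint file."""
    checkpoint = PurePosixPath(checkpoint_path)
    return checkpoint.with_name(checkpoint.stem + ".vae.pt")


# Hypothetical checkpoint location, for illustration only.
print(sidecar_vae_path("models/my-model.ckpt"))  # models/my-model.vae.pt
```

Only the final extension is replaced, so a model whose name contains dots keeps the rest of its prefix.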

VAE files generally need to be removed during model training

The VAE continues to learn during the training process. As the model trains, the actual behavior of different versions of the model may therefore differ. If necessary, you can prevent the VAE from being trained by removing the VAE file.


