About waifu-diffusion

The Waifu Diffusion 1.3 model is a Stable Diffusion model that has been finetuned from Stable Diffusion v1.4. I would like to personally thank everyone who has been involved in the development and release of Stable Diffusion, as none of the work on Waifu Diffusion would have been possible without their original codebase and the pre-existing model weights from which Waifu Diffusion was finetuned.

The data used for finetuning Waifu Diffusion 1.3 consisted of 680k text-image samples downloaded from a booru site that provides high-quality tagging and links to the original sources of the artworks uploaded to it. I also want to personally thank them, as without their hard work the generative quality of this model would not have been feasible without going to financially extreme lengths to acquire training data. The booru in question would like to remain anonymous due to the current climate regarding AI-generated imagery.

Within the HuggingFace Waifu Diffusion 1.3 Repository are 4 final models:

  • Float 16 EMA Pruned: This is the smallest available form of the model at 2GB. This model is to be used for inference purposes only.
  • Float 32 EMA Pruned: The float32 weights are the second smallest available form of the model at 4GB. This is to be used for inference purposes only.
  • Float 32 Full Weights: The full weights contain the EMA weights which are not used during inference. These can be used for either training or inference.
  • Float 32 Full Weights + Optimizer Weights: This file also contains all of the optimizer states used during training and is 14GB in size. It is to be used for training purposes only; there is no quality difference between this model and the others.
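
The 2GB and 4GB figures for the two pruned models follow from the storage width of each weight. The snippet below is a rough back-of-the-envelope sketch, not a measurement: it assumes 1GB = 10^9 bytes, ignores checkpoint metadata, and the helper names are made up for illustration.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float16": 2, "float32": 4}

def approx_param_count(size_gb, dtype):
    """Estimate the number of parameters from a file size (assumes 1GB = 1e9 bytes)."""
    return int(size_gb * 1e9 / BYTES_PER_PARAM[dtype])

def approx_size_gb(n_params, dtype):
    """Estimate the file size in GB for a given parameter count and precision."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

# Start from the 4GB float32 pruned model listed above.
params = approx_param_count(4, "float32")
print(params)                             # roughly one billion parameters
print(approx_size_gb(params, "float16"))  # half the float32 size
```

Halving the bytes per parameter halves the file, which is why the float16 model is the smaller of the two inference-only options.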

Various modifications have been made to the data since the Waifu Diffusion 1.2 model, including:

  • Removing underscores.
  • Removing parentheses.
  • Separating each booru tag with a comma.
  • Randomizing tag order.
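
The four modifications above can be sketched as a small caption-formatting function. This is only an illustration and not the actual preprocessing script used for training: it assumes the raw tags arrive as a single whitespace-separated string, that underscores are replaced by spaces rather than dropped outright, and the function name `format_tags` is hypothetical.

```python
import random

def format_tags(raw_tags, rng=None):
    """Turn a whitespace-separated booru tag string into a comma-separated caption."""
    rng = rng or random.Random()
    tags = []
    for tag in raw_tags.split():
        # Assumption: underscores become spaces; parentheses are simply removed.
        cleaned = tag.replace("_", " ").replace("(", "").replace(")", "")
        # Collapse any doubled spaces left behind by the replacements.
        cleaned = " ".join(cleaned.split())
        if cleaned:
            tags.append(cleaned)
    # Randomize tag order so that no tag position carries meaning.
    rng.shuffle(tags)
    # Separate each tag with a comma.
    return ", ".join(tags)

# A fixed seed makes the example reproducible; omit it for a fresh order each call.
caption = format_tags("1girl long_hair hatsune_miku_(vocaloid) blue_eyes",
                      rng=random.Random(0))
print(caption)
```

When prompting the model, it is probably worth writing tags in the same style: spaces instead of underscores, no parentheses, and commas between tags.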

