
SD-XL 1.0: Vibrant Colors and Enhanced Performance

Hayo News
July 27th, 2023

SD-XL 1.0-Base Model Card

Model

SDXL consists of an ensemble-of-experts pipeline for latent diffusion: in a first step, the base model generates (noisy) latents, which are then further processed with a refinement model specialized for the final denoising steps. Note that the base model can be used as a standalone module.

Alternatively, we can use a two-stage pipeline: first, the base model is used to generate latents of the desired output size; then, in a second step, a specialized high-resolution model is applied to those latents with the same prompt, using a technique called SDEdit (https://arxiv.org/abs/2108.01073, also known as "img2img"). This technique is slightly slower than the first, as it requires more function evaluations.
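
A minimal sketch of this handoff with 🧨 diffusers is shown below; it assumes diffusers >= 0.18.0 and the stabilityai/stable-diffusion-xl-refiner-1.0 checkpoint, with the denoising split left at the library defaults:

    from diffusers import DiffusionPipeline
    import torch

    # Stage 1: the base model generates latents of the desired output size.
    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    ).to("cuda")

    # Stage 2: the refiner is specialized for the final denoising steps.
    # Sharing text_encoder_2 and the VAE with the base model saves memory.
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    ).to("cuda")

    prompt = "An astronaut riding a green horse"

    # The base model hands off (noisy) latents instead of a decoded image,
    # and the refiner processes them with the same prompt (SDEdit / img2img).
    latents = base(prompt=prompt, output_type="latent").images
    image = refiner(prompt=prompt, image=latents).images[0]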

Source code is available at https://github.com/Stability-AI/generative-models.

Model description

  • Developer: Stability AI
  • Model type: Diffusion-based text-to-image generative model
  • License: CreativeML Open RAIL++-M License
  • Model Description: This is a model that can generate and modify images based on text prompts. It is a latent diffusion model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L); see the sketch after this list.
  • Resources for more information: Check out our GitHub repository and the SDXL report on arXiv.
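
As a small illustration of the two text encoders mentioned above, a loaded SDXL pipeline exposes them as separate components. This is a minimal sketch assuming diffusers >= 0.18.0; the attribute names follow the diffusers SDXL pipeline:

    from diffusers import DiffusionPipeline
    import torch

    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16, use_safetensors=True, variant="fp16",
    )

    # The two fixed, pretrained text encoders used for conditioning:
    print(type(pipe.text_encoder).__name__)    # CLIP ViT-L encoder
    print(type(pipe.text_encoder_2).__name__)  # OpenCLIP ViT-G encoder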

Model sources

For research purposes, we recommend our generative-models GitHub repository (https://github.com/Stability-AI/generative-models), which implements the most popular diffusion frameworks (for both training and inference) and to which new functionality such as distillation will be added over time. Clipdrop offers free SDXL inference.

  • Codebase: https://github.com/Stability-AI/generative-models
  • Demo: https://clipdrop.co/stable-diffusion

Evaluation

The chart above evaluates user preference for SDXL (with and without the refinement module) over SDXL 0.9, Stable Diffusion 1.5, and Stable Diffusion 2.1. The SDXL base model performs significantly better than the previous variants, and the base model combined with the refinement module achieves the best overall performance.

🧨 Diffusers

Make sure to upgrade diffusers to version >= 0.18.0:

    pip install diffusers --upgrade

In addition, make sure to install transformers, safetensors, accelerate, and invisible_watermark:

    pip install invisible_watermark transformers accelerate safetensors

You can then use the model as follows:

    from diffusers import DiffusionPipeline
    import torch

    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16, use_safetensors=True, variant="fp16",
    )
    pipe.to("cuda")

    # If you are using torch < 2.0, enable memory-efficient attention:
    # pipe.enable_xformers_memory_efficient_attention()

    prompt = "An astronaut riding a green horse"
    images = pipe(prompt=prompt).images[0]

When using torch >= 2.0, you can increase inference speed by 20-30% with torch.compile. Simply wrap the UNet with torch.compile before running the pipeline:

    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

If you are limited by GPU VRAM, you can enable CPU offloading by calling pipe.enable_model_cpu_offload instead of .to("cuda"):

    - pipe.to("cuda")
    + pipe.enable_model_cpu_offload()
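
For reproducible outputs, diffusers pipelines also accept a seeded generator; the following is a minimal sketch (the seed value is arbitrary):

    import torch

    # Fixing the generator seed makes repeated runs produce the same image.
    generator = torch.Generator(device="cuda").manual_seed(42)
    image = pipe(prompt="An astronaut riding a green horse",
                 generator=generator).images[0]
    image.save("astronaut.png")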

Direct use

The model is intended for research purposes only. Possible research areas and tasks include:

  • Generation of artworks and use in design and other artistic processes.
  • Applications in educational or creative tools.
  • Research on generative models.
  • Safe deployment of models that have the potential to generate harmful content.
  • Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

Out-of-scope use

The model was not trained to provide factual or true representations of people or events, and therefore using the model to generate such content is out of scope for its abilities.

Limitations

  • The model does not achieve perfect photorealism.
  • The model cannot render clear and readable text.
  • The model performs worse on harder tasks involving compositionality, such as rendering an image corresponding to "a red cube resting on a blue sphere".
  • Faces and characters may not be generated correctly
  • The autoencoding part of the model is lossy.

Bias

  • While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.

Reprinted from Hugging Face
