Midjourney is in danger! The Stable Diffusion XL public beta is coming: it can draw hands, render text, and produce good results even from simple prompts
With the free and open-source Stable Diffusion, you can now get Midjourney-level results too!
Since Midjourney released v5, its generated images have improved significantly in the realism of people and finger details, as well as in prompt comprehension, aesthetic variety, and language understanding. By contrast, Stable Diffusion needs a long prompt to produce a high-quality image, and you typically have to re-roll the generation several times before getting a usable one.
Recently, Stability AI officially announced that Stable Diffusion XL, which is still under development, has entered public testing and is now available for a free trial on the Clipdrop platform.
Trial link: https://clipdrop.co/stable-diffusion
Emad Mostaque, founder and CEO of Stability AI, said that the model is still in training and will be open sourced once its parameters stabilize. SD-XL is said to handle tricky image details such as hands noticeably better, to the point of being almost fully controllable.
Stable Diffusion XL is not the final release name, nor is it a v3: SD-XL's architecture is very similar to that of the SD-v2 series.
Minimalistic home gym with rubber flooring, wall-mounted TV, weight bench, medicine ball, dumbbells, yoga mats, high-tech equipment, high detail, organized and efficient.
Below are a few sample images officially released for SD-XL; as you can see, the image quality is already very high:
However, some netizens argue that higher quality does not necessarily mean better: SD-XL imposes too many rules to avoid "bad taste", leaving less and less room for customization, which doesn't suit everyone's tastes. For now, Stable Diffusion v1.5 remains the most popular base model in the community.
Some netizens hope the new version of SD will be compatible with embeddings, Hypernetworks, and LoRA models built for SD 2.1; otherwise everything would have to be painstakingly retrained from scratch.
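For context, this is roughly what plugging community add-ons into a base model looks like today with the diffusers library; whether SD-XL will accept the same files is exactly the open question. This is only a sketch, and the embedding/LoRA file names below are placeholders, not real releases:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a v1.5 base model; community embeddings and LoRAs are trained against
# a specific base, which is why compatibility with SD-XL matters so much.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach a textual-inversion embedding and a LoRA (placeholder file names).
pipe.load_textual_inversion("./my_concept.pt", token="<my-concept>")
pipe.load_lora_weights(".", weight_name="my_style_lora.safetensors")

image = pipe("a portrait of <my-concept>, detailed, soft light").images[0]
image.save("portrait.png")
```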
It was also suggested that SD-XL performs about as well as the fine-tuned models users share on Civitai, no better and no worse.
SD-XL: The open source version of MJ
Stability AI has not disclosed many details about the Stable Diffusion XL model. For now, we only know that it uses an architecture similar to the v2 models, but at a larger scale with more parameters: SD-v2.1 has about 900 million parameters, while SD-XL has roughly 2.3 billion. Emad said the official release may also include a smaller distilled version.
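If the weights are open sourced as promised, loading them will presumably look much like today's Stable Diffusion pipelines. A minimal sketch with the diffusers library, assuming the eventual release follows that pattern (the model id below is a guess, not an announced repository):

```python
import torch
from diffusers import DiffusionPipeline

# Hypothetical repo id; Stability AI has not published official weights yet.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base",
    torch_dtype=torch.float16,
).to("cuda")

# One of the short prompts SD-XL is advertised to handle well.
image = pipe(
    "Minimalistic home gym with rubber flooring, wall-mounted TV, "
    "weight bench, medicine ball, dumbbells, yoga mats",
    num_inference_steps=30,
).images[0]
image.save("home_gym.png")
```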
Improvements to SD-XL compared to previous versions include:
- Generate high-quality images with short, descriptive prompts
- Generated images follow the prompt more closely
- Human anatomy in generated images is more plausible
- Images are somewhat more aesthetically pleasing than those from v2.1 and v1.5
- Negative prompts are optional
- Generated portraits are more realistic
- Text in images is clearer
It is important to note that SD-XL may not be compatible with plugins built for previous versions.
Legible text
Note that the v1-series and v2.1 Stable Diffusion models cannot generate readable text inside an image. The text SD-XL generates isn't always accurate, but it is a huge improvement.
Photo of a woman sitting in a restaurant holding a menu that says “Menu”
Photo of a man holding a sign that says “Stable Diffusion”
a young female holding a sign that says "Stable Diffusion", highlights in hair, sitting outside restaurant, brown eyes, wearing a dress, side light
Better human anatomy
Stable Diffusion has always struggled with human anatomy: extra legs, missing arms, and other common defects. Fixing them usually means using inpainting to touch up the image, or using ControlNet's OpenPose to copy the pose from a reference image.
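As a rough sketch of that OpenPose workaround with SD-v1.5, using diffusers and controlnet_aux (the reference photo path is a placeholder):

```python
import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Extract a pose skeleton from a reference photo (placeholder path).
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
pose = openpose(load_image("./reference_yoga_pose.jpg"))

# Condition SD-v1.5 on the extracted pose so limbs end up where they should.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "photo of a woman in yoga outfit, triangle pose, beach in evening, rim lighting",
    image=pose,
    num_inference_steps=30,
).images[0]
image.save("yoga_controlnet.png")
```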
For example, when SD-v1.5 generates yoga images, distorted human bodies often appear.
Photo of a woman in yoga outfit, triangle pose, beach in evening, rim lighting
Although the images generated by SD-XL are not perfect, it shows clear progress on human poses.
More aesthetically pleasing
For example, house-themed photos generated with SD-XL are more symmetrical and visually appealing compared to previous versions.
SD-XL also shows a notable improvement in portrait photos.
photo shot of a woman
Images that match the prompt better
SD-XL has a better ability to understand prompts and can generate more accurate images.
For example, with "duotone" in the prompt, SD-v1.5 only produces black-and-white images, while SD-XL can generate true duotone images in a variety of color pairs.
Compared with the v1 version, SD-XL's ability to understand prompts has also been greatly improved.
duotone portrait of a woman
Two-tone portrait of a woman
Since SD-XL belongs to the same family as the v2 models, its text encoder is larger and understands prompts better than the v1 models do.
In the example below, the v1.5 model cannot handle two subjects in one image (a robot and a human), while SD-XL generates a normal image (although the robot may not be "big" enough).
big robot friend sitting next to a human, ghost in the shell style, anime wallpaper
a young man, highlights in hair, brown eyes, in white shirt and blue jeans on a beach with a volcano in background
Art style
In terms of art styles, SD-XL has not improved dramatically; it and the earlier versions each have their own strengths.
For example, the two different versions of the model below generate Edward Hopper-style images from different angles.
New York city by Edward Hopper
SD-v1.5 produces images that are more accurate in Leonid Afremov's style, while SD-XL lacks those distinctive and colorful brush strokes.
New York city by Leonid Afremov
In the style of William-Adolphe Bouguereau, both SD-v1.5 and SD-XL can produce similar content, but SD-XL is closer to the classic academic paintings created by Bouguereau, and contains more facial details.
Portrait of beautiful woman by William-Adolphe Bouguereau
Style change problem
Adding a seemingly irrelevant keyword can suddenly change the style of the output. For example, start with a photo-style image:
a young man, highlights in hair, brown eyes, in white shirt and blue jean on a beach with a volcano in background
After adding "a yellow scarf" to the prompt, the image shifts to a cartoon style.
a young man, highlights in hair, brown eyes, wearing a yellow scarf, in white shirt and blue jean on a beach with a volcano in background
This kind of problem may simply be a quirk of the preview version and could be resolved in the official release; only time will tell.