The company behind Stable Diffusion open-sources a new model! It outputs AI posters directly, with what can be called pixel-level image generation

Hayo News
April 29th, 2023

StabilityAI, the company behind the open-source AI painting model Stable Diffusion, has made a new move!

It has released a brand-new open-source model, DeepFloyd IF, which hit the GitHub trending list shortly after release!

DeepFloyd IF not only generates photorealistic, high-quality images, but also tackles two long-standing problems in text-to-image generation:

- Accurately rendering text within images (such as neon signs);
- Accurately understanding and manipulating spatial relationships (e.g., a cat looking into a mirror and seeing the reflection of a lion).


For netizens, this is a big step forward. People had previously tried to get Midjourney v5 to write text on a neon sign, but the AI could only scribble a few random strokes, and its understanding of mirrors was even more limited.

With DeepFloyd IF, you can place specified text anywhere you like, such as on neon signs, street graffiti, clothing, or hand-drawn illustrations, and it appears in the most suitable position with a fitting font, style, and layout.

This means AI can be used to directly produce product renderings or posters, which is far more practical and opens up a new direction for video special effects.

At present, DeepFloyd IF is open-sourced under a non-commercial license, and the team says it will move to a more permissive license once enough user feedback has been collected.

What is DeepFloyd IF

DeepFloyd IF is an AI painting model based on diffusion, and it differs from the earlier Stable Diffusion in two major ways:

The text-understanding component swaps OpenAI's CLIP for Google's T5-XXL; combined with an additional attention layer in the super-resolution module, this yields much more accurate text understanding.

The image-generation component swaps the latent diffusion model for a pixel-level diffusion model, meaning the diffusion process acts directly on pixels rather than on a compressed latent representation.
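To make the "pixel-level" distinction concrete, here is a toy numpy sketch of a single denoising step acting directly on an image-shaped array. Everything here is a placeholder (the real model uses a large U-Net conditioned on T5-XXL embeddings and a proper noise schedule); the point is only that the state being denoised keeps the image's H x W x C shape throughout, with no VAE encode/decode step as in latent diffusion.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x, t):
    # Hypothetical stand-in for the U-Net noise predictor; the real
    # network is conditioned on text embeddings and the timestep t.
    return 0.1 * x

def denoise_step(x, t, alpha=0.99):
    # Simplified DDPM-style update applied directly to pixels.
    eps = predict_noise(x, t)
    return (x - (1 - alpha) / np.sqrt(1 - alpha) * eps) / np.sqrt(alpha)

# The diffusion state IS the image tensor itself (pixel space).
x = rng.normal(size=(64, 64, 3))
for t in (999, 998, 997):
    x = denoise_step(x, t)
print(x.shape)  # the state keeps image shape H x W x C throughout
```

In latent diffusion, by contrast, the loop above would run on a small compressed latent and a decoder would map it back to pixels at the end.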

The team also provides a set of visual comparisons between DeepFloyd IF and other AI painting models.

It can be seen that Google's Parti and NVIDIA's eDiff-1, which also use T5 for text understanding, can likewise render text accurately. In other words, AI's inability to write was largely CLIP's fault.

However, NVIDIA's eDiff-1 is not open source, and Google has not even released demos for its models, which makes DeepFloyd IF the more practical choice.

In how it generates images, DeepFloyd IF is consistent with earlier cascaded models: after the language model encodes the prompt, it first generates a small image at 64x64 resolution, then enlarges it step by step through diffusion-based super-resolution models.
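The cascaded schedule described above (a 64x64 base image enlarged stage by stage) can be sketched as follows. This is a toy illustration only: `fake_super_resolution` is a nearest-neighbour upsample standing in for a diffusion super-resolution stage, and the specific resolutions are assumptions for demonstration.

```python
import numpy as np

def fake_super_resolution(img, factor=4):
    # Placeholder for a diffusion super-resolution stage: the real
    # model denoises at the higher resolution, conditioned on the
    # low-resolution input; here we just repeat pixels.
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

base = np.zeros((64, 64, 3))            # stage I: 64x64 base image
stage2 = fake_super_resolution(base)    # stage II: 256x256
stage3 = fake_super_resolution(stage2)  # stage III: 1024x1024
print(stage2.shape[:2], stage3.shape[:2])  # (256, 256) (1024, 1024)
```

The design reason for the cascade is cost: running full diffusion directly at 1024x1024 in pixel space would be far more expensive than refining a cheap 64x64 draft.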

On this architecture, shrinking a given image back down to 64x64 and then re-running diffusion with new prompts also makes it possible to generate image variations, adjusting style, content, and details without fine-tuning the model.
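A rough sketch of that variation trick, again with toy stand-ins rather than the real DeepFloyd IF API: shrink a source image back to the 64x64 base resolution, partially re-noise it, and (in the real model) re-run the denoising loop conditioned on a new prompt.

```python
import numpy as np

rng = np.random.default_rng(0)

def downscale_to_base(img, base=64):
    # Naive strided downsample standing in for proper image resizing.
    step = img.shape[0] // base
    return img[::step, ::step]

src = rng.normal(size=(1024, 1024, 3))   # a previously generated image
x64 = downscale_to_base(src)             # back to the 64x64 base stage
# Partial re-noising: keep some of the original signal so the new
# sample stays close to the source while the new prompt steers it.
noised = 0.7 * x64 + 0.3 * rng.normal(size=x64.shape)
print(x64.shape, noised.shape)
```

Because the base stage works in pixel space, no model fine-tuning is needed; the original image simply becomes the starting point of a fresh diffusion run.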

Another advantage of DeepFloyd IF is scale: the IF-4.3B base model has the most effective parameters in the U-Net part of any current diffusion model. In experiments, IF-4.3B achieved the best FID score, reaching SOTA (a lower FID indicates higher image quality and better diversity).
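For readers unfamiliar with the metric, FID compares the real and generated image distributions via Gaussian statistics of their feature embeddings: FID = ||mu1 - mu2||^2 + Tr(S1) + Tr(S2) - 2 Tr((S1 S2)^(1/2)). A minimal sketch on toy random features (real evaluations use Inception network features, not random vectors):

```python
import numpy as np

def fid(feats_a, feats_b):
    # Fréchet distance between two Gaussians fitted to feature sets.
    mu1, mu2 = feats_a.mean(axis=0), feats_b.mean(axis=0)
    s1 = np.cov(feats_a, rowvar=False)
    s2 = np.cov(feats_b, rowvar=False)
    # Tr((S1 S2)^(1/2)) = sum of square roots of the eigenvalues of
    # S1 @ S2, which are real and non-negative for PSD S1, S2.
    eig = np.linalg.eigvals(s1 @ s2)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1) + np.trace(s2) - 2 * tr_sqrt)

rng = np.random.default_rng(0)
a = rng.normal(size=(2000, 8))           # "real" features
b = rng.normal(loc=0.5, size=(2000, 8))  # shifted "generated" features
print(fid(a, a) < 1e-6, fid(a, b) > fid(a, a))
```

Identical distributions score (numerically) zero, and the score grows as the generated distribution drifts from the real one, which is why lower FID means better quality and diversity.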

The Story Behind DeepFloyd

DeepFloyd AI Research is an independent R&D team under StabilityAI. The name comes from the band Pink Floyd, and the team calls itself an "R&D band". It has only four members, mostly from Eastern European backgrounds.

In addition to open-sourcing the code, they also provide an online demo of the DeepFloyd IF model on Hugging Face, although its Chinese-language support is still limited.

Reference link:

DeepFloyd IF online demo: https://huggingface.co/spaces/DeepFloyd/IF

GitHub link: https://github.com/deep-floyd/IF
